Fibonacci Number
We study the classic problem, Fibonacci Number, to demonstrate the use of recursive task parallelism.
Problem Formulation
In mathematics, the Fibonacci numbers, commonly denoted as fibonacci(n), form a sequence such that each number is the sum of the two preceding ones, starting from 0 and 1.
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
A common solution for computing fibonacci numbers is recursion:
int fibonacci(int n) { if(n < 2) return n; return fibonacci(n-1) + fibonacci(n-2); }
Recursive Fibonacci Parallelism using Runtime Tasking
We use Runtime Tasking and Asynchronous Tasking to recursively compute Fibonacci numbers in parallel. A runtime task tasks a reference to tf::
#include <taskflow/taskflow.hpp> size_t fibonacci(size_t N, tf::Runtime& rt) { if(N < 2) return N; size_t res1, res2; rt.silent_async([N, &res1](tf::Runtime& rt1){ res1 = fibonacci(N-1, rt1); }); rt.silent_async([N, &res2](tf::Runtime& rt2){ res2 = fibonacci(N-2, rt2); }); // cooperatively run tasks until all tasks spawned by `rt` complete rt.corun(); return res1 + res2; } int main() { tf::Executor executor; size_t N = 30, res; executor.silent_async([N, &res](tf::Runtime& rt){ res = fibonacci(N, rt); }); executor.wait_for_all(); std::cout << N << "-th Fibonacci number is " << res << '\n'; return 0; }
The fibonacci function recursively spawns two asynchronous tasks to compute fibonacci(N-1) and fibonacci(N-2) in parallel using tf::. After spawning the two tasks, the function invokes tf::rt complete, without blocking the caller worker. In the main function, the executor creates an async task from the top Fibonacci number and waits for completion using tf::
Tail Recursion Optimization
In recursive parallelism, especially for problems like Fibonacci computation, spawning both recursive branches as asynchronous tasks can lead to excessive task creation and stack growth, which may degrade performance and overwhelm the runtime scheduler. Additionally, when both child tasks are launched asynchronously, the parent task must wait for both to finish, potentially blocking a worker thread and reducing parallel throughput. To address these issues, we apply tail recursion optimization to one branch of the Fibonacci call. This allows one of the recursive calls to proceed immediately in the current execution context, reducing both scheduling overhead and stack usage.
size_t fibonacci(size_t N, tf::Runtime& rt) { if(N < 2) return N; size_t res1, res2; rt.silent_async([N, &res1](tf::Runtime& rt1){ res1 = fibonacci(N-1, rt1); }); // tail optimization for the right child res2 = fibonacci(N-2, rt); // cooperatively run tasks until all tasks spawned by `rt` complete rt.corun(); return res1 + res2; }
The figure below shows the execution diagram, where the suffix *_1 represent the left child spawned by its parent runtime. As we can see, the right child is optimized out through tail recursion optimization.
Benchmarking
Based on the discussion above, we compare the runtime of recursive Fibonacci parallelism (1) with tail recursion optimization and (2) without it, across different Fibonacci numbers.
| N | w/ tail recursion optimization | w/o tail recursion optimization |
|---|---|---|
| 20 | 0.23 ms | 0.31 ms |
| 25 | 2 ms | 4 ms |
| 30 | 23 ms | 42 ms |
| 35 | 269 ms | 483 ms |
| 40 | 3003 ms | 5124 ms |
As N increases, the performance gap between the two versions widens significantly. With tail recursion optimization, the program avoids spawning another async task, thereby reducing scheduling overhead and stack pressure. This leads to better CPU utilization and lower task management cost. For example, at N = 40, tail recursion optimization reduces the runtime by over 40%.
Recursive Fibonacci Parallelism using %Task Group
Similar to tf::
tf::Executor executor; size_t fibonacci(size_t N) { if(N < 2) return N; size_t res1, res2; tf::TaskGroup tg = executor.task_group(); tg.silent_async([N, &res1](){ res1 = fibonacci(N-1); }); res2 = fibonacci(N-2); // cooperatively run tasks until all tasks spawned by `tg` complete tg.corun(); return res1 + res2; } int main() { size_t N = 30, res; res = executor.async([](){ return fibonacci(30); }).get(); std::cout << N << "-th Fibonacci number is " << res << '\n'; return 0; }