Loading...
Searching...
No Matches
Fibonacci Number

We study recursive task parallelism using the Fibonacci sequence as a concrete example, demonstrating how tf::Runtime enables dynamic task creation and cooperative synchronisation without blocking worker threads.

Problem Formulation

The Fibonacci sequence is defined such that each number is the sum of the two preceding ones, starting from 0 and 1:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

The natural recursive implementation is:

int fibonacci(int n) {
if(n < 2) return n;
return fibonacci(n-1) + fibonacci(n-2);
}

The two recursive calls are mutually independent and can therefore run in parallel. The challenge is to express this recursive parallelism efficiently — spawning tasks that themselves spawn tasks — without blocking worker threads or creating unbounded overhead.

Recursive Fibonacci with Runtime Tasking

A runtime task accepts a tf::Runtime& parameter, giving it a live handle to the scheduling runtime. We use tf::Runtime::silent_async to spawn both recursive branches in parallel and tf::Runtime::corun to wait for them cooperatively. corun does not block the calling worker — it participates in the work-stealing loop while waiting, so other tasks can run on the same thread:

#include <taskflow/taskflow.hpp>
size_t fibonacci(size_t N, tf::Runtime& rt) {
if(N < 2) return N;
size_t res1, res2;
// spawn both branches as async runtime tasks
rt.silent_async([N, &res1](tf::Runtime& rt1) {
res1 = fibonacci(N-1, rt1);
});
rt.silent_async([N, &res2](tf::Runtime& rt2) {
res2 = fibonacci(N-2, rt2);
});
// cooperatively wait for both branches without blocking the worker
rt.corun();
return res1 + res2;
}
int main() {
tf::Executor executor;
size_t N = 30, res;
executor.silent_async([N, &res](tf::Runtime& rt) {
res = fibonacci(N, rt);
});
executor.wait_for_all();
std::cout << N << "-th Fibonacci number is " << res << '\n';
return 0;
}
void silent_async(P &&params, F &&func)
similar to tf::Executor::async but does not return a future object
void wait_for_all()
waits for all tasks to complete
class to create a runtime task
Definition runtime.hpp:47
void silent_async(F &&f)
runs the given function asynchronously without returning any future object
Definition runtime.hpp:671
void corun()
corun all tasks spawned by this runtime with other workers
Definition runtime.hpp:646

Each call to fibonacci recursively spawns two async tasks and then waits cooperatively for both to finish. The executor distributes tasks across all available workers using work-stealing, achieving parallelism at every level of the recursion tree. The execution diagram for fibonacci(4) is shown below. The suffixes _1 and _2 denote the left and right children spawned by their parent runtime:

Tail Recursion Optimisation

Spawning both branches asynchronously doubles the number of async tasks created: one branch is always available for inline computation in the current execution context, so spawning it asynchronously only adds scheduling overhead without increasing parallelism. The standard optimisation is to compute one branch inline (directly in the current context) and spawn only the other asynchronously. This halves the task count, reduces stack pressure, and cuts scheduling overhead at every level of recursion:

size_t fibonacci(size_t N, tf::Runtime& rt) {
if(N < 2) return N;
size_t res1, res2;
// spawn only the left branch asynchronously
rt.silent_async([N, &res1](tf::Runtime& rt1) {
res1 = fibonacci(N-1, rt1);
});
// compute the right branch inline in the current context
res2 = fibonacci(N-2, rt);
// cooperatively wait for the left branch
rt.corun();
return res1 + res2;
}

The optimised execution diagram for fibonacci(4) is shown below. The right branch at each level has been eliminated, replaced by inline computation:

Benchmarking

The table below compares wall-clock runtime of the two implementations across different values of N on a multi-core machine:

N with tail optimisation without tail optimisation
20 0.23 ms 0.31 ms
25 2 ms 4 ms
30 23 ms 42 ms
35 269 ms 483 ms
40 3003 ms 5124 ms

The performance gap widens as N increases because the number of tasks created grows exponentially. At N = 40, tail optimisation cuts runtime by over 40% by eliminating half the task-creation and scheduling overhead at every recursive level.

Recursive Fibonacci with Task Group

tf::TaskGroup offers a lighter-weight alternative to tf::Runtime for recursive parallelism from any calling context — no need for the caller itself to be a runtime task. The interface is the same: spawn sub-tasks with silent_async, then call corun to wait cooperatively.

tf::Executor executor;
std::function<size_t(size_t)> fibonacci = [&](size_t N) -> size_t {
if(N < 2) return N;
size_t res1, res2;
tf::TaskGroup tg = executor.task_group();
// spawn left branch; compute right branch inline (tail optimisation)
tg.silent_async([N, &res1, &fibonacci]() {
res1 = fibonacci(N-1);
});
res2 = fibonacci(N-2);
tg.corun(); // cooperatively wait for the left branch
return res1 + res2;
};
int main() {
size_t N = 30;
size_t res = executor.async([&]() { return fibonacci(N); }).get();
std::cout << N << "-th Fibonacci number is " << res << '\n';
return 0;
}
class to create an executor
Definition executor.hpp:62
TaskGroup task_group()
creates a task group that executes a collection of asynchronous tasks
Definition task_group.hpp:875
auto async(P &&params, F &&func)
creates a parameterized asynchronous task to run the given function
class to create a task group from a task
Definition task_group.hpp:61
void corun()
corun all tasks spawned by this task group with other workers
Definition task_group.hpp:725
void silent_async(F &&f)
runs the given function asynchronously without returning any future object
Definition task_group.hpp:756
Note
Prefer tf::Runtime when you are already inside a runtime task, as it avoids the construction overhead of a task group object. Use tf::TaskGroup when the recursive function is called from a context that does not hold a tf::Runtime reference.