Asynchronous Tasking

Taskflow provides mechanisms to launch tasks asynchronously, enabling dynamic parallelism that goes beyond static task graphs.

What is an Async Task?

An async task is a callable object submitted for execution without being embedded in a pre-defined task graph. Unlike regular taskflow tasks whose dependencies are declared upfront, async tasks are created and dispatched on the fly, making them suitable for dynamic, recursive, or data-dependent parallelism that cannot be fully determined at graph construction time.

The C++ standard library provides std::async for this purpose. However, std::async has fundamental limitations that make it ill-suited for high-performance parallel programs:

// std::async typically spawns a new OS thread for each call
std::future<int> f1 = std::async(std::launch::async, []() { return 1; });
std::future<int> f2 = std::async(std::launch::async, []() { return 2; });
std::future<int> f3 = std::async(std::launch::async, []() { return 3; });
// ... spawning N tasks creates N threads, each with its own stack and
// OS overhead — expensive to create, destroy, and context-switch

The three core problems with std::async are:

  • No thread pool: each call to std::async typically creates a brand-new OS thread, incurring significant creation and destruction overhead. Spawning hundreds of async tasks means hundreds of threads competing for CPU time.
  • No scheduler: there is no work-stealing or load balancing between std::async tasks. If one task finishes early, its thread sits idle rather than picking up work from overloaded threads.
  • No task graph integration: std::async tasks are isolated from one another. You cannot express dependencies between them, embed them in a larger task graph, or coordinate them with other parallel work.

Taskflow's async tasking addresses all three problems. Async tasks run on the executor's existing thread pool under the same work-stealing scheduler, integrate naturally with taskflows and runtimes, and can be launched from any thread without additional overhead.

Launch Async Tasks from an Executor

tf::Executor::async runs a callable asynchronously on the thread pool and returns a std::future that will eventually hold the result:

std::future<int> future = executor.async([](){ return 1; });
assert(future.get() == 1);

If you do not need the return value or do not require a std::future for synchronisation, use tf::Executor::silent_async instead. It returns nothing and incurs less overhead than tf::Executor::async, as it avoids the cost of managing a shared state:

executor.silent_async([](){});

Both tf::Executor::async and tf::Executor::silent_async are thread-safe and can be called from any thread — including worker threads already running inside the executor and external threads outside of it. The scheduler automatically detects the submission source and applies work-stealing to distribute the task efficiently across workers:

tf::Task my_task = taskflow.emplace([&]() {
  // launch an async task from a worker thread inside the executor
  executor.async([&]() {
    // launch another async task from yet another worker thread
    executor.async([&]() {});
  });
});
executor.run(taskflow);
executor.wait_for_all();
Note
Async tasks created from an executor do not belong to any taskflow. Their lifetime is automatically managed by the executor.

Launch Async Tasks from a Runtime

tf::Runtime::async and tf::Runtime::silent_async let you launch async tasks from within a running task that has access to a tf::Runtime object. Like their executor counterparts, both methods are thread-safe and can be called from any context within the runtime's scope.

Unlike executor-level async tasks, tasks created from a runtime belong to that runtime and are implicitly joined at the end of its scope — meaning all async tasks spawned inside a runtime are guaranteed to finish before the runtime completes and control returns to the next task in the graph.

The example below spawns 100 async tasks from a runtime. Because of the implicit join, task B is guaranteed to see counter == 100:

tf::Taskflow taskflow;
tf::Executor executor;
std::atomic<int> counter{0};
tf::Task A = taskflow.emplace([&](tf::Runtime& rt) {
  for(int i = 0; i < 100; i++) {
    rt.silent_async([&](){ ++counter; });
  }
}); // implicit join: all 100 tasks finish before A completes
tf::Task B = taskflow.emplace([&]() {
  assert(counter == 100);
});
A.precede(B);
executor.run(taskflow).wait();

Launching async tasks from a runtime is the key enabler for dynamic parallel algorithms — parallel reduction, divide-and-conquer, and recursive patterns — that need to create work at runtime rather than at graph construction time.

Launch Async Tasks Recursively from a Runtime

Async tasks spawned from a runtime can themselves accept a tf::Runtime reference, allowing them to recursively spawn further async tasks. Combined with tf::Runtime::corun, this enables fork-join style divide-and-conquer parallelism where each level of recursion fans out work to available workers without blocking any thread.

The example below implements parallel Fibonacci using recursive async tasking:

#include <taskflow/taskflow.hpp>
#include <iostream>

size_t fibonacci(size_t N, tf::Runtime& rt) {
  if(N < 2) return N;
  size_t res1, res2;
  // spawn the left child asynchronously
  rt.silent_async([N, &res1](tf::Runtime& rt1) {
    res1 = fibonacci(N-1, rt1);
  });
  // compute the right child inline (tail optimisation)
  res2 = fibonacci(N-2, rt);
  // wait for all async children without blocking the worker thread
  rt.corun();
  return res1 + res2;
}

int main() {
  tf::Executor executor;
  size_t N = 5, res;
  executor.silent_async([N, &res](tf::Runtime& rt) {
    res = fibonacci(N, rt);
  });
  executor.wait_for_all();
  std::cout << N << "-th Fibonacci number is " << res << '\n';
  return 0;
}
Note
rt.corun() without arguments waits for all async tasks spawned within the current runtime scope to complete, without blocking the underlying worker thread from executing other work in the meantime. This is what allows the recursive pattern to scale efficiently — a blocked worker can participate in executing the spawned children rather than idling.

The figure below shows the execution diagram for fibonacci(4). The suffix _1 denotes the left child spawned by its parent runtime: