Taskflow provides mechanisms to launch tasks asynchronously, enabling dynamic parallelism that goes beyond static task graphs.
An async task is a callable object submitted for execution without being embedded in a pre-defined task graph. Unlike regular taskflow tasks whose dependencies are declared upfront, async tasks are created and dispatched on the fly, making them suitable for dynamic, recursive, or data-dependent parallelism that cannot be fully determined at graph construction time.
The C++ standard library provides std::async for this purpose. However, std::async has fundamental limitations that make it ill-suited for high-performance parallel programs:
The three core problems with std::async are:

1. Thread oversubscription: std::async typically creates a brand-new OS thread, incurring significant creation and destruction overhead. Spawning hundreds of async tasks means hundreds of threads competing for CPU time.
2. No load balancing: there is no work stealing among std::async tasks. If one task finishes early, its thread sits idle rather than picking up work from overloaded threads.
3. No composability: std::async tasks are isolated from one another. You cannot express dependencies between them, embed them in a larger task graph, or coordinate them with other parallel work.

Taskflow's async tasking addresses all three problems. Async tasks run on the executor's existing thread pool under the same work-stealing scheduler, integrate naturally with taskflows and runtimes, and can be launched from any thread without additional overhead.
tf::Executor::async runs a callable asynchronously on the thread pool and returns a std::future that will eventually hold the result:
If you do not need the return value or do not require a std::future for synchronisation, use tf::Executor::silent_async instead. It returns nothing and incurs less overhead than tf::Executor::async, as it avoids the cost of managing a shared state:
Both tf::Executor::async and tf::Executor::silent_async are thread-safe and can be called from any thread — including worker threads already running inside the executor and external threads outside of it. The scheduler automatically detects the submission source and applies work-stealing to distribute the task efficiently across workers:
tf::Runtime::async and tf::Runtime::silent_async let you launch async tasks from within a running task that has access to a tf::Runtime object. Like their executor counterparts, both methods are thread-safe and can be called from any context within the runtime's scope.
Unlike executor-level async tasks, tasks created from a runtime belong to that runtime and are implicitly joined at the end of its scope — meaning all async tasks spawned inside a runtime are guaranteed to finish before the runtime completes and control returns to the next task in the graph.
The example below spawns 100 async tasks from a runtime. Because of the implicit join, task B is guaranteed to see counter == 100:
Launching async tasks from a runtime is the key enabler for dynamic parallel algorithms — parallel reduction, divide-and-conquer, and recursive patterns — that need to create work at runtime rather than at graph construction time.
Async tasks spawned from a runtime can themselves accept a tf::Runtime reference, allowing them to recursively spawn further async tasks. Combined with tf::Runtime::corun, this enables fork-join style divide-and-conquer parallelism where each level of recursion fans out work to available workers without blocking any thread.
The example below implements parallel Fibonacci using recursive async tasking:
rt.corun() without arguments waits for all async tasks spawned within the current runtime scope to complete, without blocking the underlying worker thread from executing other work in the meantime. This is what allows the recursive pattern to scale efficiently: a blocked worker can participate in executing the spawned children rather than idling.

The figure below shows the execution diagram for fibonacci(4). The suffix _1 denotes the left child spawned by its parent runtime.