Asynchronous Tasking with Dependencies

Taskflow supports creating task graphs dynamically using dependent async tasks so you can handle more challenging parallel problems in a dynamic environment. This type of task graph construction is referred to as dynamic task graph programming (DTGP). We recommend reading Asynchronous Tasking before this page.

When Static Task Graphs Are Not Enough

The standard Taskflow model is construct-then-run: you build the entire task graph upfront with tf::Taskflow, then hand it to tf::Executor to execute. This model, also referred to as static task graph programming (STGP), is clean, predictable, and efficient for workloads whose structure is known before execution begins. However, there are two scenarios that STGP cannot handle well:

Scenario A: Graph topology determined at runtime

Consider a workflow where the structure of the task graph — how many sub-graphs exist, which ones run in parallel, which depend on which — is decided entirely by runtime conditions and properties of the graphs themselves:

auto G1 = build_task_graph_1();
auto G2 = build_task_graph_2();

if(G1.num_tasks() == 100) {
  // simple case: G2 runs after G1
  G1.precede(G2);
}
else {
  // complex case: G3 runs alongside G2, both after G1
  auto G3 = build_task_graph_3();
  G1.precede(G2, G3);
  if(G2.num_dependencies() >= 10) {
    // G2 is heavily connected: funnel into a single post-processing step
    auto G4 = build_task_graph_4();
    G2.precede(G4);
    G3.precede(G4);
  }
  else {
    // G2 is lightly connected: fan out into two independent steps
    auto G5 = build_task_graph_5();
    auto G6 = build_task_graph_6();
    G3.precede(G5, G6);
  }
}

Building this statically would require enumerating every possible branch as a separate pre-built taskflow and selecting one at program start. That approach is brittle, wasteful, and breaks down completely when the branching logic depends on properties of the graphs themselves — as shown here, where the structure of G1 and G2 determines what runs next. Dynamic task graph programming solves this directly: sub-graphs are created and wired as control flow unfolds, so the final graph matches the actual execution path exactly.

Scenario B: Hiding graph construction latency

In large graphs, constructing every task node can itself take non-trivial time — allocating buffers, loading metadata, resolving file paths. With construct-then-run, all of that setup must complete before a single task begins executing. With dynamic task graph programming, a task can begin executing the moment its dependencies are satisfied, even while downstream tasks are still being constructed. This overlap between graph creation and task execution can significantly reduce end-to-end latency.

To make this concrete, consider a four-task graph. In the static model, the entire taskflow is constructed before any task runs. In the dynamic model, execution of early tasks overlaps with the construction of later tasks.

Taskflow's dependent-async API, tf::Executor::dependent_async and tf::Executor::silent_dependent_async, is designed precisely for these scenarios. Each task is submitted individually with an explicit list of predecessor tasks, and the executor begins running it as soon as all predecessors complete, without waiting for the rest of the graph to be defined.

Create a Dynamic Task Graph from an Executor

tf::Executor::silent_dependent_async and tf::Executor::dependent_async create a dependent-async task of type tf::AsyncTask and schedule it for execution as soon as its dependencies are satisfied. tf::Executor::dependent_async additionally returns a std::future that eventually holds the result of the callable.

The example below dynamically creates the following diamond task graph, where A runs first, B and C run in parallel after A, and D runs after both B and C:

tf::Executor executor;
tf::AsyncTask A = executor.silent_dependent_async([](){ printf("A\n"); });
tf::AsyncTask B = executor.silent_dependent_async([](){ printf("B\n"); }, A);
tf::AsyncTask C = executor.silent_dependent_async([](){ printf("C\n"); }, A);
auto [D, fuD] = executor.dependent_async([](){ printf("D\n"); }, B, C);
fuD.get(); // waiting for D implies A, B, C have all finished

Because task execution begins as soon as dependencies are met, this model requires you to express tasks in a valid topological order — you can only name a task as a predecessor after it has already been created. For the diamond above there are two valid orderings; the alternative is:

tf::AsyncTask A = executor.silent_dependent_async([](){ printf("A\n"); });
tf::AsyncTask C = executor.silent_dependent_async([](){ printf("C\n"); }, A);
tf::AsyncTask B = executor.silent_dependent_async([](){ printf("B\n"); }, A);
auto [D, fuD] = executor.dependent_async([](){ printf("D\n"); }, B, C);
fuD.get();

In addition to synchronising on a specific task via its future, you can wait for all outstanding dependent-async tasks using tf::Executor::wait_for_all:

tf::AsyncTask A = executor.silent_dependent_async([](){ printf("A\n"); });
tf::AsyncTask B = executor.silent_dependent_async([](){ printf("B\n"); }, A);
tf::AsyncTask C = executor.silent_dependent_async([](){ printf("C\n"); }, A);
tf::AsyncTask D = executor.silent_dependent_async([](){ printf("D\n"); }, B, C);
executor.wait_for_all();

Specify a Range of Dependencies

Both tf::Executor::dependent_async and tf::Executor::silent_dependent_async accept an arbitrary number of predecessor tasks as variadic arguments. When the number of predecessors is not known until runtime (for example, when it depends on the size of a data set), you can use the iterator overloads that accept a range [first, last).

The iterator's dereferenced type must be convertible to tf::AsyncTask. The example below creates a final task that depends on N previously created tasks, where N is a runtime variable:

tf::Executor executor;
std::vector<tf::AsyncTask> predecessors;

for(size_t i = 0; i < N; i++) {
  predecessors.push_back(executor.silent_dependent_async([](){}));
}

// this task runs after all N predecessors have completed
executor.silent_dependent_async(
  [](){}, predecessors.begin(), predecessors.end()
);
executor.wait_for_all();

Create a Dynamic Task Graph from a Runtime

You can also create dependent-async tasks from within a running task that has access to a tf::Runtime object, using tf::Runtime::dependent_async and tf::Runtime::silent_dependent_async. The API mirrors the executor-level interface, but with one important distinction: all dependent-async tasks spawned from a runtime are parented to that runtime and are implicitly joined at the end of its scope. This means the runtime task does not complete — and control does not pass to the next task in the graph — until every dependent-async task it spawned has finished. This property is especially useful for implementing dynamic sub-graphs inside a larger static graph: a single runtime task can build and run an entire dynamic task graph as part of one logical step, with the surrounding graph remaining unaware of the internal structure.

The example below shows a static graph where task A dynamically builds a diamond sub-graph at runtime. Task B is guaranteed to see the results of the entire sub-graph because the implicit join ensures all sub-tasks finish before A completes:

tf::Taskflow taskflow;
tf::Executor executor;

std::atomic<int> counter{0};

tf::Task A = taskflow.emplace([&](tf::Runtime& rt) {
  // dynamically build a diamond sub-graph inside A
  tf::AsyncTask a = rt.silent_dependent_async([&](){ ++counter; });
  tf::AsyncTask b = rt.silent_dependent_async([&](){ ++counter; }, a);
  tf::AsyncTask c = rt.silent_dependent_async([&](){ ++counter; }, a);
  rt.silent_dependent_async([&](){ ++counter; }, b, c);
  // implicit join: all four sub-tasks finish before A completes
});
tf::Task B = taskflow.emplace([&]() {
  assert(counter == 4); // guaranteed: A's sub-graph has fully completed
});
A.precede(B);
executor.run(taskflow).wait();
Note
Dependent-async tasks created from a runtime belong to that runtime and are automatically joined when the runtime goes out of scope. In contrast, dependent-async tasks created from an executor have no parent and must be explicitly synchronised via a future or tf::Executor::wait_for_all.

Create a Dynamic Task Graph from Multiple Threads

Since tf::Executor::dependent_async and tf::Executor::silent_dependent_async are thread-safe, multiple threads can collaborate to build the same dynamic task graph concurrently, provided the overall topological order is respected. The example below uses three threads to build a graph where B and C both depend on A:

tf::Executor executor;

// main thread creates task A
tf::AsyncTask A = executor.silent_dependent_async([](){});

// two threads each add a task that depends on A
std::thread t1([&](){
  tf::AsyncTask B = executor.silent_dependent_async([](){}, A);
});
std::thread t2([&](){
  tf::AsyncTask C = executor.silent_dependent_async([](){}, A);
});
t1.join();
t2.join();
executor.wait_for_all();

Regardless of whether t1 runs before or after t2, both orderings (ABC or ACB) satisfy the dependency that B and C follow A.

Understand the Lifetime of a Dependent-Async Task

tf::AsyncTask is a lightweight handle that holds shared ownership of the underlying task object. This shared ownership ensures the task remains alive when it is added to the dependency list of another task, preventing the ABA problem that would arise if the task were destroyed before its dependents had been registered:

tf::Executor executor;
tf::AsyncTask A = executor.silent_dependent_async([](){});

// main thread retains shared ownership of A
assert(A.use_count() >= 1);

// A remains alive while being registered as a predecessor of B
tf::AsyncTask B = executor.silent_dependent_async([](){}, A);
assert(B.use_count() >= 1);

tf::AsyncTask is implemented in a similar way to std::shared_ptr and is cheap to copy or move. When a worker finishes executing a dependent-async task, it removes the task from the executor, decrementing the shared owner count by one. The task is destroyed when that count reaches zero.

Query Completion Status with Cooperative Execution

tf::AsyncTask::is_done returns true once the task has finished executing its callable, and false before that point. This is useful when you need to check whether a specific task has completed before proceeding, without blocking the calling thread. Consider a scenario where a main thread submits a chain of data-processing tasks and needs to verify the results of an intermediate stage before deciding what to submit next:

tf::Executor executor;

// stage 1: load and parse data
auto [parse, fu_parse] = executor.dependent_async([]() -> int {
  return load_and_parse(); // returns the number of records parsed
});

// stage 2: validate records (depends on parse)
auto [validate, fu_validate] = executor.dependent_async([&]() -> bool {
  return validate_records();
}, parse);

// the main thread keeps the executor's workers alive (work-stealing loop)
// while waiting for both stages to complete
executor.corun_until([&](){
  return parse.is_done() && validate.is_done();
});

// now safe to inspect results and decide on the next step
int n_records = fu_parse.get();
bool valid = fu_validate.get();
if(valid) {
  // submit stage 3 only if validation passed
  executor.silent_dependent_async([=]() {
    process_records(n_records);
  }, validate);
}
executor.wait_for_all();
Note
tf::AsyncTask::is_done is designed to be used together with tf::Executor::corun_until, which keeps the calling worker thread active in the work-stealing loop rather than blocking it. Blocking a worker thread with a spin-wait or std::future::get while inside the executor can cause deadlock if all workers are blocked waiting for tasks that cannot be scheduled. See Execute a Taskflow from an Internal Worker Cooperatively for more details.