Runtime Tasking

Taskflow allows tasks to interact with the scheduling runtime by taking a runtime object as an argument. This feature is particularly useful for implementing recursive parallel algorithms that create tasks dynamically.

Create a Runtime Task

A runtime task is a special type of task that interacts directly with the Taskflow scheduling runtime through a tf::Runtime object. By taking a tf::Runtime& parameter, a runtime task can dynamically create and manage other tasks during its execution. This capability enables flexible and expressive patterns such as recursive parallelism, cooperative task execution, and dynamic task spawning based on runtime conditions. For example, the code below creates a runtime task and acquires its running executor via tf::Runtime::executor():

tf::Executor executor;
tf::Taskflow taskflow;
taskflow.emplace([&](tf::Runtime& rt){  // a runtime task
  assert(&(rt.executor()) == &executor);
});
executor.run(taskflow).wait();
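
As a minimal sketch of dynamic task spawning based on runtime conditions, the code below decides during execution whether to offload work to another worker (via tf::Runtime::silent_async, explained in the next section) or run it inline, using tf::Executor::num_workers to query the number of workers in the running executor; the work itself is a hypothetical placeholder:

tf::Executor executor;
tf::Taskflow taskflow;
taskflow.emplace([&](tf::Runtime& rt){
  // decide at runtime whether to spawn the work asynchronously or run it inline
  if(rt.executor().num_workers() > 1) {
    rt.silent_async([](){ /* heavy work runs on another worker */ });
  }
  else {
    /* heavy work runs inline on the calling worker */
  }
  // spawned asynchronous tasks implicitly join at the end of this runtime task's scope
});
executor.run(taskflow).wait();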

Create Asynchronous Tasks from a Runtime Task

Similar to tf::Executor, tf::Runtime allows you to create asynchronous tasks using tf::Runtime::async or tf::Runtime::silent_async. Asynchronous tasks spawned from a runtime task are logically parented to that runtime and are implicitly synchronized at the end of the runtime's scope. For example, the code below creates a runtime task A that spawns 1,000 asynchronous tasks during its execution. These 1,000 asynchronous tasks will be implicitly joined before task A completes, ensuring that all of them finish before the runtime proceeds to task B.

tf::Executor executor(num_threads);
tf::Taskflow taskflow;
std::atomic<size_t> counter(0);

tf::Task A = taskflow.emplace([&](tf::Runtime& rt){
  // spawn 1000 asynchronous tasks from this runtime task
  for(size_t i=0; i<1000; i++) {
    rt.silent_async([&](){ counter.fetch_add(1, std::memory_order_relaxed); });
  }
  // implicit synchronization at the end of the runtime scope
});
tf::Task B = taskflow.emplace([&](){
  assert(counter == 1000);
});
A.precede(B);
executor.run(taskflow).wait();
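
If you need the result of an asynchronous task, you can use tf::Runtime::async instead of tf::Runtime::silent_async. The sketch below assumes its return value behaves like the future returned by tf::Executor::async (hence the auto) and retrieves the result only after cooperatively joining the spawned task:

tf::Executor executor;
tf::Taskflow taskflow;
taskflow.emplace([&](tf::Runtime& rt){
  // rt.async returns a future-like handle to the task's result
  // (assumed here to mirror the future returned by tf::Executor::async)
  auto fu = rt.async([](){ return 42; });
  // cooperatively wait for the spawned task (tf::Runtime::corun is described below),
  // then retrieve the result without blocking
  rt.corun();
  assert(fu.get() == 42);
});
executor.run(taskflow).wait();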

In addition to tf::Runtime::async and tf::Runtime::silent_async, you can create a dynamic task graph from a tf::Runtime using tf::Runtime::dependent_async and tf::Runtime::silent_dependent_async. For example, the code below creates a linear chain of dependent-async tasks from the execution context of a runtime:

tf::Executor executor;
tf::Taskflow taskflow;
taskflow.emplace([&, counter=int{0}](tf::Runtime& rt) mutable {
  tf::AsyncTask prev = rt.silent_dependent_async([&](){ 
    assert(counter == 0); 
    ++counter; 
  });
  for(size_t i=1; i<=1000; ++i) {
    tf::AsyncTask curr = rt.silent_dependent_async([&, i](tf::Runtime&){ 
      assert(counter == i); 
      ++counter; 
    }, prev);
    prev = curr;
  } 
});  // all dependent-async tasks will implicitly join at the end of the runtime's scope
executor.run(taskflow).wait();
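
Similarly, tf::Runtime::dependent_async lets you retrieve results from a dynamic task graph. The sketch below assumes the call mirrors tf::Executor::dependent_async and returns a pair of a tf::AsyncTask handle and a future holding the result:

tf::Executor executor;
tf::Taskflow taskflow;
int x = 0;
taskflow.emplace([&](tf::Runtime& rt){
  // A initializes x; B depends on A and computes a value from x
  tf::AsyncTask A = rt.silent_dependent_async([&](){ x = 1; });
  // assumed to mirror tf::Executor::dependent_async: returns {tf::AsyncTask, future}
  auto [B, fuB] = rt.dependent_async([&](){ return x + 1; }, A);
  // cooperatively wait for all tasks spawned from this runtime, then read the result
  rt.corun();
  assert(fuB.get() == 2);
});
executor.run(taskflow).wait();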

Synchronize with Cooperative Execution

The tf::Runtime::corun method enables cooperative execution within a runtime task. It explicitly waits for all tasks spawned through the same tf::Runtime object to complete before continuing execution. This is particularly useful when a runtime task launches multiple subtasks that must finish before the next phase of computation. Similar to tf::Executor::corun, the calling worker does not block; instead, it continues to execute available tasks from the executor, cooperating with other workers through the work-stealing loop.

tf::Executor executor(num_threads);
tf::Taskflow taskflow;
std::atomic<size_t> counter(0);

tf::Task A = taskflow.emplace([&](tf::Runtime& rt){
  // spawn 1000 asynchronous tasks from this runtime task
  for(size_t i=0; i<1000; i++) {
    rt.silent_async([&](){ counter.fetch_add(1, std::memory_order_relaxed); });
  }
  // explicitly synchronize all asynchronous tasks without blocking the calling worker
  rt.corun();
  assert(counter == 1000);

  // spawn another 1000 asynchronous tasks from this runtime task
  for(size_t i=0; i<1000; i++) {
    rt.silent_async([&](){ counter.fetch_add(1, std::memory_order_relaxed); });
  }
  // implicit synchronization at the end of the runtime scope
});
tf::Task B = taskflow.emplace([&](){
  assert(counter == 2000);
});
A.precede(B);
executor.run(taskflow).wait();

In this example, all 1,000 asynchronous tasks spawned in the first loop are guaranteed to complete before the call to tf::Runtime::corun returns. This cooperative execution model is especially useful for expressing multi-phase or hierarchical parallel algorithms that require intermediate synchronization within a single runtime task.
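
As a minimal sketch of such a multi-phase pattern, the code below fills a buffer in a first parallel phase, synchronizes with tf::Runtime::corun, and then reduces the buffer in a second parallel phase within the same runtime task (spawning one task per element is only for illustration):

tf::Executor executor;
tf::Taskflow taskflow;
std::vector<int> data(1000);
std::atomic<int> sum(0);

taskflow.emplace([&](tf::Runtime& rt){
  // phase 1: initialize the buffer in parallel
  for(size_t i=0; i<data.size(); ++i) {
    rt.silent_async([&, i](){ data[i] = static_cast<int>(i); });
  }
  // all writes must finish before phase 2 reads the buffer
  rt.corun();

  // phase 2: reduce the buffer in parallel
  for(size_t i=0; i<data.size(); ++i) {
    rt.silent_async([&, i](){ sum.fetch_add(data[i], std::memory_order_relaxed); });
  }
  // implicit synchronization at the end of the runtime scope
});
executor.run(taskflow).wait();
assert(sum == 499500);  // 0 + 1 + ... + 999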

Create Recursive Asynchronous Tasks from a Runtime Task

Each asynchronous task can itself be a runtime task, enabling recursive task creation and dynamic parallelism. This model is particularly powerful for implementing divide-and-conquer algorithms, such as parallel sort, graph traversal, and other recursive decompositions. For instance, the example below demonstrates a parallel recursive computation of Fibonacci numbers using recursive asynchronous tasking with tf::Runtime:

#include <taskflow/taskflow.hpp>
#include <iostream>

size_t fibonacci(size_t N, tf::Runtime& rt) {

  if(N < 2) return N; 

  size_t res1, res2;
  rt.silent_async([N, &res1](tf::Runtime& rt1){ res1 = fibonacci(N-1, rt1); });
  
  // tail optimization for the right child
  res2 = fibonacci(N-2, rt);

  // use corun to wait for the spawned child without blocking the calling worker
  rt.corun();

  return res1 + res2;
}

int main() {

  tf::Executor executor;
  
  size_t N = 5, res;
  executor.silent_async([N, &res](tf::Runtime& rt){ res = fibonacci(N, rt); });
  executor.wait_for_all();

  std::cout << N << "-th Fibonacci number is " << res << '\n';

  return 0;
}

The figure below shows the execution diagram, where a runtime name extended with a 1 suffix (e.g., rt1, rt1_1, rt1_1_1) is the runtime passed to the left child spawned asynchronously from its parent runtime, while the right child runs inline in its parent's runtime.

Execution diagram of fibonacci(4), where each node shows the call and the runtime it runs on:

fibonacci(4) [rt]
├── fibonacci(3) [rt1]
│   ├── fibonacci(2) [rt1_1]
│   │   ├── fibonacci(1) [rt1_1_1]
│   │   └── fibonacci(0) [rt1_1]
│   └── fibonacci(1) [rt1]
└── fibonacci(2) [rt]
    ├── fibonacci(1) [rt1]
    └── fibonacci(0) [rt]

For more details, please refer to Asynchronous Tasking and Fibonacci Number.