Runtime Tasking

Taskflow allows you to interact with the scheduling runtime by taking a runtime object as an argument of a task. This is mostly useful for designing recursive parallel algorithms that require dynamic tasking on the fly.

Create a Runtime Task

Taskflow allows users to define a runtime task that accepts a reference to a tf::Runtime object. This object provides methods to interact with the underlying scheduling engine. For example, a runtime task can be used to explicitly schedule another task that would not normally execute due to the graph's structure or conditional dependencies:

tf::Task A, B, C, D;
std::tie(A, B, C, D) = taskflow.emplace(
  [] () { return 0; },
  [&C] (tf::Runtime& rt) {  // C must be captured by reference
    std::cout << "B\n"; 
    rt.schedule(C);
  },
  [] () { std::cout << "C\n"; },
  [] () { std::cout << "D\n"; }
);
A.precede(B, C, D);
executor.run(taskflow).wait();
(Figure: the resulting taskflow graph, where condition task A precedes tasks B, C, and D through weak dependencies labeled 0, 1, and 2, respectively.)

In the above code, when condition task A completes and returns 0, the scheduler moves on to task B. Under normal circumstances, tasks C and D will not run because their conditional dependencies are never satisfied. This behavior can be overridden by forcefully scheduling C and/or D through a runtime object of a task that resides in the same graph. Here, task B calls tf::Runtime::schedule to forcefully run task C, even though the weak dependency between A and C is never taken based on the graph structure itself. As a result, we will see both B and C in the output:

B    # B uses a runtime object to schedule C out of its dependency constraint
C

Acquire the Running Executor

You can acquire the reference to the running executor using tf::Runtime::executor. The executor associated with a runtime object is the executor that runs the parent task of that runtime object.

tf::Executor executor;
tf::Taskflow taskflow;
taskflow.emplace([&](tf::Runtime& rt){
  assert(&(rt.executor()) == &executor);
});
executor.run(taskflow).wait();
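
Having a handle to the running executor is useful, for example, to adapt the amount of spawned work to the scheduler. The snippet below is our own illustrative sketch (not part of the original example) and uses only tf::Executor::num_workers to query how many workers the executor owns:

tf::Executor executor(4);
tf::Taskflow taskflow;
taskflow.emplace([&](tf::Runtime& rt){
  // query the executor that runs this task to size the work accordingly
  size_t W = rt.executor().num_workers();
  std::cout << "running on an executor with " << W << " workers\n";
});
executor.run(taskflow).wait();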

Corun Taskflows from a Runtime Task

One of the most powerful features of a runtime task is tf::Runtime::corun. This method provides a non-blocking mechanism that allows the calling worker to continue executing other available tasks in the executor while waiting for all tasks spawned from that runtime to complete. This behavior is critical for avoiding deadlock in nested or recursive tasking patterns, where workers may otherwise block while waiting on subgraphs of child tasks to finish, leaving no workers to make forward progress. The following example demonstrates how to use tf::Runtime::corun to run a predefined task graph during the execution of a runtime task, without blocking the calling worker:

// create a custom graph
tf::Taskflow graph;
graph.emplace([](){ std::cout << "independent task 1\n"; });
graph.emplace([](){ std::cout << "independent task 2\n"; });

taskflow.emplace([&](tf::Runtime& rt){ 
  // coruns the graph without blocking the calling worker of this runtime
  rt.corun(graph);
});
executor.run_n(taskflow, 10000);

Although tf::Runtime::corun does not return control to the program until the given graph finishes its execution, the calling worker (i.e., the parent worker) of the runtime joins the executor's work-stealing loop and continues executing other tasks while the graph runs. This behavior differs from waiting on a submitted taskflow through std::future<T>::wait (i.e., the base class of tf::Future), which blocks the calling thread entirely until completion. If multiple taskflows are submitted and waited on in this blocking manner, it can lead to deadlock, especially in recursive or nested patterns. For example, the code below submits a taskflow of 1000 tasks to an executor of two workers, where each task blocks its worker while waiting on another taskflow of 500 tasks, causing deadlock:

tf::Executor executor(2);
tf::Taskflow taskflow;
std::array<tf::Taskflow, 1000> others;

for(size_t n=0; n<1000; n++) {
  for(size_t i=0; i<500; i++) {
    others[n].emplace([&](){});
  }
  taskflow.emplace([&executor, &tf=others[n]](){
    // blocking the worker can introduce deadlock where
    // all workers are waiting for their taskflows to finish
    executor.run(tf).wait();
  });
}
executor.run(taskflow).wait();

To avoid this deadlock, you should instead use tf::Runtime::corun, which lets the calling worker corun these taskflows through its work-stealing loop without blocking its execution:

tf::Executor executor(2);
tf::Taskflow taskflow;
std::array<tf::Taskflow, 1000> others;

for(size_t n=0; n<1000; n++) {
  for(size_t i=0; i<500; i++) {
    others[n].emplace([&](){});
  }
  taskflow.emplace([&tf=others[n]](tf::Runtime& rt){
    // the caller worker will not block on wait but corun these
    // taskflows through its work-stealing loop
    rt.corun(tf);
  });
}
executor.run(taskflow).wait();

Corun Asynchronous Tasks from a Runtime Task

Similar to tf::Executor, tf::Runtime allows you to create asynchronous tasks on the fly using tf::Runtime::async or tf::Runtime::silent_async. Asynchronous tasks spawned from a runtime task are logically parented to that runtime and can be explicitly synchronized using tf::Runtime::corun. Furthermore, each asynchronous task can itself be a runtime task, enabling recursive task creation and dynamic parallelism. This model is particularly powerful for implementing divide-and-conquer algorithms, such as parallel sort, graph traversal, and recursive decomposition.
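
Before the full recursive example, the snippet below is a minimal sketch of the pattern written for this discussion (the printed messages are purely illustrative): a runtime task spawns asynchronous tasks with tf::Runtime::silent_async and then calls tf::Runtime::corun to wait for them without blocking the calling worker.

tf::Executor executor;
tf::Taskflow taskflow;

taskflow.emplace([](tf::Runtime& rt){
  // spawn two asynchronous tasks parented to this runtime
  rt.silent_async([](){ std::cout << "async task 1\n"; });
  rt.silent_async([](){ std::cout << "async task 2\n"; });
  // corun until both asynchronous tasks complete, without blocking the calling worker
  rt.corun();
});
executor.run(taskflow).wait();

Building on this pattern, the example below demonstrates a parallel recursive implementation of Fibonacci numbers using recursive asynchronous tasking with tf::Runtime: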

#include <taskflow/taskflow.hpp>

size_t fibonacci(size_t N, tf::Runtime& rt) {

  if(N < 2) return N; 

  size_t res1, res2;
  rt.silent_async([N, &res1](tf::Runtime& rt1){ res1 = fibonacci(N-1, rt1); });
  
  // tail optimization for the right child
  res2 = fibonacci(N-2, rt);

  // use corun to avoid blocking the worker while waiting for child tasks to finish
  rt.corun();

  return res1 + res2;
}

int main() {

  tf::Executor executor;
  
  size_t N = 5, res;
  executor.silent_async([N, &res](tf::Runtime& rt){ res = fibonacci(N, rt); });
  executor.wait_for_all();

  std::cout << N << "-th Fibonacci number is " << res << '\n';

  return 0;
}

The figure below shows the execution diagram, where the task with suffix *_1 represents the left child spawned by its parent runtime.

(Figure: execution diagram of fibonacci(4), where each task with suffix _1, e.g., fibonacci(3) [rt1], is the left child spawned by its parent runtime, and the right child, e.g., fibonacci(2) [rt], is computed in place by the parent.)

For more details, please refer to Asynchronous Tasking and Fibonacci Number.