Cookbook » Runtime Tasking

Taskflow allows you to interact with the scheduling runtime by taking a runtime object as an argument of a task. This is mostly useful for designing recursive parallel algorithms that require dynamic tasking on the fly.

Create a Runtime Object

Taskflow allows users to define a runtime task that takes a referenced tf::Runtime object, which provides a set of methods to interact with the underlying scheduling runtime. The following example demonstrates a runtime task can explicitly schedule a condition task, which would not be executed under normal scheduling circumstances:

tf::Task A, B, C, D;
std::tie(A, B, C, D) = taskflow.emplace(
  [] () { return 0; },
  [&C] (tf::Runtime& rt) {  // C must be captured by reference
    std::cout << "B\n"; 
    rt.schedule(C);
  },
  [] () { std::cout << "C\n"; },
  [] () { std::cout << "D\n"; }
);
A.precede(B, C, D);
executor.run(taskflow).wait();
Taskflow p0x7bc400014030 D p0x7bc400014118 C p0x7bc400014200 B p0x7bc4000142e8 A p0x7bc4000142e8->p0x7bc400014030 2 p0x7bc4000142e8->p0x7bc400014118 1 p0x7bc4000142e8->p0x7bc400014200 0

When the condition task A completes and returns 0, the scheduler moves on to task B. Under the normal circumstance, tasks C and D will not run because their conditional dependencies never happen. This can be broken by forcefully scheduling C or/and D via a runtime object of a task that resides in the same graph. Here, task B call tf::Runtime::schedule to forcefully run task C even though the weak dependency between A and C will never happen based on the graph structure itself. As a result, we will see both B and C in the output:

B    # B leverages a runtime object to schedule C out of its dependency constraint
C

Acquire the Running Executor

You can acquire the reference to the running executor using tf::Runtime::executor(). The executor associated with a runtime object is the executor that runs the parent task of that runtime object.

tf::Executor executor;
tf::Taskflow taskflow;
taskflow.emplace([&](tf::Runtime& rt){
  assert(&(rt.executor()) == &executor);
});
executor.run(taskflow).wait();

Run a Task Graph Asynchronously

A tf::Runtime object spawn a task graph asynchronously using tf::Runtime::corun. This model enables dynamic tasking, allowing parallel workloads to execute directly within the runtime context. The following example demonstrates how to run a predefined task graph during the execution of a runtime task, without blocking the calling worker using tf::Runtime::corun:

// create a custom graph
tf::Taskflow graph;
graph.emplace([](){ std::cout << "independent task 1\n"; });
graph.emplace([](){ std::cout << "independent task 2\n"; });

taskflow.emplace([&](tf::Runtime& rt){ 
  // coruns the graph without blocking the calling worker of this runtime
  rt.corun(graph);
});
executor.run_n(taskflow, 10000);

Although tf::Runtime::corun does not return control to the program until the given graph finishes its execution, the calling worker (i.e., parent worker) of the runtime indeed joins the executor's work-stealing loop and continues executing other tasks together with graph execution. This behavior differs from waiting on a submitted taskflow using tf::Future<T>::wait, which blocks the calling thread entirely until completion. If multiple taskflows are submitted and waited on in this blocking manner, it can potentially lead to deadlock, especially in recursive or nested patterns. For example, the code below submits a taskflow of 1000 tasks to an executor of two workers, where each worker blocks while waiting on another taskflow of 500 tasks, causing deadlock:

tf::Executor executor(2);
tf::Taskflow taskflow;
std::array<tf::Taskflow, 1000> others;

for(size_t n=0; n<1000; n++) {
  for(size_t i=0; i<500; i++) {
    others[n].emplace([&](){});
  }
  taskflow.emplace([&executor, &tf=others[n]](){
    // blocking the worker can introduce deadlock where
    // all workers are waiting for their taskflows to finish
    executor.run(tf).wait();
  });
}
executor.run(taskflow).wait();

Using tf::Runtime::corun allows each worker to co-run these taskflows without blocking on a wait, thereby avoiding deadlocks.

tf::Executor executor(2);
tf::Taskflow taskflow;
std::array<tf::Taskflow, 1000> others;

for(size_t n=0; n<1000; n++) {
  for(size_t i=0; i<500; i++) {
    others[n].emplace([&](){});
  }
  taskflow.emplace([&tf=others[n]](tf::Runtime& rt){
    // the caller worker will not block on wait but corun these
    // taskflows through its work-stealing loop
    rt.corun(tf);
  });
}
executor.run(taskflow).wait();

Run a Task Asynchronously

One of the most powerful features of tf::Runtime is its ability to launch asynchronous tasks on the fly. You can dynamically create tasks during execution using tf::Runtime::async or tf::Runtime::silent_async. Each asynchronous task can itself be a runtime task, enabling recursive task creation and dynamic parallelism. This model is particularly useful for implementing divide-and-conquer algorithms, such as parallel sort, graph traversal, and other runtime-driven parallel patterns. For instance, the example below demonstrates a parallel recursive implementation of Fibonacci numbers using recursive asynchronous tasking from tf::Runtime:

#include <taskflow/taskflow.hpp>

size_t fibonacci(size_t N, tf::Runtime& rt) {

  if(N < 2) return N; 

  size_t res1, res2;
  rt.silent_async([N, &res1](tf::Runtime& rt1){ res1 = fibonacci(N-1, rt1); });
  
  // tail optimization for the right child
  res2 = fibonacci(N-2, rt);

  // use corun to avoid blocking the worker from waiting the two children tasks 
  // to finish
  rt.corun();

  return res1 + res2;
}

int main() {

  tf::Executor executor;
  
  size_t N = 5, res;
  executor.silent_async([N, &res](tf::Runtime& rt){ res = fibonacci(N, rt); });
  executor.wait_for_all();

  std::cout << N << "-th Fibonacci number is " << res << '\n';

  return 0;
}

The figure below shows the execution diagram, where the suffix *_1 represent the left child spawned by its parent runtime.

Fibonacci F4 fibonacci(4) [rt] F3_1 fibonacci(3) [rt1] F4->F3_1 F2_2 fibonacci(2) [rt] F4->F2_2 F2_1 fibonacci(2) [rt1_1] F3_1->F2_1 F1_2 fibonacci(1) [rt1] F3_1->F1_2 F1_1 fibonacci(1) [rt1_1_1] F2_1->F1_1 F0_1 fibonacci(0) [rt1_1] F2_1->F0_1 F1_3 fibonacci(1) [rt1] F2_2->F1_3 F0_2 fibonacci(0) [rt] F2_2->F0_2

For more details, please refer to Asynchronous Tasking and Fibonacci Number.