Cookbook » Asynchronous Tasking

This chapters discusses how to launch tasks asynchronously so that you can incorporate independent, dynamic parallelism in your taskflows.

Launch Asynchronous Tasks from an Executor

Taskflow's executor provides an STL-style method, tf::Executor::async, that allows you to run a callable object asynchronously. This method returns a std::future which will eventually hold the result of the function call.

std::future<int> future = executor.async([](){ return 1; });
assert(future.get() == 1);

If you do not need the return value or do not require a std::future for synchronization, you should use tf::Executor::silent_async. This method returns nothing and incurs less overhead than tf::Executor::async, as it avoids the cost of managing a shared state for std::future.

executor.silent_async([](){});

Launching asynchronous tasks from an executor is thread-safe and can be invoked from multiple threads, including both worker threads inside the executor and external threads outside of it. The scheduler automatically detects the source of the submission and employs work-stealing to schedule the task efficiently, ensuring balanced workload distribution across workers.

tf::Task my_task = taskflow.emplace([&](){
  // launch an asynchronous task from my_task
  executor.async([&](){
    // launch another asynchronous task that may be run by another worker
    executor.async([&](){});
  })
});
executor.run(taskflow);
executor.wait_for_all();   // wait for all tasks to finish

Launch Asynchronous Tasks from a Runtime

You can launch asynchronous tasks from a runtime object (tf::Runtime) using tf::Runtime::async or tf::Runtime::silent_async. Unlike creating asynchronous tasks from an executor, tasks created from a runtime object belong to that runtime and are implicitly joined at the end of the runtime's scope. For example, the code below creates 100 asynchronous tasks from a runtime, and these 100 tasks are guaranteed to finish before the runtime completes.

tf::Taskflow taskflow;
tf::Executor executor;
std::atomic<int> counter{0};

tf::Task A = taskflow.emplace([&] (tf::Runtime& rt){
  for(int i=0; i<100; i++) {
    rt.silent_async([&](){ ++counter; }));
  }
});  // implicit join at the end of the runtime's scope
tf::Task B = taskflow.emplace([](){
  assert(counter == 100);
});
A.precede(B);

executor.run(taskflow).wait();

Creating asynchronous tasks from a runtime enables efficient implementation of recursive parallel algorithms, such as parallel iteration, reduction, and sorting, that require dynamic task creation at runtime.

Launch Asynchronous Tasks Recursively from a Runtime

Asynchronous tasks can also leverage runtime tasking, allowing them to recursively launch additional asynchronous tasks. Combined with tf::Runtime::corun, this enables the implementation of various recursive parallelism patterns, including parallel sort, divide-and-conquer algorithms, and the fork-join model. For instance, the example below demonstrates a parallel recursive implementation of Fibonacci numbers using recursive asynchronous tasking from tf::Runtime:

#include <taskflow/taskflow.hpp>

size_t fibonacci(size_t N, tf::Runtime& rt) {

  if(N < 2) return N; 

  size_t res1, res2;
  rt.silent_async([N, &res1](tf::Runtime& rt1){ res1 = fibonacci(N-1, rt1); });
  
  // tail optimization for the right child
  res2 = fibonacci(N-2, rt);

  // use corun to avoid blocking the worker from waiting the two children tasks 
  // to finish
  rt.corun();

  return res1 + res2;
}

int main() {

  tf::Executor executor;
  
  size_t N = 5, res;
  executor.silent_async([N, &res](tf::Runtime& rt){ res = fibonacci(N, rt); });
  executor.wait_for_all();

  std::cout << N << "-th Fibonacci number is " << res << '\n';

  return 0;
}

The figure below shows the execution diagram, where the suffix *_1 represent the left child spawned by its parent runtime.

Fibonacci F4 fibonacci(4) [rt] F3_1 fibonacci(3) [rt1] F4->F3_1 F2_2 fibonacci(2) [rt] F4->F2_2 F2_1 fibonacci(2) [rt1_1] F3_1->F2_1 F1_2 fibonacci(1) [rt1] F3_1->F1_2 F1_1 fibonacci(1) [rt1_1_1] F2_1->F1_1 F0_1 fibonacci(0) [rt1_1] F2_1->F0_1 F1_3 fibonacci(1) [rt1] F2_2->F1_3 F0_2 fibonacci(0) [rt] F2_2->F0_2