Cookbook » Asynchronous Tasking

This chapters discusses how to launch tasks asynchronously so that you can incorporate independent, dynamic parallelism in your taskflows.

Launch Asynchronous Tasks from an Executor

Taskflow's executor provides an STL-style method, tf::Executor::async, that allows you to run a callable object asynchronously. This method returns a std::future which will eventually hold the result of the function call.

std::future<int> future = executor.async([](){ return 1; });
assert(future.get() == 1);

If you do not need the return value or do not require a std::future for synchronization, you should use tf::Executor::silent_async. This method returns nothing and incurs less overhead than tf::Executor::async, as it avoids the cost of managing a shared state for std::future.

executor.silent_async([](){
  // do some work without returning any result
});

Launching asynchronous tasks from an executor is thread-safe and can be invoked from multiple threads, including both worker threads inside the executor and external threads outside of it. The scheduler automatically detects the source of the submission and employs work-stealing to schedule the task efficiently, ensuring balanced workload distribution across workers.

tf::Task my_task = taskflow.emplace([&](){
  // launch an asynchronous task from my_task
  executor.async([&](){
    // launch another asynchronous task that may be run by another worker
    executor.async([&](){});
  })
});
executor.run(taskflow);
executor.wait_for_all();   // wait for all tasks to finish

Launch Asynchronous Tasks from a Runtime

You can launch asynchronous tasks from tf::Runtime using tf::Runtime::async or tf::Runtime::silent_async. The following code creates 100 asynchronous tasks from a runtime and joins their executions explicitly using tf::Runtime::corun.

tf::Taskflow taskflow;
tf::Executor executor;

std::atomic<int> counter{0};

taskflow.emplace([&] (tf::Runtime& rt){
  for(int i=0; i<100; i++) {
    rt.silent_async([&](){ ++counter; }));
  }
  rt.corun();  // all of the 100 asynchronous tasks will finish by this join
  assert(counter == 100);
});
executor.run(taskflow).wait();

Unlike tf::Subflow::join, you can call tf::Runtime::corun multiple times to synchronize the execution of asynchronous tasks between different runs. For example, the following code spawn 100 asynchronous tasks twice and join each execution to assure the spawned 100 asynchronous tasks have properly completed.

tf::Taskflow taskflow;
tf::Executor executor;

std::atomic<int> counter{0};

taskflow.emplace([&] (tf::Runtime& rt){
  // spawn 100 asynchronous tasks and join
  for(int i=0; i<100; i++) {
    rt.silent_async([&](){ ++counter; }));
  }
  rt.corun();  // all of the 100 asynchronous tasks will finish by this join
  assert(counter == 100);
  
  // spawn another 100 asynchronous tasks and join
  for(int i=0; i<100; i++) {
    rt.silent_async([&](){ ++counter; }));
  }
  rt.corun();  // all of the 100 asynchronous tasks will finish by this join
  assert(counter == 200);
});
executor.run(taskflow).wait();

By default, tf::Runtime does not join like tf::Subflow. All pending asynchronous tasks spawned from a tf::Runtime become uncontrollable once their parent runtime goes out of scope. It is user's responsibility to explicitly synchronize these tasks using tf::Runtime::corun.

Launch Asynchronous Tasks Recursively from a Runtime

Asynchronous tasks can take a reference to tf::Runtime, allowing them to recursively launch additional asynchronous tasks. Combined with tf::Runtime::corun, this enables the implementation of various recursive parallelism patterns, including parallel sort, divide-and-conquer algorithms, and the fork-join model. For instance, the example below demonstrates a parallel recursive implementation of Fibonacci numbers using recursive asynchronous tasking from tf::Runtime:

#include <taskflow/taskflow.hpp>

size_t fibonacci(size_t N, tf::Runtime& rt) {

  if(N < 2) return N; 

  size_t res1, res2;
  rt.silent_async([N, &res1](tf::Runtime& rt1){ res1 = fibonacci(N-1, rt1); });
  
  // tail optimization for the right child
  res2 = fibonacci(N-2, rt);

  // use corun to avoid blocking the worker from waiting the two children tasks 
  // to finish
  rt.corun();

  return res1 + res2;
}

int main() {

  tf::Executor executor;
  
  size_t N = 5, res;
  executor.silent_async([N, &res](tf::Runtime& rt){ res = fibonacci(N, rt); });
  executor.wait_for_all();

  std::cout << N << "-th Fibonacci number is " << res << '\n';

  return 0;
}

The figure below shows the execution diagram, where the suffix *_1 represent the left child spawned by its parent runtime.

Fibonacci F4 fibonacci(4) [rt] F3_1 fibonacci(3) [rt1] F4->F3_1 F2_2 fibonacci(2) [rt] F4->F2_2 F2_1 fibonacci(2) [rt1_1] F3_1->F2_1 F1_2 fibonacci(1) [rt1] F3_1->F1_2 F1_1 fibonacci(1) [rt1_1_1] F2_1->F1_1 F0_1 fibonacci(0) [rt1_1] F2_1->F0_1 F1_3 fibonacci(1) [rt1] F2_2->F1_3 F0_2 fibonacci(0) [rt] F2_2->F0_2