Taskflow Algorithms » Parallel Iterations

Taskflow provides template functions for constructing tasks to perform parallel iterations over ranges of items.

Include the Header

You need to include the header file, taskflow/algorithm/for_each.hpp, for using parallel-iteration algorithms.

#include <taskflow/algorithm/for_each.hpp>

Create an Index-based Parallel-Iteration Task

Index-based parallel-for performs parallel iterations over a range [first, last) with the given step size. The task created by tf::Taskflow::for_each_index(B first, E last, S step, C callable, P part) represents parallel execution of the following loop:

// positive step
for(auto i=first; i<last; i+=step) {
  callable(i);
}

// negative step
for(auto i=first; i>last; i+=step) {
  callable(i);
}

We support only integer-based range. The range can go positive or negative direction.

taskflow.for_each_index(0, 100,  2, [](int i) { });  // 50 loops with a + step
taskflow.for_each_index(100, 0, -2, [](int i) { });  // 50 loops with a - step

Notice that either positive or negative direction is defined in terms of the range, [first, last), where end is excluded. In the positive case, the 50 items are 0, 2, 4, 6, 8, ..., 96, 98. In the negative case, the 50 items are 100, 98, 96, 04, ... 4, 2. An example of the Taskflow graph for the positive case under 12 workers is depicted below:

Taskflow p0xce2720 [0, 100) with the step size of 2 p0x7f322c000b50 pfg_0 p0x7f322c000b50->p0xce2720 p0x7f322c000c58 pfg_1 p0x7f322c000c58->p0xce2720 p0x7f322c000d60 pfg_2 p0x7f322c000d60->p0xce2720 p0x7f322c000e68 pfg_3 p0x7f322c000e68->p0xce2720 p0x7f322c000f70 pfg_4 p0x7f322c000f70->p0xce2720 p0x7f322c001078 pfg_5 p0x7f322c001078->p0xce2720 p0x7f322c001180 pfg_6 p0x7f322c001180->p0xce2720 p0x7f322c001288 pfg_7 p0x7f322c001288->p0xce2720 p0x7f322c001390 pfg_8 p0x7f322c001390->p0xce2720 p0x7f322c001498 pfg_9 p0x7f322c001498->p0xce2720 p0x7f322c0015a0 pfg_10 p0x7f322c0015a0->p0xce2720 p0x7f322c0016a8 pfg_11 p0x7f322c0016a8->p0xce2720

Instead of explicitly specifying the index range and the callable for each index invocation, the overload tf::Taskflow::for_each_index(R range, C callable, P part) provides you with a more flexible way to iterate over subranges of indices. This overload uses tf::IndexRange to partition the range into subranges, allowing finer control over how each subrange is processed. For instance, the code below does the same thing using two different approaches:

std::vector<int> data1(100), data2(100);

// Approach 1: initialize data1 using explicit index range
taskflow.for_each_index(0, 100, 1, [&](int i){ data1[i] = 10; });

// Approach 2: initialize data2 using tf::IndexRange
tf::IndexRange<int> range(0, 100, 1);
taskflow.for_each_index(range, [&](tf::IndexRange<int> subrange){
  for(int i=subrange.begin(); i<subrange.end(); i+=subrange.step_size()) {
    data2[i] = 10;
  }
});

Both approaches produce the same result, but the second approach offers more flexibility in terms of how each partitioned subrange is iterated. This is particularly useful for applications that benefit from SIMD optimizations or other range-based processing strategies.

Capture Indices by Reference

You can pass indices by reference using std::ref to marshal parameter update between dependent tasks. This is especially useful when the range indices are unknown at the time of creating a for-each-index task, but is initialized from another task.

int* vec;
int first, last;

auto init = taskflow.emplace([&](){
  first = 0;
  last  = 1000;
  vec = new int[1000];  
});

auto pf = taskflow.for_each_index(std::ref(first), std::ref(last), 1, 
  [&] (int i) {
    std::cout << "parallel iteration on index " << vec[i] << '\n';
  }
);

// wrong! must use std::ref, or first and last are captured by copy
// auto pf = taskflow.for_each_index(first, last, 1, [&](int i) {
//   std::cout << "parallel iteration on index " << vec[i] << '\n';
// });

init.precede(pf);

When init finishes, the parallel-for task pf will see first as 0 and last as 1000 and performs parallel iterations over the 1000 items.

Create an Iterator-based Parallel-Iteration Task

Iterator-based parallel-for performs parallel iterations over a range specified by two STL-styled iterators, first and last. The task created by tf::Taskflow::for_each(B first, E last, C callable, P part) represents a parallel execution of the following loop:

for(auto i=first; i<last; i++) {
  callable(*i);
}

tf::Taskflow::for_each(B first, E last, C callable, P&& part) simultaneously applies the callable to the object obtained by dereferencing every iterator in the range [first, last). It is user's responsibility for ensuring the range is valid within the execution of the parallel-for task. Iterators must have the post-increment operator ++ defined.

std::vector<int> vec = {1, 2, 3, 4, 5};
taskflow.for_each(vec.begin(), vec.end(), [](int i){ 
  std::cout << "parallel for on item " << i << '\n';  
});

std::list<std::string> list = {"hi", "from", "t", "a", "s", "k", "f", "low"};
taskflow.for_each(list.begin(), list.end(), [](const std::string& str){ 
  std::cout << "parallel for on item " << str << '\n';  
});

Capture Iterators by Reference

Similar to tf::Taskflow::for_each_index, iterators of tf::Taskflow::for_each are templated to allow capturing range parameters by reference, such that one task can set up the range before another task performs the parallel-for algorithm. For example:

std::vector<int> vec;
std::vector<int>::iterator first, last;;

tf::Task init = taskflow.emplace([&](){
  vec.resize(1000);
  first = vec.begin();
  last  = vec.end();
});

tf::Task pf = taskflow.for_each(std::ref(first), std::ref(last), [&](int i) {
  std::cout << "parallel iteration on item " << i << '\n';
});

// wrong! must use std::ref, or first and last are captured by copy
// tf::Task pf = taskflow.for_each(first, last, [&](int i) {
//   std::cout << "parallel iteration on item " << i << '\n';
// });

init.precede(pf);

When init finishes, the parallel-for task pf will see first pointing to the beginning of vec and last pointing to the end of vec and performs parallel iterations over the 1000 items. The two tasks form an end-to-end task graph where the parameters of parallel-for are computed on the fly.

Configure a Partitioner

You can configure a partitioner for parallel-iteration tasks to run with different scheduling methods, such as guided partitioning, dynamic partitioning, and static partitioning. The following example creates two parallel-iteration tasks using two different partitioners, one with the static partitioning algorithm and another one with the guided partitioning algorithm:

std::vector<int> vec(1024, 0);

// create two partitioners with a chunk size of 10
tf::StaticPartitioner static_partitioner(10);
tf::GuidedPartitioner guided_partitioner(10);

// create a parallel-iteration task with static partitioner
taskflow.for_each(
  vec.begin(), vec.end(), [&](int i) {
    std::cout << "parallel iteration on item " << i << '\n';
  },
  static_partitioner
);

// create a parallel-iteration task with guided partitioner
taskflow.for_each(
  vec.begin(), vec.end(), [&](int i) {
    std::cout << "parallel iteration on item " << i << '\n';
  },
  guided_partitioner
);