Parallel Iterations
Taskflow provides template functions for constructing tasks to perform parallel iterations over ranges of items.
Include the Header
You need to include the header file, taskflow/algorithm/for_each.hpp
, for using parallel-iteration algorithms.
#include <taskflow/algorithm/for_each.hpp>
Create an Index-based Parallel-Iteration Task
Index-based parallel-for performs parallel iterations over a range [first, last)
with the given step
size. The task created by tf::
// positive step for(auto i=first; i<last; i+=step) { callable(i); } // negative step for(auto i=first; i>last; i+=step) { callable(i); }
We support only integer-based range. The range can go positive or negative direction.
taskflow.for_each_index(0, 100, 2, [](int i) { }); // 50 loops with a + step taskflow.for_each_index(100, 0, -2, [](int i) { }); // 50 loops with a - step
Notice that either positive or negative direction is defined in terms of the range, [first, last)
, where end
is excluded. In the positive case, the 50 items are 0, 2, 4, 6, 8, ..., 96, 98. In the negative case, the 50 items are 100, 98, 96, 04, ... 4, 2. An example of the Taskflow graph for the positive case under 12 workers is depicted below:
Instead of explicitly specifying the index range and the callable for each index invocation, the overload tf::
std::vector<int> data1(100), data2(100); // Approach 1: initialize data1 using explicit index range taskflow.for_each_index(0, 100, 1, [&](int i){ data1[i] = 10; }); // Approach 2: initialize data2 using tf::IndexRange tf::IndexRange<int> range(0, 100, 1); taskflow.for_each_index(range, [&](tf::IndexRange<int> subrange){ for(int i=subrange.begin(); i<subrange.end(); i+=subrange.step_size()) { data2[i] = 10; } });
Both approaches produce the same result, but the second approach offers more flexibility in terms of how each partitioned subrange is iterated. This is particularly useful for applications that benefit from SIMD optimizations or other range-based processing strategies.
Capture Indices by Reference
You can pass indices by reference using std::
int* vec; int first, last; auto init = taskflow.emplace([&](){ first = 0; last = 1000; vec = new int[1000]; }); auto pf = taskflow.for_each_index(std::ref(first), std::ref(last), 1, [&] (int i) { std::cout << "parallel iteration on index " << vec[i] << '\n'; } ); // wrong! must use std::ref, or first and last are captured by copy // auto pf = taskflow.for_each_index(first, last, 1, [&](int i) { // std::cout << "parallel iteration on index " << vec[i] << '\n'; // }); init.precede(pf);
When init
finishes, the parallel-for task pf
will see first
as 0 and last
as 1000 and performs parallel iterations over the 1000 items.
Create an Iterator-based Parallel-Iteration Task
Iterator-based parallel-for performs parallel iterations over a range specified by two STL-styled iterators, first
and last
. The task created by tf::
for(auto i=first; i<last; i++) { callable(*i); }
tf::Taskflow::for_each(B first, E last, C callable, P&& part) simultaneously applies the callable to the object obtained by dereferencing every iterator in the range [first, last)
. It is user's responsibility for ensuring the range is valid within the execution of the parallel-for task. Iterators must have the post-increment operator ++ defined.
std::vector<int> vec = {1, 2, 3, 4, 5}; taskflow.for_each(vec.begin(), vec.end(), [](int i){ std::cout << "parallel for on item " << i << '\n'; }); std::list<std::string> list = {"hi", "from", "t", "a", "s", "k", "f", "low"}; taskflow.for_each(list.begin(), list.end(), [](const std::string& str){ std::cout << "parallel for on item " << str << '\n'; });
Capture Iterators by Reference
Similar to tf::
std::vector<int> vec; std::vector<int>::iterator first, last;; tf::Task init = taskflow.emplace([&](){ vec.resize(1000); first = vec.begin(); last = vec.end(); }); tf::Task pf = taskflow.for_each(std::ref(first), std::ref(last), [&](int i) { std::cout << "parallel iteration on item " << i << '\n'; }); // wrong! must use std::ref, or first and last are captured by copy // tf::Task pf = taskflow.for_each(first, last, [&](int i) { // std::cout << "parallel iteration on item " << i << '\n'; // }); init.precede(pf);
When init
finishes, the parallel-for task pf
will see first
pointing to the beginning of vec
and last
pointing to the end of vec
and performs parallel iterations over the 1000 items. The two tasks form an end-to-end task graph where the parameters of parallel-for are computed on the fly.
Configure a Partitioner
You can configure a partitioner for parallel-iteration tasks to run with different scheduling methods, such as guided partitioning, dynamic partitioning, and static partitioning. The following example creates two parallel-iteration tasks using two different partitioners, one with the static partitioning algorithm and another one with the guided partitioning algorithm:
std::vector<int> vec(1024, 0); // create two partitioners with a chunk size of 10 tf::StaticPartitioner static_partitioner(10); tf::GuidedPartitioner guided_partitioner(10); // create a parallel-iteration task with static partitioner taskflow.for_each( vec.begin(), vec.end(), [&](int i) { std::cout << "parallel iteration on item " << i << '\n'; }, static_partitioner ); // create a parallel-iteration task with guided partitioner taskflow.for_each( vec.begin(), vec.end(), [&](int i) { std::cout << "parallel iteration on item " << i << '\n'; }, guided_partitioner );