Taskflow provides template functions for constructing tasks to perform parallel transforms over ranges of items.
Include the Header
You need to include the header file taskflow/algorithm/transform.hpp to create a parallel-transform task.
#include <taskflow/algorithm/transform.hpp>
Create a Unary Parallel-Transform Task
A unary parallel-transform applies a callable to every element in a source range and writes the result to a destination range. The task created by tf::Taskflow::transform(B first1, E last1, O d_first, C c, P part) represents parallel execution of the following loop:
while (first1 != last1) {
  *d_first++ = c(*first1++);
}
tf::Taskflow::transform simultaneously applies the callable c to the object obtained by dereferencing every iterator in the range [first1, last1) and stores the result in another range beginning at d_first. It is the user's responsibility to ensure that both ranges remain valid during the execution of the parallel-transform task.
std::vector<int> src = {1, 2, 3, 4, 5};
std::vector<int> tgt(src.size());
taskflow.transform(src.begin(), src.end(), tgt.begin(), [](int i) {
  return i + 1;
});
Capture Iterators by Reference
You can pass iterators by reference using std::ref to marshal parameter updates between dependent tasks. This is useful when the range is not known at task-graph construction time but is initialized by an upstream task.
std::vector<int> src, tgt;
std::vector<int>::iterator first, last, d_first;
tf::Task init = taskflow.emplace([&]() {
  src.resize(1000);
  tgt.resize(1000);
  first = src.begin();
  last = src.end();
  d_first = tgt.begin();
});
tf::Task transform = taskflow.transform(
  std::ref(first), std::ref(last), std::ref(d_first),
  [](int i) {
    return i + 1;
  }
);
init.precede(transform);
When init finishes, the parallel-transform task transform will see first pointing to the beginning of src and last to the end of src, and will perform the parallel transform over the 1000 items, storing results starting at d_first.
Create a Binary Parallel-Transform Task
A binary parallel-transform applies a callable to pairs of elements drawn from two source ranges and writes each result to a destination range. The overload tf::Taskflow::transform(B1 first1, E1 last1, B2 first2, O d_first, C c, P part) represents parallel execution of the following loop:
while (first1 != last1) {
  *d_first++ = c(*first1++, *first2++);
}
The following example creates a parallel-transform task that adds two ranges element-wise and stores the result in a target range:
std::vector<int> src1 = {1, 2, 3, 4, 5};
std::vector<int> src2 = {5, 4, 3, 2, 1};
std::vector<int> tgt(src1.size());
taskflow.transform(
  src1.begin(), src1.end(), src2.begin(), tgt.begin(),
  [](int i, int j) {
    return i + j;
  }
);
Capture Iterators by Reference
As with the unary overload, all iterators can be passed by reference using std::ref so that an upstream task can set up the ranges before the parallel-transform runs.
std::vector<int> src1, src2, tgt;
std::vector<int>::iterator first1, last1, first2, d_first;
tf::Task init = taskflow.emplace([&]() {
  src1.resize(1000);
  src2.resize(1000);
  tgt.resize(1000);
  first1 = src1.begin();
  last1 = src1.end();
  first2 = src2.begin();
  d_first = tgt.begin();
});
tf::Task transform = taskflow.transform(
  std::ref(first1), std::ref(last1), std::ref(first2), std::ref(d_first),
  [](int i, int j) {
    return i + j;
  }
);
init.precede(transform);
When init finishes, the parallel-transform task transform will see all four iterators updated and will perform the parallel transform over the 1000 item pairs, storing each result in tgt.
Configure a Partitioner
A partitioner controls how the iteration space is divided among workers. Taskflow provides four partitioners, each suited to different workload characteristics:
- tf::StaticPartitioner divides the range into equal-sized chunks ahead of execution and assigns them to workers in order. It has the lowest scheduling overhead and delivers the best performance when every element costs roughly the same amount of work to transform.
- tf::DynamicPartitioner distributes fixed-sized chunks to workers on demand as they become available. It adapts well to workloads where transform cost varies per element, at the expense of slightly higher coordination overhead.
- tf::GuidedPartitioner distributes chunks whose size decreases adaptively as work is consumed — large chunks early to reduce overhead, smaller chunks late to balance the tail. This is the default partitioner and delivers stable, near-optimal performance across a wide range of workloads.
- tf::RandomPartitioner distributes chunks of randomly sampled sizes, which can help avoid systematic load imbalances caused by data-dependent cost patterns.
The following example creates two parallel-transform tasks using different partitioners:
std::vector<int> src1 = {1, 2, 3, 4, 5};
std::vector<int> src2 = {5, 4, 3, 2, 1};
std::vector<int> tgt1(src1.size());
std::vector<int> tgt2(src1.size());
tf::StaticPartitioner static_partitioner;
tf::GuidedPartitioner guided_partitioner;

taskflow.transform(
  src1.begin(), src1.end(), src2.begin(), tgt1.begin(),
  [](int i, int j) { return i + j; },
  static_partitioner
);
taskflow.transform(
  src1.begin(), src1.end(), src2.begin(), tgt2.begin(),
  [](int i, int j) { return i + j; },
  guided_partitioner
);
As a rule of thumb, prefer tf::StaticPartitioner when every element costs the same to transform (e.g., element-wise arithmetic) and tf::GuidedPartitioner for irregular workloads (e.g., transforms whose cost depends on the element value). tf::DynamicPartitioner is a good choice when chunks must be kept small and strictly equal in size.
Note: By default, parallel-transform tasks use tf::DefaultPartitioner (currently tf::GuidedPartitioner) if no partitioner is specified.