Loading...
Searching...
No Matches
Parallel Scan

Taskflow provides template functions for constructing tasks to perform parallel scan over a range of items.

Include the Header

You need to include the header file, taskflow/algorithm/scan.hpp, for creating a parallel-scan task.

#include <taskflow/algorithm/scan.hpp>

What is a Scan Operation?

A scan (also known as prefix sum) computes, for each position k in the output range, the cumulative result of applying a binary operator to all relevant elements in the input range. There are two variants:

  • An inclusive scan includes the element at position k itself. The N-th output element is the combination of the first N input elements. It is equivalent to the following sequential loop:
for(size_t k = 0; k < n; k++) {
output[k] = (k == 0) ? input[0] : bop(output[k-1], input[k]);
}
  • An exclusive scan excludes the element at position k. The N-th output element is the combination of init with the first N-1 input elements, so the N-th input element is excluded. It is equivalent to the following sequential loop:
for(size_t k = 0; k < n; k++) {
output[k] = (k == 0) ? init : bop(output[k-1], input[k-1]);
}

The following diagram illustrates an inclusive scan on the input [1, 2, 3, 4, -2, 1, 8] using addition:

Each output cell accumulates all input elements up to and including its own position (solid green arrow).

The following diagram illustrates an exclusive scan on the same input with init=0:

Each output cell accumulates all input elements before its position (dashed blue arrow), so the first output cell receives init directly and the result is shifted one position to the right compared to inclusive scan.

Create a Parallel Inclusive Scan Task

tf::Taskflow::inclusive_scan(B first, E last, D d_first, BOP bop) performs an inclusive scan over the range [first, last) using the binary operator bop and writes results to the range beginning at d_first.

std::vector<int> input = {1, 2, 3, 4, 5};
std::vector<int> output(input.size());
taskflow.inclusive_scan(
input.begin(), input.end(), output.begin(), std::plus<int>{}
);
executor.run(taskflow).wait();
// output is {1, 3, 6, 10, 15}

The output range may be the same as the input range, in which case the scan is performed in-place:

std::vector<int> input = {1, 2, 3, 4, 5};
taskflow.inclusive_scan(
input.begin(), input.end(), input.begin(), std::plus<int>{}
);
executor.run(taskflow).wait();
// input is now {1, 3, 6, 10, 15}

The overload tf::Taskflow::inclusive_scan(B first, E last, D d_first, BOP bop, T init) performs an inclusive scan with an additional initial value init that is combined with the first element before any other accumulation:

std::vector<int> input = {1, 2, 3, 4, 5};
std::vector<int> output(input.size());
taskflow.inclusive_scan(
input.begin(), input.end(), output.begin(), std::plus<int>{}, -1
);
executor.run(taskflow).wait();
// output is {0, 2, 5, 9, 14}

Capture Iterators by Reference

You can pass iterators by reference using std::ref to marshal parameter updates between dependent tasks. This is useful when the range is not known at task-graph construction time but is initialized by an upstream task.

std::vector<int> input, output;
std::vector<int>::iterator first, last, d_first;
tf::Task init = taskflow.emplace([&]() {
input = {1, 2, 3, 4, 5};
output.resize(input.size());
first = input.begin();
last = input.end();
d_first = output.begin();
});
tf::Task scan = taskflow.inclusive_scan(
std::ref(first), std::ref(last), std::ref(d_first), std::plus<int>{}
);
// wrong! iterators are captured by copy at construction time
// tf::Task scan = taskflow.inclusive_scan(first, last, d_first, std::plus<int>{});
init.precede(scan);
executor.run(taskflow).wait();
// output is {1, 3, 6, 10, 15}
class to create a task handle over a taskflow node
Definition task.hpp:263
Task & precede(Ts &&... tasks)
adds precedence links from this to other tasks
Definition task.hpp:952

Create a Parallel Transform-Inclusive Scan Task

tf::Taskflow::transform_inclusive_scan(B first, E last, D d_first, BOP bop, UOP uop) applies the unary operator uop to transform each input element before performing the inclusive scan with bop. It represents parallel execution of the following loop:

for(size_t k = 0; k < n; k++) {
output[k] = (k == 0) ? uop(input[0]) : bop(output[k-1], uop(input[k]));
}
std::vector<int> input = {1, 2, 3, 4, 5};
std::vector<int> output(input.size());
taskflow.transform_inclusive_scan(
input.begin(), input.end(), output.begin(),
std::plus<int>{},
[](int item) { return -item; }
);
executor.run(taskflow).wait();
// output is {-1, -3, -6, -10, -15}

The overload tf::Taskflow::transform_inclusive_scan(B first, E last, D d_first, BOP bop, UOP uop, T init) adds an initial value init. Note that init does not participate in the unary transformation uop — only input elements are transformed:

std::vector<int> input = {1, 2, 3, 4, 5};
std::vector<int> output(input.size());
taskflow.transform_inclusive_scan(
input.begin(), input.end(), output.begin(),
std::plus<int>{},
[](int item) { return -item; },
-1
);
executor.run(taskflow).wait();
// output is {-2, -4, -7, -11, -16}

Create a Parallel Exclusive Scan Task

tf::Taskflow::exclusive_scan(B first, E last, D d_first, T init, BOP bop) performs an exclusive scan over the range [first, last) with the given initial value init using the binary operator bop.

std::vector<int> input = {1, 2, 3, 4, 5};
std::vector<int> output(input.size());
taskflow.exclusive_scan(
input.begin(), input.end(), output.begin(), -1, std::plus<int>{}
);
executor.run(taskflow).wait();
// output is {-1, 0, 2, 5, 9}

The output range may be the same as the input range, in which case the scan is performed in-place:

std::vector<int> input = {1, 2, 3, 4, 5};
taskflow.exclusive_scan(
input.begin(), input.end(), input.begin(), -1, std::plus<int>{}
);
executor.run(taskflow).wait();
// input is now {-1, 0, 2, 5, 9}

Capture Iterators by Reference

As with inclusive scan, all iterators can be passed by reference using std::ref so that an upstream task can set up the ranges before the parallel exclusive scan runs.

std::vector<int> input, output;
std::vector<int>::iterator first, last, d_first;
tf::Task init = taskflow.emplace([&]() {
input = {1, 2, 3, 4, 5};
output.resize(input.size());
first = input.begin();
last = input.end();
d_first = output.begin();
});
tf::Task scan = taskflow.exclusive_scan(
std::ref(first), std::ref(last), std::ref(d_first), -1, std::plus<int>{}
);
// wrong! iterators are captured by copy at construction time
// tf::Task scan = taskflow.exclusive_scan(first, last, d_first, -1, std::plus<int>{});
init.precede(scan);
executor.run(taskflow).wait();
// output is {-1, 0, 2, 5, 9}

Create a Parallel Transform-Exclusive Scan Task

tf::Taskflow::transform_exclusive_scan(B first, E last, D d_first, T init, BOP bop, UOP uop) applies the unary operator uop to transform each input element before performing the exclusive scan with bop and the initial value init. It represents parallel execution of the following loop:

for(size_t k = 0; k < n; k++) {
output[k] = (k == 0) ? init : bop(output[k-1], uop(input[k-1]));
}

Note that init does not participate in the unary transformation uop — only input elements are transformed:

std::vector<int> input = {1, 2, 3, 4, 5};
std::vector<int> output(input.size());
taskflow.transform_exclusive_scan(
input.begin(), input.end(), output.begin(), -1,
std::plus<int>{},
[](int item) { return -item; }
);
executor.run(taskflow).wait();
// output is {-1, -2, -4, -7, -11}