Loading...
Searching...
No Matches
Parallel Find

Taskflow provides template functions for constructing tasks to perform parallel find over a range of items.

Include the Header

You need to include the header file, taskflow/algorithm/find.hpp, for using parallel-find algorithms.

#include <taskflow/algorithm/find.hpp>

What is a Find Algorithm?

A find algorithm searches a range [first, last) for an element that satisfies a given criterion and returns an iterator to the first such element, or last if no such element exists.

What distinguishes a parallel find from a parallel for-each is its short-circuit semantics: as soon as any worker finds a qualifying element, the remaining workers abandon their portions of the range. This means the algorithm does not always process every element — it stops as early as possible once a result is found.

Taskflow provides the following parallel-find algorithms:

All four algorithms store the result iterator in a variable passed by reference, so the result is available after the task completes.

Create a Parallel Find-If Task

tf::Taskflow::find_if(B first, E last, T& result, UOP predicate, P part) performs a parallel search over the range [first, last) and stores in result an iterator to the first element for which predicate returns true, or last if no such element exists. It resembles a parallel implementation of the following loop:

template <typename InputIt, typename UnaryPredicate>
InputIt find_if(InputIt first, InputIt last, UnaryPredicate predicate) {
for (; first != last; ++first) {
if (predicate(*first)) {
return first;
}
}
return last;
}

The example below creates a task to find the element equal to 22 in a range of 10 integers:

std::vector<int> input = {1, 9, 22, 3, -6, 13, 12, 0, 9, 11};
std::vector<int>::iterator result;
taskflow.find_if(
input.begin(), input.end(), result, [](int i) { return i == 22; }
);
executor.run(taskflow).wait();
assert(*result == 22);

Capture Iterators by Reference

You can pass iterators by reference using std::ref to marshal parameter updates between dependent tasks. This is useful when the range is not known at task-graph construction time but is initialized by an upstream task.

std::vector<int> input;
std::vector<int>::iterator result, first, last;
tf::Task init = taskflow.emplace([&]() {
input = {1, 9, 22, 3, -6, 13, 12, 0, 9, 11};
first = input.begin();
last = input.end();
});
tf::Task task = taskflow.find_if(
std::ref(first), std::ref(last), result, [](int i) { return i == 22; }
);
// wrong! first and last are captured by copy at construction time
// tf::Task task = taskflow.find_if(first, last, result, [](int i){ return i == 22; });
init.precede(task);
executor.run(taskflow).wait();
assert(*result == 22);
class to create a task handle over a taskflow node
Definition task.hpp:263
Task & precede(Ts &&... tasks)
adds precedence links from this to other tasks
Definition task.hpp:952

When init finishes, first and last point to the initialized data range of input, and the find-if task searches over all 10 elements.

Create a Parallel Find-If-Not Task

tf::Taskflow::find_if_not(B first, E last, T& result, UOP predicate, P part) performs a parallel search over the range [first, last) and stores in result an iterator to the first element for which predicate returns false, or last if no such element exists. It resembles a parallel implementation of the following loop:

template <typename InputIt, typename UnaryPredicate>
InputIt find_if_not(InputIt first, InputIt last, UnaryPredicate predicate) {
for (; first != last; ++first) {
if (!predicate(*first)) {
return first;
}
}
return last;
}

The example below creates a task to find the first element that is not equal to 1 in a range where only one element differs:

std::vector<int> input = {1, 1, 22, 1, 1, 1, 1, 1, 1, 1};
std::vector<int>::iterator result;
taskflow.find_if_not(
input.begin(), input.end(), result, [](int i) { return i == 1; }
);
executor.run(taskflow).wait();
assert(*result == 22);

Capture Iterators by Reference

As with tf::Taskflow::find_if, iterators can be passed by reference using std::ref so that an upstream task can set up the range before the parallel find-if-not runs.

std::vector<int> input;
std::vector<int>::iterator result, first, last;
tf::Task init = taskflow.emplace([&]() {
input = {1, 1, 22, 1, 1, 1, 1, 1, 1, 1};
first = input.begin();
last = input.end();
});
tf::Task task = taskflow.find_if_not(
std::ref(first), std::ref(last), result, [](int i) { return i == 1; }
);
// wrong! first and last are captured by copy at construction time
// tf::Task task = taskflow.find_if_not(first, last, result, [](int i){ return i == 1; });
init.precede(task);
executor.run(taskflow).wait();
assert(*result == 22);

Find the Minimum Element

tf::Taskflow::min_element(B first, E last, T& result, C comp, P part) performs a parallel search over the range [first, last) and stores in result an iterator to the element with the smallest value under the comparator comp. It resembles a parallel implementation of the following loop:

template <typename InputIt, typename Compare>
InputIt min_element(InputIt first, InputIt last, Compare comp) {
if (first == last) return last;
InputIt smallest = first;
for (++first; first != last; ++first) {
if (comp(*first, *smallest)) {
smallest = first;
}
}
return smallest;
}

The example below finds the smallest element in a range of 10 integers:

std::vector<int> input = {1, 1, 1, 1, 1, -1, 1, 1, 1, 1};
std::vector<int>::iterator result;
taskflow.min_element(
input.begin(), input.end(), std::less<int>(), result
);
executor.run(taskflow).wait();
assert(*result == -1);

Capture Iterators by Reference

You can pass iterators by reference using std::ref so that an upstream task can set up the range before the parallel min-element search runs.

std::vector<int> input;
std::vector<int>::iterator result, first, last;
tf::Task init = taskflow.emplace([&]() {
input = {1, 1, 1, 1, 1, -1, 1, 1, 1, 1};
first = input.begin();
last = input.end();
});
tf::Task task = taskflow.min_element(
std::ref(first), std::ref(last), std::less<int>(), result
);
// wrong! first and last are captured by copy at construction time
// tf::Task task = taskflow.min_element(first, last, std::less<int>(), result);
init.precede(task);
executor.run(taskflow).wait();
assert(*result == -1);

Find the Maximum Element

tf::Taskflow::max_element(B first, E last, T& result, C comp, P part) performs a parallel search over the range [first, last) and stores in result an iterator to the element with the largest value under the comparator comp. It resembles a parallel implementation of the following loop:

template <typename InputIt, typename Compare>
InputIt max_element(InputIt first, InputIt last, Compare comp) {
if (first == last) return last;
InputIt largest = first;
for (++first; first != last; ++first) {
if (comp(*largest, *first)) {
largest = first;
}
}
return largest;
}

The example below finds the largest element in a range of 10 integers:

std::vector<int> input = {1, 1, 1, 1, 1, 2, 1, 1, 1, 1};
std::vector<int>::iterator result;
taskflow.max_element(
input.begin(), input.end(), std::less<int>(), result
);
executor.run(taskflow).wait();
assert(*result == 2);
Note
tf::Taskflow::max_element uses std::less as its comparator, which may seem counterintuitive. The comparator comp defines an ordering such that comp(a, b) returns true if a comes before b. For finding the maximum, we want the element that nothing else comes after, so std::less naturally puts the largest element last — and max_element returns it. Refer to std::max_element for further details.

Capture Iterators by Reference

You can pass iterators by reference using std::ref so that an upstream task can set up the range before the parallel max-element search runs.

std::vector<int> input;
std::vector<int>::iterator result, first, last;
tf::Task init = taskflow.emplace([&]() {
input = {1, 1, 1, 1, 1, 2, 1, 1, 1, 1};
first = input.begin();
last = input.end();
});
tf::Task task = taskflow.max_element(
std::ref(first), std::ref(last), std::less<int>(), result
);
// wrong! first and last are captured by copy at construction time
// tf::Task task = taskflow.max_element(first, last, std::less<int>(), result);
init.precede(task);
executor.run(taskflow).wait();
assert(*result == 2);

Configure a Partitioner

A partitioner controls how the iteration space is divided among workers. Taskflow provides four partitioners, each suited to different workload characteristics:

  • tf::StaticPartitioner divides the range into equal-sized chunks ahead of execution and assigns them to workers in order. It has the lowest scheduling overhead and works best when the predicate or comparator costs the same for every element.
  • tf::DynamicPartitioner distributes fixed-sized chunks to workers on demand as they become available. It adapts well to workloads where element evaluation cost varies, at the expense of slightly higher coordination overhead.
  • tf::GuidedPartitioner distributes chunks whose size decreases adaptively as work is consumed — large chunks early to reduce overhead, smaller chunks late to balance the tail. This is the default partitioner and delivers stable, near-optimal performance across a wide range of workloads.
  • tf::RandomPartitioner distributes chunks of randomly sampled sizes, which can help avoid systematic load imbalances caused by data-dependent cost patterns.

The following example creates two parallel find-if tasks using different partitioners:

std::vector<int> input(1024, -1);
std::vector<int>::iterator result;
tf::StaticPartitioner static_partitioner(0); // chunk size auto-determined
tf::GuidedPartitioner guided_partitioner(0); // minimum chunk size auto-determined
// parallel find-if with static partitioner
taskflow.find_if(
input.begin(), input.end(), result,
[](int i) { return i == -1; },
static_partitioner
);
// parallel find-if with guided partitioner
taskflow.find_if(
input.begin(), input.end(), result,
[](int i) { return i == -1; },
guided_partitioner
);
class to create a guided partitioner for scheduling parallel algorithms
Definition partitioner.hpp:402
class to construct a static partitioner for scheduling parallel algorithms
Definition partitioner.hpp:262

As a rule of thumb, prefer tf::StaticPartitioner when the predicate or comparator costs the same for every element and tf::GuidedPartitioner for irregular workloads where evaluation cost varies across elements. tf::DynamicPartitioner is a good choice when chunks must be kept small and strictly equal in size.

Note
By default, parallel-find tasks use tf::DefaultPartitioner (currently tf::GuidedPartitioner) if no partitioner is specified.