Parallel Transforms
tf::
Include the Header
To create a parallel-transform task, include the header file taskflow/cuda/algorithm/transform.hpp.
#include <taskflow/cuda/algorithm/transform.hpp>
Transform a Range of Items
The iterator-based parallel-transform applies the given transform function to each item in the input range specified by two iterators, first and last, and stores the results in another range that begins at output. The task created by tf::
while (first != last) { *output++ = op(*first++); }
The following example creates a transform kernel that transforms an input range of N items to an output range by multiplying each item by 10.
// output[i] = input[i] * 10
cudaflow.transform(
  input, input + N, output, [] __device__ (int x) { return x * 10; }
);
The iterations are independent of one another, and each is assigned one kernel thread to run the callable. Since the callable runs on the GPU, it must be declared with a __device__ specifier.
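When validating such a kernel, it can help to compute the expected output on the host and compare it against the data copied back from the GPU. The following is a minimal sketch in plain C++ under that assumption. The names times_ten and transform_reference are hypothetical helpers introduced here for illustration and are not part of Taskflow.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical host-side counterpart of the device lambda in the example.
inline int times_ten(int x) { return x * 10; }

// Hypothetical helper (not a Taskflow API): computes on the CPU what the
// transform task is expected to produce, i.e. output[i] = op(input[i]).
template <typename T, typename Op>
std::vector<T> transform_reference(const std::vector<T>& input, Op op) {
  std::vector<T> output(input.size());
  // Sequential equivalent of: while (first != last) { *output++ = op(*first++); }
  std::transform(input.begin(), input.end(), output.begin(), op);
  return output;
}
```

A test could call transform_reference(host_input, times_ten) and compare the returned vector element by element with the output copied back from the device.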
Transform Two Ranges of Items
You can transform two ranges of items to an output range through a binary operator. The task created by tf::
while (first1 != last1) { *output++ = op(*first1++, *first2++); }
The following example creates a transform kernel that transforms two input ranges of N items to an output range by summing each pair of items in the input ranges.
// output[i] = input1[i] + input2[i]
cudaflow.transform(
  input1, input1 + N, input2, output, [] __device__ (int a, int b) { return a + b; }
);
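The binary form can be checked against a host-side reference in the same way. The sketch below is plain C++ and assumes both input ranges have the same length. The name binary_transform_reference is a hypothetical helper introduced here for illustration and is not part of Taskflow.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not a Taskflow API): computes on the CPU what the
// two-range transform task is expected to produce,
// i.e. output[i] = op(input1[i], input2[i]).
template <typename T, typename Op>
std::vector<T> binary_transform_reference(const std::vector<T>& input1,
                                          const std::vector<T>& input2,
                                          Op op) {
  // The loop reads one item from each range per step, so the second
  // range must provide at least as many items as the first.
  assert(input1.size() == input2.size());
  std::vector<T> output(input1.size());
  // Sequential equivalent of:
  // while (first1 != last1) { *output++ = op(*first1++, *first2++); }
  std::transform(input1.begin(), input1.end(), input2.begin(),
                 output.begin(), op);
  return output;
}
```

Passing a host lambda such as [](int a, int b) { return a + b; } gives the values the summing example is expected to write to output.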
Miscellaneous Items
The parallel-transform algorithms are also available in tf::