Parallel Transforms

tf::cudaFlow provides template methods that transform one or two input ranges of items into an output range on a GPU.

Include the Header

To create a parallel-transform task, include the header file taskflow/cuda/algorithm/transform.hpp.

#include <taskflow/cuda/algorithm/transform.hpp>

Transform a Range of Items

Iterator-based parallel-transform applies the given transform function to a range of items specified by two iterators, first and last, and stores the results in another range that begins at output. The task created by tf::cudaFlow::transform(I first, I last, O output, C op) represents a parallel execution of the following loop:

while (first != last) {
  *output++ = op(*first++);
}

The following example creates a transform kernel that transforms an input range of N items to an output range by multiplying each item by 10.

// output[i] = input[i] * 10
cudaflow.transform(
  input, input + N, output, [] __device__ (int x) { return x * 10; }
); 

Each iteration is independent of the others and is assigned to one kernel thread that runs the callable. Since the callable runs on the GPU, it must be declared with a __device__ specifier.
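When checking the kernel's output, it can help to compute the same result on the host. The following is a minimal CPU-side sketch, not part of Taskflow: it uses std::transform from the C++ standard library to apply the same multiply-by-10 operation sequentially. The function name transform_reference is hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical host-side reference (not a Taskflow API): computes
// output[i] = input[i] * 10 sequentially, mirroring the loop that the
// parallel-transform task represents.
std::vector<int> transform_reference(const std::vector<int>& input) {
  std::vector<int> output(input.size());
  std::transform(
    input.begin(), input.end(), output.begin(),
    [](int x) { return x * 10; }  // same operation as the device lambda
  );
  return output;
}
```

For example, transform_reference applied to {1, 2, 3} yields {10, 20, 30}, which can be compared element by element against the output range copied back from the GPU.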

Transform Two Ranges of Items

You can transform two ranges of items to an output range through a binary operator. The task created by tf::cudaFlow::transform(I1 first1, I1 last1, I2 first2, O output, C op) represents a parallel execution for the following loop:

while (first1 != last1) {
  *output++ = op(*first1++, *first2++);
}

The following example creates a transform kernel that transforms two input ranges of N items to an output range by summing each pair of items in the input ranges.

// output[i] = input1[i] + input2[i]
cudaflow.transform(
  input1, input1+N, input2, output, []__device__(int a, int b) { return a+b; }
); 
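Because the second input range is given only by its first iterator, it must hold at least as many items as the first range. The sketch below makes that requirement explicit in a sequential host-side reference built on the binary form of std::transform. It is not part of Taskflow, and the name binary_transform_reference is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical host-side reference (not a Taskflow API): computes
// output[i] = input1[i] + input2[i] sequentially.
std::vector<int> binary_transform_reference(
  const std::vector<int>& input1, const std::vector<int>& input2
) {
  // The second range has no end iterator, so it must be at least as
  // long as the first range.
  assert(input2.size() >= input1.size());
  std::vector<int> output(input1.size());
  std::transform(
    input1.begin(), input1.end(), input2.begin(), output.begin(),
    [](int a, int b) { return a + b; }  // same operation as the device lambda
  );
  return output;
}
```

Calling binary_transform_reference with {1, 2, 3} and {10, 20, 30} yields {11, 22, 33}.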

Miscellaneous Items

The parallel-transform algorithms are also available in tf::cudaFlowCapturer.