template<unsigned NT, unsigned VT>
cudaExecutionPolicy class
class to define execution policy for CUDA standard algorithms
Template parameters | |
---|---|
NT | number of threads per block |
VT | number of work units per thread |
An execution policy configures the kernel execution parameters of CUDA algorithms. The first template argument, NT, is the number of threads per block and should always be a power of two. The second template argument, VT, is the number of work units per thread and is recommended to be an odd number to avoid bank conflicts.
See Execution Policy for details.
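The constraints on NT and VT can be sketched in plain C++ without a GPU. The `PolicySketch` type below is a hypothetical stand-in, not the library's class; it assumes that each block handles NT * VT elements and that the block count is the ceiling of N divided by that product, which may differ from the actual implementation.

```cpp
// Hypothetical stand-in for illustration only; not tf::cudaExecutionPolicy.
template <unsigned NT, unsigned VT>
struct PolicySketch {
  // NT must be a non-zero power of two (requirement stated in the text).
  static_assert(NT != 0 && (NT & (NT - 1)) == 0,
                "NT must be a power of two");
  // VT is only recommended to be odd, so it is not enforced here.

  static constexpr unsigned nt = NT;       // threads per block
  static constexpr unsigned vt = VT;       // work units per thread
  static constexpr unsigned nv = NT * VT;  // assumed elements per block

  // Assumed formula: smallest block count whose capacity covers N elements.
  static constexpr unsigned num_blocks(unsigned N) {
    return (N + nv - 1) / nv;
  }
};

// 128 threads x 7 work units = 896 elements per block under this assumption.
using SketchPolicy128x7 = PolicySketch<128, 7>;
static_assert(SketchPolicy128x7::nv == 896, "128 * 7 elements per block");
```

Under these assumptions, 1000 elements need two blocks, because one block covers only 896 of them.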
Public static variables
Public static functions
- static auto num_blocks(unsigned N) -> unsigned
- queries the number of blocks to accommodate N elements
- template<typename T> static auto reduce_bufsz(unsigned count) -> unsigned
- queries the buffer size in bytes needed to call reduce kernels
- template<typename T> static auto min_element_bufsz(unsigned count) -> unsigned
- queries the buffer size in bytes needed to call tf::cuda_min_element
- template<typename T> static auto max_element_bufsz(unsigned count) -> unsigned
- queries the buffer size in bytes needed to call tf::cuda_max_element
- template<typename T> static auto scan_bufsz(unsigned count) -> unsigned
- queries the buffer size in bytes needed to call scan kernels
- static auto merge_bufsz(unsigned a_count, unsigned b_count) -> unsigned
- queries the buffer size in bytes needed for CUDA merge algorithms
Constructors, destructors, conversion operators
- cudaExecutionPolicy() defaulted
- constructs an execution policy object with default stream
- cudaExecutionPolicy(cudaStream_t s) explicit
- constructs an execution policy object with the given stream
Public functions
Function documentation
template<unsigned NT, unsigned VT>
template<typename T>
static unsigned tf::cudaExecutionPolicy<NT, VT>::reduce_bufsz(unsigned count)
queries the buffer size in bytes needed to call reduce kernels
Template parameters | |
---|---|
T | value type |
Parameters | |
count | number of elements to reduce |
The function is used to allocate a buffer for calling tf::
template<unsigned NT, unsigned VT>
template<typename T>
static unsigned tf::cudaExecutionPolicy<NT, VT>::min_element_bufsz(unsigned count)
queries the buffer size in bytes needed to call tf::cuda_min_element
Template parameters | |
---|---|
T | value type |
Parameters | |
count | number of elements to search |
The function is used to decide the buffer size in bytes for calling tf::cuda_min_element.
template<unsigned NT, unsigned VT>
template<typename T>
static unsigned tf::cudaExecutionPolicy<NT, VT>::max_element_bufsz(unsigned count)
queries the buffer size in bytes needed to call tf::cuda_max_element
Template parameters | |
---|---|
T | value type |
Parameters | |
count | number of elements to search |
The function is used to decide the buffer size in bytes for calling tf::cuda_max_element.
template<unsigned NT, unsigned VT>
template<typename T>
static unsigned tf::cudaExecutionPolicy<NT, VT>::scan_bufsz(unsigned count)
queries the buffer size in bytes needed to call scan kernels
Template parameters | |
---|---|
T | value type |
Parameters | |
count | number of elements to scan |
The function is used to allocate a buffer for calling tf::
template<unsigned NT, unsigned VT>
static unsigned tf::cudaExecutionPolicy<NT, VT>::merge_bufsz(unsigned a_count, unsigned b_count)
queries the buffer size in bytes needed for CUDA merge algorithms
Parameters | |
---|---|
a_count | number of elements in the first vector to merge |
b_count | number of elements in the second vector to merge |
The buffer size of the merge algorithm does not depend on the data type. The buffer is used only for storing temporary indices (of type unsigned) required during the merge process.
The function is used to allocate a buffer for calling tf::
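To illustrate why a merge buffer can be sized without knowing the element type, here is a minimal sketch in plain C++. The `MergeSketch` type and its formula are assumptions made for illustration, not the library's implementation: it supposes that one `unsigned` index is stored per partition boundary of the merged range and that each partition holds NT * VT elements.

```cpp
// Hypothetical stand-in for illustration only; the library's formula may differ.
template <unsigned NT, unsigned VT>
struct MergeSketch {
  static constexpr unsigned nv = NT * VT;  // assumed elements per partition

  // Assumed number of partitions needed to cover n merged elements.
  static constexpr unsigned num_partitions(unsigned n) {
    return (n + nv - 1) / nv;
  }

  // No value type T appears here: the buffer holds only unsigned indices,
  // one per partition boundary (partitions + 1 boundaries in total).
  static constexpr unsigned merge_bufsz(unsigned a_count, unsigned b_count) {
    return static_cast<unsigned>(
      sizeof(unsigned) * (num_partitions(a_count + b_count) + 1));
  }
};

using MergeSketch128x7 = MergeSketch<128, 7>;
```

Because the function takes only the two element counts, merging 500 + 500 values of any type yields the same buffer size under this sketch.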