Class to create a CUDA stream with unique ownership.
#include <taskflow/cuda/cuda_stream.hpp>
Public Types
| using base_type = std::unique_ptr<std::remove_pointer_t<cudaStream_t>, Deleter> | base type for the underlying unique pointer |
Public Member Functions
| template<typename... ArgsT> cudaStreamBase(ArgsT&&... args) | constructs a cudaStream object by passing the given arguments to the stream creator |
| cudaStreamBase(cudaStreamBase&&) = default | constructs a cudaStream from the given rhs using move semantics |
| cudaStreamBase& operator=(cudaStreamBase&&) = default | assigns the rhs to *this using move semantics |
| cudaStreamBase& synchronize() | synchronizes the associated stream |
| void begin_capture(cudaStreamCaptureMode m = cudaStreamCaptureModeGlobal) const | begins graph capturing on the stream |
| cudaGraph_t end_capture() const | ends graph capturing on the stream |
| void record(cudaEvent_t event) const | records an event on the stream |
| void wait(cudaEvent_t event) const | waits on an event |
| template<typename C, typename D> cudaStreamBase& run(const cudaGraphExecBase<C, D>& exec) | runs the given executable CUDA graph |
| cudaStreamBase& run(cudaGraphExec_t exec) | runs the given executable CUDA graph |
Class to create a CUDA stream with unique ownership.
Template Parameters
| Creator | functor to create the stream (used in constructor) |
| Deleter | functor to delete the stream (used in destructor) |
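The cudaStream class encapsulates a cudaStream_t using std::unique_ptr, ensuring that the CUDA stream is properly created and destroyed under unique ownership: Creator is invoked on construction, Deleter on destruction, and the object can be moved but not copied.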
using tf::cudaStreamBase< Creator, Deleter >::base_type = std::unique_ptr<std::remove_pointer_t<cudaStream_t>, Deleter>
base type for the underlying unique pointer
This alias provides a shorthand for the underlying std::unique_ptr type that manages CUDA stream resources with an associated deleter.
template<typename... ArgsT>
tf::cudaStreamBase< Creator, Deleter >::cudaStreamBase(ArgsT&&... args) [inline, explicit]
Constructs a cudaStream object by passing the given arguments to the stream creator.
Parameters
| args | arguments to pass to the stream creator |
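To make the Creator/Deleter ownership model concrete, here is a minimal plain C++17 sketch that mirrors the documented interface without needing a GPU. FakeStreamObj, FakeCreator, FakeDeleter, StreamBase, and the two counters are hypothetical stand-ins for cudaStream_t and the real creator/deleter functors; deriving from std::unique_ptr is one way to realize base_type and is not claimed to be Taskflow's implementation.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the opaque struct behind cudaStream_t.
struct FakeStreamObj { int flags; };
using FakeStream_t = FakeStreamObj*;  // plays the role of cudaStream_t

// Counters so the sketch can show that each handle is destroyed exactly once.
inline int g_created = 0;
inline int g_destroyed = 0;

// Hypothetical Creator: builds a handle from the forwarded arguments.
struct FakeCreator {
  FakeStream_t operator()(int flags = 0) const {
    ++g_created;
    return new FakeStreamObj{flags};
  }
};

// Hypothetical Deleter: std::unique_ptr calls it only for non-null handles.
struct FakeDeleter {
  void operator()(FakeStream_t s) const {
    ++g_destroyed;
    delete s;
  }
};

// Sketch of the ownership pattern: a unique_ptr with a custom deleter as the base.
template <typename Creator, typename Deleter>
class StreamBase
    : public std::unique_ptr<std::remove_pointer_t<FakeStream_t>, Deleter> {
 public:
  using base_type = std::unique_ptr<std::remove_pointer_t<FakeStream_t>, Deleter>;

  // Forward all arguments to the creator; its result becomes the owned handle.
  template <typename... ArgsT>
  explicit StreamBase(ArgsT&&... args)
      : base_type(Creator{}(std::forward<ArgsT>(args)...), Deleter{}) {}

  StreamBase(StreamBase&&) = default;             // transfers ownership
  StreamBase& operator=(StreamBase&&) = default;  // deletes the old handle, if any
};

using Stream = StreamBase<FakeCreator, FakeDeleter>;

// Declaring the move operations implicitly deletes copying: unique ownership.
static_assert(!std::is_copy_constructible_v<Stream>, "streams must not be copyable");
```

Because only the move operations are defaulted, a copy such as `Stream b = a;` fails to compile, and a moved-from object holds a null handle, so the deleter runs exactly once per created stream.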
void tf::cudaStreamBase< Creator, Deleter >::begin_capture(cudaStreamCaptureMode m = cudaStreamCaptureModeGlobal) const [inline]
Begins graph capturing on the stream.
While the stream is in capture mode, operations pushed into it are not executed; instead, they are captured into a graph, which is returned by cudaStream::end_capture.
The capture mode m can be one of the following:
- cudaStreamCaptureModeGlobal: This is the default mode. If the local thread has an ongoing capture sequence that was not initiated with cudaStreamCaptureModeRelaxed at cudaStreamBeginCapture, or if any other thread has a concurrent capture sequence initiated with cudaStreamCaptureModeGlobal, this thread is prohibited from potentially unsafe API calls.
- cudaStreamCaptureModeThreadLocal: If the local thread has an ongoing capture sequence not initiated with cudaStreamCaptureModeRelaxed, it is prohibited from potentially unsafe API calls. Concurrent capture sequences in other threads are ignored.
- cudaStreamCaptureModeRelaxed: The local thread is not prohibited from potentially unsafe API calls. Note that the thread is still prohibited from API calls that necessarily conflict with stream capture, for example, calling cudaEventQuery on an event that was last recorded inside a capture sequence.
cudaGraph_t tf::cudaStreamBase< Creator, Deleter >::end_capture() const [inline]
Ends graph capturing on the stream.
Equivalent to calling cudaStreamEndCapture to end capture on this stream and return the captured graph. Capture must have been initiated on the stream by a call to cudaStream::begin_capture. If the capture was invalidated due to a violation of the stream-capture rules, a NULL graph is returned.
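The capture lifecycle described above can be illustrated with a toy model in plain C++ that runs without a GPU. ToyStream, ToyGraph, push, and violate_capture_rules are hypothetical names, not CUDA or Taskflow APIs; the model only mimics the observable behavior (work is deferred while capturing, the graph is handed back at the end, and an invalidated capture yields a null graph) and ignores capture modes, instantiation, and multi-stream dependencies.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy model of stream capture; NOT the CUDA API.
// A "graph" here is just an ordered list of deferred operations.
struct ToyGraph {
  std::vector<std::function<void()>> ops;
};

class ToyStream {
 public:
  void begin_capture() {
    capturing_ = true;
    invalidated_ = false;
    pending_ = std::make_unique<ToyGraph>();
  }

  // While capturing, work is recorded into the pending graph instead of executed.
  void push(std::function<void()> op) {
    if (capturing_) {
      pending_->ops.push_back(std::move(op));
    } else {
      op();
    }
  }

  // Models an illegal call during capture (e.g. querying an event recorded inside it).
  void violate_capture_rules() {
    if (capturing_) invalidated_ = true;
  }

  // Returns the captured graph, or nullptr if the capture was invalidated.
  std::unique_ptr<ToyGraph> end_capture() {
    capturing_ = false;
    if (invalidated_) {
      pending_.reset();
      return nullptr;
    }
    return std::move(pending_);
  }

  // Replays a captured graph (stands in for launching an executable graph).
  void run(const ToyGraph& g) {
    for (const auto& op : g.ops) op();
  }

 private:
  bool capturing_ = false;
  bool invalidated_ = false;
  std::unique_ptr<ToyGraph> pending_;
};
```

In real CUDA code, the cudaGraph_t returned by end_capture must first be instantiated into an executable graph (for example with cudaGraphInstantiate) before it can be passed to run; the toy collapses that step into replaying the recorded list.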
void tf::cudaStreamBase< Creator, Deleter >::record(cudaEvent_t event) const [inline]
Records an event on the stream.
Equivalent to calling cudaEventRecord to record event on this stream; the event and the stream must belong to the same CUDA context.
template<typename C, typename D>
cudaStreamBase& tf::cudaStreamBase< Creator, Deleter >::run(const cudaGraphExecBase< C, D >& exec)
Runs the given executable CUDA graph.
Parameters
| exec | the given cudaGraphExec |
cudaStreamBase& tf::cudaStreamBase< Creator, Deleter >::run(cudaGraphExec_t exec)
Runs the given executable CUDA graph.
Parameters
| exec | the given cudaGraphExec_t |
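cudaStreamBase& tf::cudaStreamBase< Creator, Deleter >::synchronize() [inline]
Synchronizes the associated stream.
Equivalent to calling cudaStreamSynchronize: blocks the calling host thread until this stream has completed all previously submitted operations.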
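Because both run overloads and synchronize return cudaStreamBase&, calls can be chained on one stream. The plain C++ sketch below shows that fluent design with hypothetical stand-ins (RawExec for cudaGraphExec_t, ExecWrapper for cudaGraphExecBase<C, D>, ChainStream for the stream); having the wrapper overload delegate to the raw-handle overload is an assumption about one plausible structure, and the "launch" is only logged.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for cudaGraphExec_t and cudaGraphExecBase<C, D>.
struct RawExec { int id; };
struct ExecWrapper {
  RawExec handle;
  RawExec get() const { return handle; }  // exposes the raw handle
};

// Toy stream whose member functions mirror the documented return types.
class ChainStream {
 public:
  // Raw-handle overload: performs the (logged) launch.
  ChainStream& run(RawExec exec) {
    log.push_back("launch " + std::to_string(exec.id));
    return *this;
  }

  // Wrapper overload: assumed to delegate to the raw-handle overload.
  ChainStream& run(const ExecWrapper& exec) { return run(exec.get()); }

  // Returning *this allows run(...).synchronize() chains.
  ChainStream& synchronize() {
    log.push_back("sync");
    return *this;
  }

  std::vector<std::string> log;  // records calls in submission order
};
```

With the real class, the analogous chain would launch the graph(s) on the stream and then block the host until they finish.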
void tf::cudaStreamBase< Creator, Deleter >::wait(cudaEvent_t event) const [inline]
Waits on an event.
Equivalent to calling cudaStreamWaitEvent: all work submitted to this stream after the call waits until all work captured in event has completed.