Taskflow 3.6.0 is the 7th release in the 3.x line! This release includes several new changes, such as dynamic task graph parallelism, improved parallel algorithms, modified GPU tasking interface, documentation, examples, and unit tests.
Download
Taskflow 3.6.0 can be downloaded from here.
System Requirements
To use Taskflow v3.6.0, you need a compiler that supports C++17:
- GNU C++ Compiler at least v8.4 with -std=c++17
- Clang C++ Compiler at least v6.0 with -std=c++17
- Microsoft Visual Studio at least v19.27 with /std:c++17
- AppleClang Xcode Version at least v12.0 with -std=c++17
- Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
- Intel C++ Compiler at least v19.0.1 with -std=c++17
- Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL20
Taskflow works on Linux, Windows, and Mac OS X.
Release Summary
This release contains several changes that substantially improve the programmability of GPU tasking and standard parallel algorithms. More importantly, it introduces a new dependent asynchronous tasking model that offers great flexibility for expressing dynamic task graph parallelism.
New Features
Taskflow Core
- Added new async methods to support dynamic task graph creation
- Added new async and join methods to tf::Runtime
- Added a new partitioner interface to optimize parallel algorithms
- Added parallel-scan algorithms to Taskflow
  - tf::Taskflow::inclusive_scan(B first, E last, D d_first, BOP bop)
  - tf::Taskflow::inclusive_scan(B first, E last, D d_first, BOP bop, T init)
  - tf::Taskflow::transform_inclusive_scan(B first, E last, D d_first, BOP bop, UOP uop)
  - tf::Taskflow::transform_inclusive_scan(B first, E last, D d_first, BOP bop, UOP uop, T init)
  - tf::Taskflow::exclusive_scan(B first, E last, D d_first, T init, BOP bop)
  - tf::Taskflow::transform_exclusive_scan(B first, E last, D d_first, T init, BOP bop, UOP uop)
- Added parallel-find algorithms to Taskflow
  - tf::Taskflow::find_if(B first, E last, T& result, UOP predicate, P&& part)
  - tf::Taskflow::find_if_not(B first, E last, T& result, UOP predicate, P&& part)
  - tf::Taskflow::min_element(B first, E last, T& result, C comp, P&& part)
  - tf::Taskflow::max_element(B first, E last, T& result, C comp, P&& part)
- Changed tf::Subflow to derive from tf::Runtime
- Extended parallel algorithms to support different partitioning algorithms
  - tf::Taskflow::for_each_index(B first, E last, S step, C callable, P&& part)
  - tf::Taskflow::for_each(B first, E last, C callable, P&& part)
  - tf::Taskflow::transform(B first1, E last1, O d_first, C c, P&& part)
  - tf::Taskflow::transform(B1 first1, E1 last1, B2 first2, O d_first, C c, P&& part)
  - tf::Taskflow::reduce(B first, E last, T& result, O bop, P&& part)
  - tf::Taskflow::transform_reduce(B first, E last, T& result, BOP bop, UOP uop, P&& part)
- Improved the performance of tf::Taskflow::sort for plain-old-data (POD) types
cudaFlow
- Removed algorithms that require a buffer from tf::cudaFlow due to update limitations
- Removed support for a dedicated cudaFlow task in Taskflow
- All uses of tf::cudaFlow and tf::cudaFlowCapturer are now standalone
Utilities
- Added all_same templates to check if a parameter pack has the same type
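For readers unfamiliar with this kind of trait, a minimal sketch of how an all_same check can be written in C++17 follows. It lives in a made-up namespace sketch and is not Taskflow's actual definition, which may differ in form.

```cpp
#include <type_traits>

namespace sketch {  // hypothetical namespace, not part of Taskflow

// True when every type in Ts is the same as T.
// Requires at least one type; a pack holding only T is trivially "all same".
template <typename T, typename... Ts>
struct all_same : std::conjunction<std::is_same<T, Ts>...> {};

// Convenience variable template in the style of the standard *_v traits.
template <typename T, typename... Ts>
inline constexpr bool all_same_v = all_same<T, Ts...>::value;

}  // namespace sketch

// Compile-time checks of the trait.
static_assert(sketch::all_same_v<int, int, int>);
static_assert(!sketch::all_same_v<int, long>);
```

A trait like this is typically used in a static_assert or enable_if to constrain variadic interfaces whose arguments must share one type.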
Taskflow Profiler (TFProf)
- Removed cudaFlow and syclFlow tasks
Bug Fixes
- Fixed the compilation error caused by MAX_PRIORITY clashing with winspool.h (#459)
- Fixed the compilation error caused by tf::TaskView::for_each_successor and tf::TaskView::for_each_dependent
- Fixed the infinite-loop bug when corunning a module task from tf::Runtime
If you encounter any potential bugs, please submit an issue at the issue tracker.
Breaking Changes
- Dropped support for cancelling asynchronous tasks
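// previous
tf::Future<std::optional<int>> fu = executor.async([](){
  return 1;
});
fu.cancel();
std::optional<int> res = fu.get();  // res may be std::nullopt or 1

// now - use std::future instead
std::future<int> fu = executor.async([](){
  return 1;
});
int res = fu.get();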
- Dropped in-place support for running tf::cudaFlow from a dedicated task
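// previous - no longer supported
taskflow.emplace([](tf::cudaFlow& cf){
  cf.offload();
});

// now - the user fully controls tf::cudaFlow for maximum flexibility
taskflow.emplace([](){
  tf::cudaFlow cf;

  // offload the cudaFlow asynchronously through a stream
  tf::cudaStream stream;
  cf.run(stream);

  // wait for the cudaFlow to complete
  stream.synchronize();
});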
- Dropped in-place support for running tf::cudaFlowCapturer from a dedicated task
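// previous - no longer supported
taskflow.emplace([](tf::cudaFlowCapturer& cf){
  cf.offload();
});

// now - the user fully controls tf::cudaFlowCapturer for maximum flexibility
taskflow.emplace([](){
  tf::cudaFlowCapturer cf;

  // offload the cudaFlow capturer asynchronously through a stream
  tf::cudaStream stream;
  cf.run(stream);

  // wait for the cudaFlow capturer to complete
  stream.synchronize();
});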
- Dropped in-place support for running tf::syclFlow from a dedicated task
  - SYCL can be used out of the box together with Taskflow
- Moved all buffer query methods of CUDA standard algorithms into the execution policy
  - tf::cudaExecutionPolicy<NT, VT>::reduce_bufsz
  - tf::cudaExecutionPolicy<NT, VT>::scan_bufsz
  - tf::cudaExecutionPolicy<NT, VT>::merge_bufsz
  - tf::cudaExecutionPolicy<NT, VT>::min_element_bufsz
  - tf::cudaExecutionPolicy<NT, VT>::max_element_bufsz
// previous
tf::cuda_reduce_buffer_size<tf::cudaDefaultExecutionPolicy, int>(N);

// now (and similarly for the other parallel algorithms)
tf::cudaDefaultExecutionPolicy policy(stream);
policy.reduce_bufsz<int>(N);
- Renamed tf::Executor::run_and_wait to tf::Executor::corun for expressiveness
- Renamed tf::Executor::loop_until to tf::Executor::corun_until for expressiveness
- Renamed tf::Runtime::run_and_wait to tf::Runtime::corun for expressiveness
- Disabled argument support for all asynchronous tasking features
  - Users are responsible for wrapping any arguments into the callable themselves
// previous
executor.async([](int i){ std::cout << i << std::endl; }, 4);

// now
executor.async([i=4](){ std::cout << i << std::endl; });
- Replaced named_async with an overload that takes the name string as the first argument
// previous
executor.named_async("name", [](){});

// now
executor.async("name", [](){});
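For the change above that disables argument support in asynchronous tasking, code that passed many arguments may prefer a small adapter over hand-written captures. The helper below is a hypothetical sketch using only the standard library; make_nullary is not part of Taskflow, and it assumes the bound arguments are copyable and that the callable accepts them as lvalues.

```cpp
#include <tuple>
#include <utility>

// Hypothetical helper (not part of Taskflow): packs a callable and its
// arguments into a single callable that takes no parameters.
template <typename F, typename... Args>
auto make_nullary(F&& f, Args&&... args) {
  return [fn = std::forward<F>(f),
          bound = std::make_tuple(std::forward<Args>(args)...)]() mutable {
    // Unpack the stored arguments and invoke the stored callable.
    return std::apply(fn, bound);
  };
}
```

With such a helper, a call that previously read executor.async(f, 4) could be written as executor.async(make_nullary(f, 4)). For one or two arguments, a plain lambda capture is usually simpler.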
Documentation
Miscellaneous Items
We have published Taskflow in the following venues:
- Dian-Lun Lin, Yanqing Zhang, Haoxing Ren, Shih-Hsin Wang, Brucek Khailany and Tsung-Wei Huang, "GenFuzz: GPU-accelerated Hardware Fuzzing using Genetic Algorithm with Multiple Inputs," ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, 2023
- Tsung-Wei Huang, "qTask: Task-parallel Quantum Circuit Simulation with Incrementality," IEEE International Parallel and Distributed Processing Symposium (IPDPS), St. Petersburg, Florida, 2023
- Elmir Dzaka, Dian-Lun Lin, and Tsung-Wei Huang, "Parallel And-Inverter Graph Simulation Using a Task-graph Computing System," IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), St. Petersburg, Florida, 2023
Please do not hesitate to contact Dr. Tsung-Wei Huang if you intend to collaborate with us on using Taskflow in your scientific computing projects.