Release 3.0.0 (2021/01/01)
Taskflow 3.0.0 is the first release in the 3.x line! This release includes several new changes, such as CPU-GPU tasking, an algorithm collection, an enhanced web-based profiler, documentation, and unit tests.
Download
Taskflow 3.0.0 can be downloaded from here.
System Requirements
To use Taskflow v3.0.0, you need a compiler that supports C++17:
- GNU C++ Compiler at least v7.0 with -std=c++17
- Clang C++ Compiler at least v6.0 with -std=c++17
- Microsoft Visual Studio at least v19.27 with /std:c++17
- AppleClang Xcode Version at least v12.0 with -std=c++17
- Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
- Intel C++ Compiler at least v19.0.1 with -std=c++17
Taskflow works on Linux, Windows, and Mac OS X.
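Before building against Taskflow, it can help to confirm that the toolchain is really in C++17 mode. The probe below is only a sketch and is not part of Taskflow: `kCppStandard`, `probe_cpp17`, and `probe_usage` are hypothetical names introduced for illustration. It relies on the standard `__cplusplus` macro, and on `_MSVC_LANG` for Visual Studio, which keeps `__cplusplus` at `199711L` unless `/Zc:__cplusplus` is passed.

```cpp
#include <cassert>
#include <optional>
#include <variant>

// Visual Studio reports the selected standard through _MSVC_LANG;
// other compilers report it through __cplusplus.
#if defined(_MSVC_LANG)
constexpr long kCppStandard = _MSVC_LANG;
#else
constexpr long kCppStandard = __cplusplus;
#endif

// Fails the build early when the compiler is not in C++17 mode (or newer).
static_assert(kCppStandard >= 201703L,
              "compile with -std=c++17 or /std:c++17");

// Hypothetical helper (not part of Taskflow): exercises two C++17
// library types named in these release notes.
inline int probe_cpp17() {
  std::optional<int> maybe = 40;
  std::variant<int, double> either = 2;
  return *maybe + std::get<int>(either);
}

// Usage sketch: the probe compiles and runs only in C++17 mode.
inline void probe_usage() {
  assert(probe_cpp17() == 42);
}
```

If the `static_assert` fires, the compiler flag from the list above is most likely missing from the build command.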
Working Items
- enhancing the taskflow profiler (TFProf)
- adding methods for updating tf::cudaFlow (with unit tests)
- adding support for cuBLAS
- adding support for cuDNN
- adding support for SYCL (ComputeCpp and DPC++)
New Features
Taskflow Core
- replaced all non-standard libraries with C++17 STL (e.g., std::optional, std::variant)
- added tf::WorkerView for users to observe the running works of tasks
- added asynchronous tasking (see Asynchronous Tasking)
- modified tf::ObserverInterface::on_entry and tf::ObserverInterface::on_exit to take tf::WorkerView
- added a custom graph interface to support dynamic polymorphism for tf::cudaGraph
- supported separate compilations between Taskflow and CUDA (see Compile Taskflow with CUDA)
- added tf::Semaphore and tf::CriticalSection to limit the maximum concurrency
- added tf::Future to support cancellation of submitted tasks (see Request Cancellation)
cudaFlow
- added tf::cudaFlowCapturer for building a cudaFlow through stream capture (see GPU Tasking (cudaFlowCapturer))
- added tf::cudaFlowCapturerBase for creating custom capturers
- added tf::cudaFlow::capture for capturing a cudaFlow within a parent cudaFlow
- added tf::Taskflow::emplace_on to place a cudaFlow on a GPU
- added tf::cudaFlow::dump and tf::cudaFlowCapturer::dump to visualize cudaFlow
- added tf::cudaFlow::offload and update methods to run and update a cudaFlow explicitly
- supported standalone cudaFlow
- supported standalone cudaFlowCapturer
- added tf::cublasFlowCapturer to support cuBLAS (see Linear Algebra (cublasFlowCapturer))
Utilities
- added utility functions to grab the CUDA device properties (see cuda_device.hpp)
- added utility functions to control CUDA memory (see cuda_memory.hpp)
- added utility functions for common mathematics operations
- added serializer and deserializer libraries to support tfprof
- added per-thread pool for CUDA streams to improve performance
Taskflow Profiler (TFProf)
- added visualization for asynchronous tasks
- added server-based profiler to support large profiling data (see Profile Taskflow Programs)
New Algorithms
CPU Algorithms
- added parallel sort (see Parallel Sort)
GPU Algorithms
- added single task (see Single Task)
- added parallel iterations (see Parallel Iterations)
- added parallel transforms
- added parallel reduction
Bug Fixes
- fixed the bug in stream capturing (need to use ThreadLocal mode)
- fixed the bug in reporting wrong worker ids when compiling a shared library due to the use of thread_local (now with C++17 inline variable)
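The second fix mentions replacing a plain thread_local with a C++17 inline variable. These notes do not show the actual change, so the following is only a minimal sketch of the language feature. `PerThread`, `worker_id`, `observed_worker_ids`, and `worker_id_usage` are hypothetical names and do not reflect Taskflow's internal types.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-worker record; not Taskflow's actual type.
struct PerThread {
  // C++17 inline variable: the definition can live in a header without
  // a separate out-of-class definition in one .cpp file. thread_local
  // gives every thread its own instance.
  inline static thread_local int worker_id = -1;
};

// Spawns n threads. Each thread stores its index in its own copy of
// worker_id and reads it back into a slot that only that thread writes.
inline std::vector<int> observed_worker_ids(int n) {
  assert(n >= 0);
  std::vector<int> seen(static_cast<std::size_t>(n), -1);
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) {
    threads.emplace_back([i, &seen] {
      PerThread::worker_id = i;
      seen[static_cast<std::size_t>(i)] = PerThread::worker_id;
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  return seen;
}

// Usage sketch: every thread sees its own id, and the calling thread's
// copy keeps the initial value.
inline void worker_id_usage() {
  std::vector<int> ids = observed_worker_ids(3);
  assert(ids[0] == 0 && ids[1] == 1 && ids[2] == 2);
  assert(PerThread::worker_id == -1);
}
```

On some platforms, building this sketch requires linking the thread library (for example with -pthread).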
Breaking Changes
- changed the returned values of asynchronous tasks to be std::optional in order to support cancellation (see Asynchronous Tasking and Request Cancellation)
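For callers, the practical effect of this change is that the result of an asynchronous task has to be checked before it is used. The sketch below uses only the standard library and assumes that an empty std::optional signals a task that was cancelled before producing a value. `describe_result` and `describe_usage` are hypothetical helpers, not Taskflow APIs.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper (not part of Taskflow): turns the optional result
// of an asynchronous task into a message. An empty optional is assumed
// to mean the task was cancelled before it produced a value.
inline std::string describe_result(const std::optional<int>& result) {
  if (result.has_value()) {
    return "finished with " + std::to_string(*result);
  }
  return "cancelled";
}

// Usage sketch: the two outcomes a caller should be prepared for.
inline void describe_usage() {
  assert(describe_result(42) == "finished with 42");
  assert(describe_result(std::nullopt) == "cancelled");
}
```

Code that previously used the returned value directly would need a check of this kind, or a fallback such as `value_or`, after upgrading.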
Deprecated and Removed Items
- removed tf::cudaFlow::device; users may call tf::Taskflow::emplace_on to associate a cudaFlow with a GPU device
- removed tf::cudaFlow::join; use tf::cudaFlow::offload instead
- removed the legacy tf::Framework
- removed external mutable use of tf::TaskView
Documentation
- added Compile Taskflow with CUDA
- added Benchmark Taskflow
- added Limit the Maximum Concurrency
- added Asynchronous Tasking
- added GPU Tasking (cudaFlowCapturer)
- added Request Cancellation
- added Profile Taskflow Programs
- added cudaFlow Algorithms
  - Single Task to run a kernel function in just a single thread
  - Parallel Iterations to perform parallel iterations over a range of items
  - Parallel Transforms to perform parallel transforms over a range of items
- added Governance
- added Contributing
- revised Conditional Tasking
- revised documentation pages for files
Miscellaneous Items
We have presented Taskflow in the following C++ venues with recorded videos:
We have published Taskflow in the following conferences and journals:
- Tsung-Wei Huang, "A General-purpose Parallel and Heterogeneous Task Programming System for VLSI CAD," IEEE/ACM International Conference on Computer-aided Design (ICCAD), CA, 2020
- Chun-Xun Lin, Tsung-Wei Huang, and Martin Wong, "An Efficient Work-Stealing Scheduler for Task Dependency Graph," IEEE International Conference on Parallel and Distributed Systems (ICPADS), Hong Kong, 2020
- Tsung-Wei Huang, Dian-Lun Lin, Yibo Lin, and Chun-Xun Lin, "Cpp-Taskflow: A General-purpose Parallel Task Programming System at Scale," IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems (TCAD), to appear, 2020