Release 4.0.0 (2026/01/01)
Release Summary
Happy new year! Starting in v4, Taskflow migrates the codebase to the C++20 standard for improved runtime performance and engineering efficiency. This release enhances scheduling efficiency through bulk scheduling, reduced atomic overhead, and a streamlined executor design. It also introduces new features such as tf::
Download
To download the newest version of Taskflow, please clone the master branch from Taskflow's GitHub.
System Requirements
To use Taskflow v4.0.0, you need a compiler that supports C++20:
- GNU C++ Compiler at least v11.0 with -std=c++20
- Clang C++ Compiler at least v12.0 with -std=c++20
- Microsoft Visual Studio at least v19.29 (VS 2019) with /std:c++20
- Apple Clang (Xcode) at least v13.0 with -std=c++20
- NVIDIA CUDA Toolkit and Compiler (nvcc) at least v12.0 with host compiler supporting C++20
- Intel oneAPI DPC++/C++ Compiler at least v2022.0 with -std=c++20
New Features
Taskflow Core
- replaced std::function in topology with virtual dispatch for better performance
- replaced TF_LIKELY/TF_UNLIKELY with C++20 attributes
- extended the work-stealing queue to support the following new features:
  - non-pointer type with std::optional
  - bulk insertion to reduce the overhead of atomic operations
- enhanced the executor performance with the following changes:
  - bulk scheduling to reduce the overhead of atomic operations
  - removed executor pointer from worker
  - removed redundant executor pointer validation from the hot scheduling path
  - adopted tf::NonblockingNotifier as the default notifier due to its robustness
  - split notify_one and notify_all into two separate, optimized implementations
  - added notify_n to improve the performance
- split essential members of tf::Node into tf::NodeBase to improve modularity
- split anchor nodes to implicit and explicit types to improve exception handling
- added unit test for tf::NonblockingNotifier
- added embarrassing parallelism benchmark
- added merge sort benchmark
- added tf::TaskGroup to enable efficient implementation of recursive parallelism
- revised the following benchmarks with tf::TaskGroup for recursive parallelism:
  - integrate
  - skynet
  - fibonacci
  - nqueens
- enhanced the scheduling performance of dependent-async tasks with state merging
Utilities
- removed ctz because it duplicates functionality in the C++20 bit manipulation library
Bug Fixes
Breaking Changes
Starting in v4, Taskflow will adopt the C++20 standard for better runtime performance and engineering efficiency. If your project does not support C++20, please use Taskflow v3.11.0, which can be downloaded here.
Documentation
- introduced Taskflow Academy - a new video suite for learning Taskflow
- introduced tf::NonblockingNotifier
- introduced Task Group
- revised Executor
- revised tf::BoundedWSQ
- revised tf::UnboundedWSQ
- revised Exception Handling
- revised Fibonacci Number
Miscellaneous Items
If you are interested in collaborating with us on applying Taskflow to your projects, please feel free to reach out to Dr. Tsung-Wei Huang!