Release 3.10.0 (Master)
Taskflow 3.10.0 is the newest developing line to new features and improvements we continue to support. It is also where this documentation is generated. Many things are considered experimental and may change or break from time to time. While it may be difficult to be keep all things consistent when introducing new features, we continue to try our best to ensure backward compatibility.
Download
To download the newest version of Taskflow, please clone the master branch from Taskflow's GitHub.
System Requirements
To use Taskflow v3.10.0, you need a compiler that supports C++17:
- GNU C++ Compiler at least v8.4 with -std=c++17
- Clang C++ Compiler at least v6.0 with -std=c++17
- Microsoft Visual Studio at least v19.27 with /std:c++17
- AppleClang Xcode Version at least v12.0 with -std=c++17
- Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
- Intel C++ Compiler at least v19.0.1 with -std=c++17
- Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17
Taskflow works on Linux, Windows, and Mac OS X.
Release Summary
This release improves scheduling performance through optimized work-stealing threshold tuning and a constrained decentralized buffer. It also introduces index-range-based parallel-for and parallel-reduction algorithms and modifies subflow tasking behavior to significantly enhance the performance of recursive parallelism.
New Features
Taskflow Core
- optimized work-stealing loop with an adaptive breaking strategy
- optimized shut-down signal detection using decentralized variables
- optimized memory layout of node by combining successors and predecessors together
- added debug mode for the windows CI to GitHub actions
- added index range-based parallel-for algorithm (#551)
// initialize data1 and data2 to 10 using two different approaches std::vector<int> data1(100), data2(100); // Approach 1: initialize data1 using explicit index range taskflow.for_each_index(0, 100, 1, [&](int i){ data1[i] = 10; }); // Approach 2: initialize data2 using tf::IndexRange tf::IndexRange<int> range(0, 100, 1); taskflow.for_each_by_index(range, [&](tf::IndexRange<int>& subrange){ for(int i=subrange.begin(); i<subrange.end(); i+=subrange.step_size()) { data2[i] = 10; } });
- added index range-based parallel-reduction algorithm (i#654)
std::vector<double> data(100000); double res = 1.0; taskflow.reduce_by_index( // index range tf::IndexRange<size_t>(0, N, 1), // final result res, // local reducer [&](tf::IndexRange<size_t> subrange, std::optional<double> running_total) { double residual = running_total ? *running_total : 0.0; for(size_t i=subrange.begin(); i<subrange.end(); i+=subrange.step_size()) { data[i] = 1.0; residual += data[i]; } printf("partial sum = %lf\n", residual); return residual; }, // global reducer std::plus<double>() );
- added
static
keyword to the executor creation in taskflow benchmarks
Utilities
- added tf::
IndexRange for index range-based parallel-for algorithm - added tf::
distance to calculate the number of iterations in an index range - added tf::
is_index_range_invalid to check if the given index range is valid
Bug Fixes
- fixed the compilation error of CLI11 due to version incompatibility (#672)
- fixed the compilation error of template deduction on packaged_task (#657)
- fixed the MSVC compilation error due to macro clash with std::min and std::max (#670)
- fixed the runtime error due to the use of latch in tf::
Executor:: Executor (#667) - fixed the compilation error due to incorrect const qualifier used in algorithms (#673)
- fixed the tsan error when using find-if algorithm tasks with closure wrapper (#675)
Breaking Changes
- corrected the terminology by replacing 'dependents' with 'predecessors'
- tf::
Task:: num_predecessors (previously tf::Task::num_dependents) - tf::
Task:: for_each_predecessor (previously tf::Task::for_each_dependent) - tf::
Task:: num_strong_dependencies (previously tf::Task::num_strong_dependents) - tf::
Task:: num_weak_dependencies (previously tf::Task::num_weak_dependents)
- tf::
- disabled the support for tf::Subflow::detach due to multiple intricate and unresolved issues:
- detached subflows are inherently difficult to reason about their execution logic
- detached subflows can incur excessive memory consumption, especially in recursive workloads
- detached subflows lack a manner to safe life cycle control and graph cleanup
- detached subflows have limited practical benefits for most use cases
- detached subflows can be re-implemented using taskflow composition
- changed the default behavior of tf::
Subflow to no longer retain its task graph after join - default retention can incur significant memory consumption problem (#674)
- users must explicitly call tf::
Subflow:: retain to retain a subflow after join
tf::Taskflow taskflow; tf::Executor executor; taskflow.emplace([&](tf::Subflow& sf){ sf.retain(true); // retain the subflow after join for visualization auto A = sf.emplace([](){ std::cout << "A\n"; }); auto B = sf.emplace([](){ std::cout << "B\n"; }); auto C = sf.emplace([](){ std::cout << "C\n"; }); A.precede(B, C); // A runs before B and C }); // subflow implicitly joins here executor.run(taskflow).wait(); // The subflow graph is now retained and can be visualized using taskflow.dump(...) taskflow.dump(std::cout);
Documentation
- revised Subflow Tasking
- revised Parallel Iterations
- revised Parallel Reduction
- revised Parallel Find
Miscellaneous Items
Please do not hesitate to contact Dr. Tsung-Wei Huang if you intend to collaborate with us on using Taskflow in your scientific computing projects.