Release Summary
This release improves scheduling performance through optimized work-stealing threshold tuning and a constrained decentralized buffer. It also introduces index-range-based parallel-for and parallel-reduction algorithms and modifies subflow tasking behavior to significantly enhance the performance of recursive parallelism.
Download
Taskflow 3.10.0 can be downloaded from here.
System Requirements
To use Taskflow v3.10.0, you need a compiler that supports C++17:
- GNU C++ Compiler at least v8.4 with -std=c++17
- Clang C++ Compiler at least v6.0 with -std=c++17
- Microsoft Visual Studio at least v19.27 with /std:c++17
- Apple Clang Xcode Version at least v12.0 with -std=c++17
- Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
- Intel C++ Compiler at least v19.0.1 with -std=c++17
- Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17
Taskflow works on Linux, Windows, and Mac OS X.
Attention: Although Taskflow primarily supports C++17, you can enable C++20 compilation through -std=c++20 to achieve better performance from new C++20 features.
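One C++20 facility relevant here is atomic waiting: std::atomic<T>::wait and notify_all let a thread block on an atomic value without a mutex, which relates to the C++20 atomic notification default listed under New Features. The sketch below is a hypothetical ReadyFlag helper, not Taskflow's notifier. It uses atomic waiting when the standard library defines __cpp_lib_atomic_wait and otherwise falls back to a mutex and std::condition_variable, so it compiles in both C++17 and C++20 modes.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot flag for illustration; not Taskflow's notifier.
// With C++20 atomic waiting it blocks on the atomic itself; otherwise it
// falls back to a mutex + condition variable (C++17).
class ReadyFlag {
 public:
  void set() {
#if defined(__cpp_lib_atomic_wait)
    flag_.store(true, std::memory_order_release);
    flag_.notify_all();  // wake every thread blocked in wait()
#else
    {
      std::lock_guard<std::mutex> lock(mutex_);
      flag_.store(true, std::memory_order_release);
    }
    cv_.notify_all();
#endif
  }

  void wait() {
#if defined(__cpp_lib_atomic_wait)
    // Returns once the value is no longer false; spurious wakeups are
    // handled inside std::atomic<bool>::wait.
    flag_.wait(false, std::memory_order_acquire);
#else
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return flag_.load(std::memory_order_acquire); });
#endif
  }

  bool is_set() const { return flag_.load(std::memory_order_acquire); }

 private:
  std::atomic<bool> flag_{false};
#if !defined(__cpp_lib_atomic_wait)
  std::mutex mutex_;
  std::condition_variable cv_;
#endif
};

// Usage: a consumer thread blocks until the producer calls set().
inline bool demo_ready_flag() {
  ReadyFlag ready;
  int payload = 0;
  std::thread consumer([&] {
    ready.wait();
    assert(payload == 42);  // the release store in set() publishes this write
  });
  payload = 42;
  ready.set();
  consumer.join();
  return ready.is_set();
}
```

The C++20 path avoids taking a lock on every notification, which is the kind of saving a newer language standard can unlock without changing the calling code.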
New Features
Taskflow Core
- optimized work-stealing loop with an adaptive breaking strategy
- optimized shut-down signal detection using decentralized variables
- optimized the memory layout of nodes by storing successors and predecessors together
- changed the default notifier to use the atomic notification algorithm under C++20
- added a debug mode for the Windows CI to GitHub Actions
- added index range-based parallel-for algorithm (#551)
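std::vector<int> data1(100), data2(100);
// approach 1: explicit index range (begin, end, step)
taskflow.for_each_index(0, 100, 1, [&](int i){ data1[i] = 10; });
// approach 2: tf::IndexRange, where the callable receives a subrange
tf::IndexRange<int> range(0, 100, 1);
taskflow.for_each_by_index(range, [&](tf::IndexRange<int> subrange){
  for(int i=subrange.begin(); i<subrange.end(); i+=subrange.step_size()) {
    data2[i] = 10;
  }
});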
- added index range-based parallel-reduction algorithm (#654)
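std::vector<double> data(100000);
double res = 1.0;
taskflow.reduce_by_index(
  // index range
  tf::IndexRange<size_t>(0, data.size(), 1),
  // final result
  res,
  // local reducer
  [&](tf::IndexRange<size_t> subrange, std::optional<double> running_total){
    double residual = running_total ? *running_total : 0.0;
    for(size_t i=subrange.begin(); i<subrange.end(); i+=subrange.step_size()) {
      data[i] = 1.0;
      residual += data[i];
    }
    printf("partial sum = %lf\n", residual);
    return residual;
  },
  // global reducer
  std::plus<double>()
);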
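To make the index-range semantics concrete, the following standard-library-only sketch shows what an index-range reduction does conceptually: the range [begin, end) with a step is split into contiguous subranges, a local reducer folds each subrange into a partial result, and a global reducer combines the partials into the final result. IndexRangeSketch and parallel_reduce_by_index are hypothetical names, not Taskflow's API. The sketch spawns one std::thread per chunk and has none of Taskflow's partitioners or work stealing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical stand-in for an index range [begin, end) with a positive step.
struct IndexRangeSketch {
  std::size_t begin, end, step;

  // Number of indices begin, begin + step, ... that are strictly below end.
  std::size_t size() const {
    return begin >= end ? 0 : (end - begin + step - 1) / step;
  }
};

// Splits `range` into at most `num_workers` contiguous subranges, folds each
// one with `local(subrange, running_total)` on its own std::thread, then
// combines the partial results into `init` with `global`.
template <typename T, typename L, typename G>
void parallel_reduce_by_index(IndexRangeSketch range, T& init, L local,
                              G global, std::size_t num_workers) {
  assert(range.step > 0);
  const std::size_t n = range.size();
  if (n == 0) return;
  num_workers = std::max<std::size_t>(1, std::min(num_workers, n));
  const std::size_t chunk = (n + num_workers - 1) / num_workers;

  std::vector<std::optional<T>> partials(num_workers);  // one slot per worker
  std::vector<std::thread> threads;
  for (std::size_t w = 0; w < num_workers; ++w) {
    const std::size_t first = w * chunk;  // position of this chunk's first index
    if (first >= n) break;                // fewer chunks than workers
    const std::size_t last = std::min(n, first + chunk);
    const IndexRangeSketch sub{range.begin + first * range.step,
                               range.begin + last * range.step, range.step};
    threads.emplace_back([&partials, &local, w, sub] {
      // Each chunk starts without a running total.
      partials[w] = local(sub, std::optional<T>{});
    });
  }
  for (auto& t : threads) t.join();

  // Global reduction, performed sequentially after all workers finish.
  for (const auto& p : partials) {
    if (p) init = global(init, *p);
  }
}
```

In this sketch the initial value of the result participates in the reduction: calling it over 100000 elements that are each set to 1.0, starting from a result of 1.0 with std::plus<double>, yields 100001.0. Passing the running total as std::optional lets the local reducer distinguish "no accumulated value yet" from a genuine zero.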
Utilities
Bug Fixes
- fixed the compilation error of CLI11 due to version incompatibility (#672)
- fixed the compilation error of template deduction on packaged_task (#657)
- fixed the MSVC compilation error due to macro clash with std::min and std::max (#670)
- fixed the runtime error due to the use of latch in tf::Executor::Executor (#667)
- fixed the compilation error due to incorrect const qualifier used in algorithms (#673)
- fixed the TSAN error when using find-if algorithm tasks with a closure wrapper (#675)
- fixed the task trait bug that incorrectly detected subflow and runtime tasks (#679)
- fixed the infinite steal caused by incorrect num_empty_steals (#681)
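The adaptive breaking strategy in the work-stealing loop and the num_empty_steals fix both concern how long an idle worker keeps trying to steal before giving up. The sketch below illustrates the general idea under simplifying assumptions: WorkQueue and steal_with_bound are hypothetical, each queue is guarded by a mutex rather than being a lock-free deque, and the counter named num_empty_steals only echoes the name in the fix; this is not Taskflow's implementation. A worker picks random victims and stops after max_empty_steals failed attempts in a row, which is the kind of bound that prevents an infinite steal loop.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <random>
#include <vector>

// Hypothetical per-worker task queue. A mutex keeps the sketch short;
// production schedulers typically use lock-free work-stealing deques.
struct WorkQueue {
  std::mutex m;
  std::deque<int> tasks;  // tasks are plain ints in this sketch

  // Thieves take from the front; the owner would pop from the back.
  std::optional<int> steal() {
    std::lock_guard<std::mutex> lock(m);
    if (tasks.empty()) return std::nullopt;
    int t = tasks.front();
    tasks.pop_front();
    return t;
  }
};

// Tries random victims until a task is found or `max_empty_steals` attempts
// in a row come back empty. Returning std::nullopt tells the caller to stop
// stealing (e.g., go to sleep) instead of spinning forever.
std::optional<int> steal_with_bound(std::vector<WorkQueue>& queues,
                                    std::size_t self,
                                    std::size_t max_empty_steals,
                                    std::mt19937& rng) {
  assert(queues.size() > 1 && self < queues.size());
  std::uniform_int_distribution<std::size_t> pick(0, queues.size() - 1);
  std::size_t num_empty_steals = 0;
  while (num_empty_steals < max_empty_steals) {
    const std::size_t victim = pick(rng);
    if (victim == self) continue;  // never steal from our own queue
    if (auto task = queues[victim].steal()) return task;
    ++num_empty_steals;  // every failed attempt counts toward the bound
  }
  return std::nullopt;
}
```

A real scheduler would then park the worker on its notifier rather than return to spinning. Choosing max_empty_steals trades steal latency (giving up too early) against wasted CPU cycles (giving up too late).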
Breaking Changes
- corrected the terminology by replacing 'dependents' with 'predecessors'
- disabled the support for tf::Subflow::detach due to multiple intricate and unresolved issues:
- the execution logic of detached subflows is inherently difficult to reason about
- detached subflows can incur excessive memory consumption, especially in recursive workloads
- detached subflows lack a safe mechanism for life-cycle control and graph cleanup
- detached subflows have limited practical benefits for most use cases
- detached subflows can be re-implemented using taskflow composition
- changed the default behavior of tf::Subflow to no longer retain its task graph after join
- retaining the subflow graph by default can incur significant memory consumption (#674)
- users must explicitly call tf::Subflow::retain to retain a subflow after join
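tf::Executor executor;
tf::Taskflow taskflow;
taskflow.emplace([&](tf::Subflow& sf){
  sf.retain(true);  // retain the subflow after join so it can be dumped
  auto A = sf.emplace([](){ std::cout << "A\n"; });
  auto B = sf.emplace([](){ std::cout << "B\n"; });
  auto C = sf.emplace([](){ std::cout << "C\n"; });
  A.precede(B, C);  // A runs before B and C
});  // the subflow joins implicitly when the callable returns
executor.run(taskflow).wait();
// the retained subflow graph now appears in the DOT output
taskflow.dump(std::cout);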
- disabled the support for tf::cudaFlow and tf::cudaFlowCapturer
- CUDA graph work is instead expressed through tf::cudaGraph, tf::cudaGraphExec, and tf::cudaStream, the default smart-pointer types that manage cudaGraph_t, cudaGraphExec_t, and cudaStream_t objects with unique ownership; tf::cudaGraph::kernel creates a kernel task, tf::cudaStream::run runs an executable CUDA graph, and synchronize waits on the associated stream
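The effect of the new tf::Subflow default can be modeled with a toy class. MiniSubflow below is hypothetical and only tracks whether child nodes survive a join; it performs no scheduling and is not how Taskflow stores subflow graphs. With retention off (the new default), join runs the children and then frees the child graph, whereas retain(true) keeps it for later inspection.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy model of a dynamic task's child graph. It only models
// whether child nodes survive a join; it performs no scheduling.
class MiniSubflow {
 public:
  void emplace(std::function<void()> work) { nodes_.push_back(std::move(work)); }

  // Mirrors the idea of tf::Subflow::retain: opt in to keeping the graph.
  void retain(bool flag) noexcept { retain_ = flag; }

  // Runs every child in insertion order, then frees the child graph
  // unless retention was requested.
  void join() {
    for (auto& node : nodes_) node();
    if (!retain_) {
      nodes_.clear();
      nodes_.shrink_to_fit();  // non-binding request to return the storage
    }
  }

  std::size_t num_nodes() const { return nodes_.size(); }

 private:
  std::vector<std::function<void()>> nodes_;
  bool retain_ = false;  // the new default: do not keep the graph after join
};
```

In a recursive workload where each subflow spawns further subflows, keeping every joined graph alive makes memory grow with the total number of spawned tasks; releasing each graph at join bounds that growth, which is why retention became opt-in.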
Documentation
Miscellaneous Items
If you are interested in collaborating with us on applying Taskflow to your projects, please feel free to reach out to Dr. Tsung-Wei Huang!