Release 3.2.0 (2021/07/29)

Taskflow 3.2.0 is the 3rd release in the 3.x line! This release adds new CPU-GPU tasking features, an algorithm collection, an enhanced web-based profiler, and updated documentation and unit tests.

Download

Taskflow 3.2.0 can be downloaded from here.

System Requirements

To use Taskflow v3.2.0, you need a compiler that supports C++17:

  • GNU C++ Compiler at least v8.4 with -std=c++17
  • Clang C++ Compiler at least v6.0 with -std=c++17
  • Microsoft Visual Studio at least v19.27 with /std:c++17
  • AppleClang Xcode Version at least v12.0 with -std=c++17
  • Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
  • Intel C++ Compiler at least v19.0.1 with -std=c++17
  • Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL 2020

Taskflow works on Linux, Windows, and Mac OS X.
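
The C++17 requirement above can be checked at compile time before any Taskflow header is included. The following is a minimal sketch, not part of Taskflow: the helper names is_cxx17_or_newer and current_standard are hypothetical. It assumes that Microsoft compilers report the language level through _MSVC_LANG (since __cplusplus stays at its legacy value there unless /Zc:__cplusplus is passed), while other compilers report it through __cplusplus.

```cpp
#include <cassert>

// Hypothetical helper: true when a language-standard macro value
// (__cplusplus or _MSVC_LANG) denotes C++17 or newer.
constexpr bool is_cxx17_or_newer(long standard_macro) {
  return standard_macro >= 201703L;
}

// Hypothetical helper: the language level reported by the current compiler.
// MSVC exposes it through _MSVC_LANG; other compilers through __cplusplus.
constexpr long current_standard() {
#if defined(_MSVC_LANG)
  return _MSVC_LANG;
#else
  return __cplusplus;
#endif
}
```

A project could then place static_assert(is_cxx17_or_newer(current_standard()), "Taskflow requires C++17"); ahead of its Taskflow includes, so that a missing -std=c++17 or /std:c++17 flag fails with a readable message.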

Working Items

  • enhancing support for SYCL with Intel DPC++
  • enhancing parallel CPU and GPU algorithms
  • designing pipeline interface and its scheduling algorithms

New Features

Taskflow Core

cudaFlow

New algorithms in tf::cudaFlow and tf::cudaFlowCapturer:

  • added tf::cudaFlow::reduce
  • added tf::cudaFlow::transform_reduce
  • added tf::cudaFlow::uninitialized_reduce
  • added tf::cudaFlow::transform_uninitialized_reduce
  • added tf::cudaFlow::inclusive_scan
  • added tf::cudaFlow::exclusive_scan
  • added tf::cudaFlow::transform_inclusive_scan
  • added tf::cudaFlow::transform_exclusive_scan
  • added tf::cudaFlow::merge
  • added tf::cudaFlow::merge_by_key
  • added tf::cudaFlow::sort
  • added tf::cudaFlow::sort_by_key
  • added tf::cudaFlow::find_if
  • added tf::cudaFlow::min_element
  • added tf::cudaFlow::max_element
  • added tf::cudaFlowCapturer::reduce
  • added tf::cudaFlowCapturer::transform_reduce
  • added tf::cudaFlowCapturer::uninitialized_reduce
  • added tf::cudaFlowCapturer::transform_uninitialized_reduce
  • added tf::cudaFlowCapturer::inclusive_scan
  • added tf::cudaFlowCapturer::exclusive_scan
  • added tf::cudaFlowCapturer::transform_inclusive_scan
  • added tf::cudaFlowCapturer::transform_exclusive_scan
  • added tf::cudaFlowCapturer::merge
  • added tf::cudaFlowCapturer::merge_by_key
  • added tf::cudaFlowCapturer::sort
  • added tf::cudaFlowCapturer::sort_by_key
  • added tf::cudaFlowCapturer::find_if
  • added tf::cudaFlowCapturer::min_element
  • added tf::cudaFlowCapturer::max_element
  • added tf::cudaLinearCapturing
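
Several of the algorithms above come in pairs whose names differ only slightly. The sketch below is a host-only C++17 reference for what those pairs compute, written with the standard library alone; it is not the Taskflow API and does not run on a GPU. All function names ending in _ref are hypothetical. It assumes that reduce folds the value already stored in the result location into the answer, while uninitialized_reduce ignores that value, as the names suggest. The exclusive scan here uses addition seeded with 0, which is a choice made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// "reduce" flavor: the value already stored in *result takes part
// in the reduction.
template <typename It, typename T, typename Op>
void reduce_ref(It first, It last, T* result, Op op) {
  for (; first != last; ++first) {
    *result = op(*result, *first);
  }
}

// "uninitialized_reduce" flavor: the prior content of *result is ignored
// and the first element seeds the reduction. The range must be non-empty.
template <typename It, typename T, typename Op>
void uninitialized_reduce_ref(It first, It last, T* result, Op op) {
  assert(first != last);
  *result = *first++;
  for (; first != last; ++first) {
    *result = op(*result, *first);
  }
}

// Inclusive scan (sum): out[i] includes in[i].
std::vector<int> inclusive_scan_ref(const std::vector<int>& in) {
  std::vector<int> out(in.size());
  int running = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    running += in[i];
    out[i] = running;
  }
  return out;
}

// Exclusive scan (sum, seeded with 0 for illustration): out[i] excludes in[i].
std::vector<int> exclusive_scan_ref(const std::vector<int>& in) {
  std::vector<int> out(in.size());
  int running = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = running;
    running += in[i];
  }
  return out;
}
```

For the input {1, 2, 3, 4} with std::plus<int>, reduce_ref on a result initialized to 10 yields 20, while uninitialized_reduce_ref yields 10 regardless of the initial value. The inclusive scan gives {1, 3, 6, 10} and the exclusive scan gives {0, 1, 3, 6}.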

syclFlow

CUDA Standard Parallel Algorithms

Utilities

  • added CUDA meta programming
  • added SYCL meta programming

Taskflow Profiler (TFProf)

Bug Fixes

  • fixed compilation errors in constructing tf::cudaRoundRobinCapturing
  • fixed compilation errors of TLS worker pointer in tf::Executor
  • fixed compilation errors of nvcc v11.3 in auto template deduction
    • std::scoped_lock
    • tf::Serializer and tf::Deserializer
  • fixed memory leak when moving a tf::Taskflow

Breaking Changes

There are no breaking changes in this release.

Deprecated and Removed Items

  • removed tf::cudaFlow::kernel_on method
  • removed explicit partitions in parallel iterations and reductions
  • removed tf::cudaFlowCapturerBase
  • removed tf::cublasFlowCapturer
  • renamed the update and rebind methods in tf::cudaFlow and tf::cudaFlowCapturer to overloads of the corresponding task-creation methods

Documentation

Miscellaneous Items

We have published tf::cudaFlow at the following conference:

  • Dian-Lun Lin and Tsung-Wei Huang, "Efficient GPU Computation using Task Graph Parallelism," European Conference on Parallel and Distributed Computing (EuroPar), 2021