Release Notes » Release 3.1.0 (2021/04/14)

Taskflow 3.1.0 is the second release in the 3.x line! It brings changes to CPU-GPU tasking, the algorithm collection, the web-based profiler, documentation, and unit tests.


Taskflow 3.1.0 can be downloaded from here.

System Requirements

To use Taskflow v3.1.0, you need a compiler that supports C++17:

  • GNU C++ Compiler at least v8.4 with -std=c++17
  • Clang C++ Compiler at least v6.0 with -std=c++17
  • Microsoft Visual Studio at least v19.27 with /std:c++17
  • AppleClang Xcode Version at least v12.0 with -std=c++17
  • Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
  • Intel C++ Compiler at least v19.0.1 with -std=c++17
  • Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL 2020

Taskflow works on Linux, Windows, and Mac OS X.

New Features

Taskflow Core

  • optimized task node storage by using std::unique_ptr for semaphores
  • introduced tf::syclFlow based on Intel DPC++ and SYCL 2020 spec
  • merged the execution flow of cudaFlow and cudaFlow capturer


cudaFlow

  • optimized tf::cudaRoundRobinCapturing through an event-pruning heuristic
  • optimized the default block size used in cudaFlow algorithms
  • added tf::cudaFlow::clear() to clean up a cudaFlow
  • added tf::cudaFlow::num_tasks() to query the task count in a cudaFlow
  • added tf::cudaTask::num_dependents() to query the dependent count in a cudaTask
  • added tf::cudaFlowCapturer::clear() to clean up a cudaFlow capturer
  • added tf::cudaFlowCapturer::num_tasks() to query the task count in a cudaFlow capturer
  • added tf::cudaFlowCapturer rebind methods:
    • tf::cudaFlowCapturer::rebind_single_task
    • tf::cudaFlowCapturer::rebind_for_each
    • tf::cudaFlowCapturer::rebind_for_each_index
    • tf::cudaFlowCapturer::rebind_transform
    • tf::cudaFlowCapturer::rebind_reduce
    • tf::cudaFlowCapturer::rebind_uninitialized_reduce
  • added tf::cudaFlow update methods:
    • tf::cudaFlow::update_for_each
    • tf::cudaFlow::update_for_each_index
    • tf::cudaFlow::update_transform
    • tf::cudaFlow::update_reduce
    • tf::cudaFlow::update_uninitialized_reduce
  • added cudaFlow examples:
    • parallel reduction (examples/cuda/
    • parallel transform (examples/cuda/
    • rebind (examples/cuda/


syclFlow

  • added a task graph-based programming model (see GPU Tasking (syclFlow))
  • added syclFlow examples:
    • device query (examples/sycl/sycl_device.cpp)
    • range query (examples/sycl/sycl_ndrange.cpp)
    • saxpy kernel (examples/sycl/sycl_saxpy.cpp)
    • atomic operation using oneAPI atomic_ref (examples/sycl/sycl_atomic.cpp)
    • vector addition (examples/sycl/sycl_vector_add.cpp)
    • parallel reduction (examples/sycl/sycl_reduce.cpp)
    • matrix multiplication (examples/sycl/sycl_matmul.cpp)
    • parallel transform (examples/sycl/transform.cpp)
    • rebind (examples/sycl/sycl_rebind.cpp)
  • added syclFlow algorithms

For details on compiling and running syclFlow programs, see GPU Tasking (syclFlow) and Compile Taskflow with SYCL.


Utilities

  • resolved the compiler warning in serializer caused by constexpr if
  • resolved the compiler error of nvcc when parsing variadic namespace

Taskflow Profiler (TFProf)

  • added support for syclFlow tasks

Bug Fixes

  • fixed the macro expansion issue with MSVC on TF_CUDA_CHECK
  • fixed the serializer compile error (#288)
  • fixed the tf::cudaTask::type bug in mixing host and empty task types

Breaking Changes

There are no breaking changes in this release.

Deprecated and Removed Items

There are no deprecated or removed items in this release.


Miscellaneous Items

  • removed Circle-CI from the continuous integration
  • added grok to the user list
  • added RavEngine to the user list
  • added RPGMPacker to the user list
  • added Leanify to the user list