Release 3.0.0 (2021/01/01)

Taskflow 3.0.0 is the first release in the 3.x line! It introduces CPU-GPU tasking, an algorithm collection, and an enhanced web-based profiler, along with new documentation and unit tests.

Download

Taskflow 3.0.0 can be downloaded from here.

System Requirements

To use Taskflow v3.0.0, you need a compiler that supports C++17:

  • GNU C++ Compiler at least v7.0 with -std=c++17
  • Clang C++ Compiler at least v6.0 with -std=c++17
  • Microsoft Visual Studio at least v19.27 with /std:c++17
  • AppleClang Xcode Version at least v12.0 with -std=c++17
  • Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
  • Intel C++ Compiler at least v19.0.1 with -std=c++17

Taskflow works on Linux, Windows, and Mac OS X.
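A project can guard against a missing C++17 flag with a compile-time check. The sketch below is not part of Taskflow: the example namespace, the EXAMPLE_CPLUSPLUS macro, and has_cpp17 are hypothetical names used only for illustration. It relies on the standard __cplusplus macro and on _MSVC_LANG, which Visual Studio defines because its __cplusplus stays at 199711L unless /Zc:__cplusplus is passed.

```cpp
// Hypothetical guard, not part of Taskflow: fails the build early when the
// translation unit is not compiled as C++17 or later.
#if defined(_MSVC_LANG)
  // MSVC keeps __cplusplus at 199711L unless /Zc:__cplusplus is set,
  // so read the language level from _MSVC_LANG instead.
  #define EXAMPLE_CPLUSPLUS _MSVC_LANG
#else
  #define EXAMPLE_CPLUSPLUS __cplusplus
#endif

static_assert(EXAMPLE_CPLUSPLUS >= 201703L,
              "C++17 is required: pass -std=c++17 (or /std:c++17 on MSVC)");

namespace example {

// True when the compiler is in C++17 mode or newer.
constexpr bool has_cpp17() { return EXAMPLE_CPLUSPLUS >= 201703L; }

}  // namespace example
```

Placing such a check in a commonly included header turns a long list of template errors into a single readable diagnostic.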

Working Items

  • enhancing the Taskflow profiler (TFProf)
  • adding methods for updating tf::cudaFlow (with unit tests)
  • adding support for cuBLAS
  • adding support for cuDNN
  • adding support for SYCL (ComputeCpp and DPC++)

New Features

Taskflow Core

cudaFlow

  • added tf::cudaFlowCapturer for building a cudaFlow through stream capture (see GPU Tasking (cudaFlowCapturer))
  • added tf::cudaFlowCapturerBase for creating custom capturers
  • added tf::cudaFlow::capture for capturing a cudaFlow within a parent cudaFlow
  • added tf::Taskflow::emplace_on to place a cudaFlow on a GPU
  • added tf::cudaFlow::dump and tf::cudaFlowCapturer::dump to visualize cudaFlow
  • added tf::cudaFlow::offload and update methods to run and update a cudaFlow explicitly
  • supported standalone cudaFlow
  • supported standalone cudaFlowCapturer
  • added tf::cublasFlowCapturer to support cuBLAS (see Linear Algebra (cublasFlowCapturer))

Utilities

  • added utility functions to query CUDA device properties (see cuda_device.hpp)
  • added utility functions to manage CUDA memory (see cuda_memory.hpp)
  • added utility functions for common mathematics operations
  • added serializer and deserializer libraries to support TFProf
  • added per-thread pool for CUDA streams to improve performance
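The idea behind a per-thread pool can be sketched in plain C++. This is a generic illustration under stated assumptions, not Taskflow's implementation: FakeStream is a hypothetical stand-in for a CUDA stream, and PerThreadPool, this_thread_pool, and created are made-up names. Because each thread owns its own pool, acquiring and releasing a resource needs no locking, and an expensive resource is created only when the calling thread has none cached.

```cpp
#include <vector>

namespace example {

// Hypothetical stand-in for an expensive resource such as a CUDA stream.
struct FakeStream {
  int id;
};

// Number of resources created so far by the calling thread.
inline thread_local int created = 0;

// A free list of resources owned by a single thread.
class PerThreadPool {
 public:
  // Reuse a cached resource if one is available; otherwise create one.
  FakeStream acquire() {
    if (!free_.empty()) {
      FakeStream s = free_.back();
      free_.pop_back();
      return s;
    }
    return FakeStream{created++};
  }

  // Return a resource to the pool so a later acquire() can reuse it.
  void release(FakeStream s) { free_.push_back(s); }

 private:
  std::vector<FakeStream> free_;
};

// One pool per thread, constructed on first use by that thread.
inline PerThreadPool& this_thread_pool() {
  thread_local PerThreadPool pool;
  return pool;
}

}  // namespace example
```

A caller would acquire a resource from example::this_thread_pool(), use it, and release it back; a second acquire on the same thread then returns the cached resource instead of creating a new one.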

Taskflow Profiler (TFProf)

  • added visualization for asynchronous tasks
  • added server-based profiler to support large profiling data (see Profile Taskflow Programs)

New Algorithms

CPU Algorithms

GPU Algorithms

Bug Fixes

  • fixed a bug in stream capturing (the capture must use ThreadLocal mode)
  • fixed a bug that reported wrong worker ids when compiling a shared library, caused by the use of thread_local (now a C++17 inline variable)
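The thread_local fix can be illustrated with a small sketch. It is not Taskflow's code: this_worker_id and run_workers are hypothetical names, and the exact declaration Taskflow used before the fix is not stated in these notes. The general point is that a C++17 inline variable has external linkage and a single definition across all translation units that include the header, whereas a static thread_local variable in a header gives every translation unit its own copy. Whether one copy is also shared across a shared-library boundary depends on the platform and its symbol visibility rules.

```cpp
#include <thread>
#include <vector>

namespace example {

// Hypothetical per-thread state of a header-only library.
// "inline" (C++17) gives one definition across translation units;
// "thread_local" gives each thread its own instance. -1 means "not a worker".
inline thread_local int this_worker_id = -1;

// Spawns n workers. Each stores its id in thread-local storage and then
// reads it back into its own slot of the result vector.
inline std::vector<int> run_workers(int n) {
  std::vector<int> seen(n, -1);
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) {
    threads.emplace_back([i, &seen] {
      this_worker_id = i;         // visible to this thread only
      seen[i] = this_worker_id;   // each thread writes a distinct element
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  return seen;
}

}  // namespace example
```

Calling example::run_workers(4) returns the ids each worker observed, while the calling thread's own this_worker_id is left untouched.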

Breaking Changes

Deprecated and Removed Items

  • removed tf::cudaFlow::device; call tf::Taskflow::emplace_on to associate a cudaFlow with a GPU device
  • removed tf::cudaFlow::join; use tf::cudaFlow::offload instead
  • removed the legacy tf::Framework
  • removed external mutable use of tf::TaskView

Documentation

Miscellaneous Items

We have presented Taskflow in the following C++ venues with recorded videos:

We have published Taskflow in the following conferences and journals: