Benchmark Taskflow
Compile and Run Benchmarks
To build the benchmark code, set the CMake option TF_BUILD_BENCHMARKS to ON as follows:

```bash
# under /taskflow/build
~$ cmake ../ -DTF_BUILD_BENCHMARKS=ON
~$ make
```
After you successfully build the benchmark code, you can find all benchmark instances in the benchmarks/
folder. You can run the executable of each instance in the corresponding folder.
```bash
~$ cd benchmarks && ls
bench_black_scholes bench_binary_tree bench_graph_traversal ...
~$ ./bench_graph_traversal
|V|+|E|   Runtime
      2     0.197
    842     0.198
   3284     0.488
   7288     0.774
    ...       ...
 619802    75.135
 664771    77.436
 711200    83.957
```
You can display the help message by giving the option --help.
```bash
~$ ./bench_graph_traversal --help
Graph Traversal
Usage: ./bench_graph_traversal [OPTIONS]

Options:
  -h,--help                   Print this help message and exit
  -t,--num_threads UINT       number of threads (default=1)
  -r,--num_rounds UINT        number of rounds (default=1)
  -m,--model TEXT             model name tbb|omp|tf (default=tf)
```
We currently implement the following instances, which are commonly used by the parallel computing community to evaluate system performance.
Instance | Description |
---|---|
bench_binary_tree | traverses a complete binary tree |
bench_black_scholes | computes option pricing with the Black-Scholes model |
bench_graph_traversal | traverses a randomly generated directed acyclic graph |
bench_linear_chain | traverses a linear chain of tasks |
bench_mandelbrot | exploits imbalanced workloads in a Mandelbrot set |
bench_matrix_multiplication | multiplies two 2D matrices |
bench_mnist | trains a neural network-based image classifier on the MNIST dataset |
bench_parallel_sort | sorts a range of items |
bench_reduce_sum | sums a range of items using reduction |
bench_wavefront | propagates computations in a 2D grid |
bench_linear_pipeline | performs pipeline parallelism on a linear chain of pipes |
bench_graph_pipeline | performs pipeline parallelism on a graph of pipes |
bench_deferred_pipeline | performs pipeline parallelism with dependencies from future pipes |
bench_data_pipeline | performs pipeline parallelism on a cache-friendly data wrapper |
bench_thread_pool | uses our executor as a simple thread pool |
bench_for_each | performs parallel-iteration algorithms |
bench_scan | performs parallel-scan algorithms |
bench_async_task | creates asynchronous tasks |
bench_fibonacci | finds Fibonacci numbers using recursive asynchronous tasking |
bench_nqueens | parallelizes n-queen search using recursive asynchronous tasking |
bench_integrate | parallelizes integration using recursive asynchronous tasking |
bench_primes | finds a range of prime numbers using parallel-reduction algorithms |
bench_skynet | traverses a 10-ray tree using recursive asynchronous tasking |
Configure Run Options
We implement consistent options for each benchmark instance. Common options are:
option | value | function |
---|---|---|
-h | none | displays the help message |
-t | integer | configures the number of threads to run |
-r | integer | configures the number of rounds to run |
-m | string | configures the baseline model to run: tbb, omp, or tf |
You can configure the benchmarking environment by giving different options.
Specify the Run Model
In addition to the Taskflow-based implementation of each benchmark instance, we have implemented two baseline models using the state-of-the-art parallel programming libraries OpenMP and Intel TBB to measure and evaluate the performance of Taskflow. You can select an implementation by passing the option -m.
```bash
~$ ./bench_graph_traversal -m tf   # run the Taskflow implementation (default)
~$ ./bench_graph_traversal -m tbb  # run the TBB implementation
~$ ./bench_graph_traversal -m omp  # run the OpenMP implementation
```
Specify the Number of Threads
You can configure the number of threads to run a benchmark instance by passing the option -t. The default value is one.
```bash
# run the Taskflow implementation using 4 threads
~$ ./bench_graph_traversal -m tf -t 4
```
Depending on your environment, you may need to use taskset to set the CPU affinity of the running process. This allows the OS scheduler to keep the process on the same CPU(s) as long as practical, for performance reasons.
```bash
# pin the process to 4 CPUs: CPU 0, CPU 1, CPU 2, and CPU 3
~$ taskset -c 0-3 ./bench_graph_traversal -t 4
```
Specify the Number of Rounds
Each benchmark instance evaluates the runtime of the implementation at different problem sizes. Each problem size corresponds to one iteration. You can configure the number of rounds per iteration to average the runtime.
```bash
# measure the Taskflow runtime by averaging the results over 10 runs
~$ ./bench_graph_traversal -r 10 -m tf
|V|+|E|   Runtime
      2     0.109  # the runtime value 0.109 is an average of 10 runs
    842     0.298
    ...       ...
 619802    73.135
 664771    74.436
```