We implement a parallel breadth-first search (BFS) with tf::Taskflow::for_each_by_index, processing each frontier level in parallel. This example demonstrates how an iterative graph algorithm with a data-dependent loop structure maps onto Taskflow's stateful parallel iteration model.
Breadth-first search (BFS) is one of the most fundamental graph algorithms. Starting from a source node, it visits every reachable node in the graph and records the shortest distance (in number of edges) from the source to each node. It is used in network routing, social network analysis, game AI pathfinding, dependency resolution, and many other applications. The key idea is that BFS discovers nodes layer by layer. All nodes at distance 1 from the source are found first, then all nodes at distance 2, and so on. Each layer is called a frontier: the set of nodes discovered at the same distance from the source.
The following figure illustrates this process on a small graph. S is the source. Nodes at the same distance are at the same level and share the same colour. Edges point from each node to the neighbours it discovers in the next level:
The sequential algorithm processes one node at a time within each frontier. But notice that all nodes within a frontier are completely independent of each other: they were discovered in the same round and none of them depends on another node at the same level to compute its distance. This means all nodes in a frontier can be processed in parallel, relaxing their outgoing edges simultaneously across multiple CPU cores. For large graphs with wide frontiers (e.g., social networks, web graphs, road networks) this parallelism can be substantial. A frontier of one million nodes with an average degree of ten means ten million edge relaxations that can all run concurrently.
We represent the graph as an adjacency list and maintain two frontier buffers: curr_frontier holds the nodes being processed in the current level, and next_frontier accumulates the nodes discovered for the next level. An atomic counter next_size tracks how many nodes have been written into next_frontier so far.
Distance values are stored as std::atomic<int> so that multiple workers can race to claim an unvisited node safely. The first worker to set distance[v] from INF to a finite value wins; all others see the compare_exchange_strong fail and skip that node. This prevents any node from being added to next_frontier more than once.
Note that curr_frontier is read-only during the sweep and requires no synchronization; only distance[] and next_size are written concurrently.
The stateful std::ref(range) is the key to making this work without rebuilding the taskflow each level. The sweep task reads the range at execution time, so updating range with reset before each executor.run call is all that is needed to redirect the parallel loop to the new frontier.
The host loop calls tf::Executor::run once per frontier level, re-entering the executor each time. An alternative is to encode the loop termination check as a condition task inside the graph itself, so the entire BFS runs in a single tf::Executor::run call with minimal synchronization overhead:
The condition task runs on a single worker after sweep completes, so the promotion of next_frontier into curr_frontier and the range reset are sequenced correctly without any additional synchronization. The back-edge from check to sweep forms the BFS level loop; the executor drives the full traversal end-to-end in a single run call.
compare_exchange_strong on distance[v] is the correctness guarantee that prevents a node from being visited twice. If two workers simultaneously attempt to claim the same unvisited node v, exactly one compare_exchange_strong will succeed (the one that sees distance[v]==INF); the other will fail because distance[v] is no longer INF and will skip v. This ensures next_frontier contains each node at most once and that distance[v] holds the correct shortest-path distance from the source.