Learning from Examples » Graph Traversal

We study the graph traversal problem by visiting each vertex in parallel following their edge dependencies. Traversing a graph is a fundamental building block of many graph applications especially for large-scale graph analytics.

Problem Formulation

Given a directed acyclic graph (DAG), i.e., a graph that has no cycles, we would like to traverse each vertex in order without breaking dependency constraints defined by edges. The following figure shows a graph of six vertices and seven edges. Each vertex represents a particular task and each edge represents a task dependency between two tasks.

Taskflow Task1 Task1 Task2 Task2 Task1->Task2 Task3 Task3 Task1->Task3 Task4 Task4 Task1->Task4 Task5 Task5 Task2->Task5 Task3->Task5 Task6 Task6 Task4->Task6 Task5->Task6

Traversing the above graph in parallel, the maximum parallelism we can acquire is three. When Task1 finishes, we can run Task2, Task3, and Task4 in parallel.

Graph Representation

We define the data structure of our graph. The graph is represented by an array of nodes of the following structure:

struct Node {
  std::string name;
  size_t idx;                          // index of the node in a array
  bool visited {false};

  std::atomic<size_t> dependents {0};  // number of incoming edges
  std::vector<Node*> successors;       // number of outgoing edges

  void precede(Node& n) {
    successors.emplace_back(&n);
    n.dependents ++;
  }
};

Based on the data structure, we randomly generate a DAG using ordered edges.

std::unique_ptr<Node[]> make_dag(size_t num_nodes, size_t max_degree) {
  
  std::unique_ptr<Node[]> nodes(new Node[num_nodes]);
  
  // Make sure nodes are in clean state
  for(size_t i=0; i<num_nodes; i++) {
    nodes[i].idx = i;
    nodes[i].name = std::to_string(i);
  }

  // Create a DAG by randomly insert ordered edges
  for(size_t i=0; i<num_nodes; i++) {
    size_t degree {0};
    for(size_t j=i+1; j<num_nodes && degree < max_degree; j++) {
      if(std::rand() % 2 == 1) {
        nodes[i].precede(nodes[j]);
        degree ++;
      }
    }
  }

  return nodes;
}

The function, make_dag, accepts two arguments, num_nodes and max_degree, to restrict the number of nodes in the graph and the maximum number of outgoing edges of every node.

Static Traversal

We create a taskflow to traverse the graph using static tasks (see Static Tasking). Each task does nothing but marks visited to true and subtracts dependents from one, both of which are used for validation after the graph is traversed. In practice, this computation may be replaced with a heavy function.

tf::Taskflow taskflow;
tf::Executor executor;

std::unique_ptr<Node[]> nodes = make_dag(100000, 4);
std::vector<tf::Task> tasks;

// create the traversal task for each node
for(size_t i=0; i<num_nodes; ++i) {
  tf::Task task = taskflow.emplace([v=&(nodes[i])](){
    v->visited = true;
    for(size_t j=0; j<v->successors.size(); ++j) {
      v->successors[j]->dependents.fetch_sub(1);
    }
  }).name(nodes[i].name);

  tasks.push_back(task);
}

// create the dependency between nodes on top of the graph structure
for(size_t i=0; i<num_nodes; ++i) {
  for(size_t j=0; j<nodes[i].successors.size(); ++j) {
    tasks[i].precede(tasks[nodes[i].successors[j]->idx]);
  }
}

executor.run(taskflow).wait();

// after the graph is traversed, all nodes must be visited with no dependents
for(size_t i=0; i<num_nodes; i++) {
  assert(nodes[i].visited);
  assert(nodes[i].dependents == 0);
}

The code above has two parts to construct the parallel graph traversal. First, it iterates each node and constructs a traversal task for that node. Second, it iterates each outgoing edge of a node and creates a dependency between the node and the other end (successor) of that edge. The resulting taskflow structure is topologically equivalent to the given graph.

Taskflow p0x7f95e780b0d0 0 p0x7f95e780ac50 4 p0x7f95e780b0d0->p0x7f95e780ac50 p0x7f95e780ab30 5 p0x7f95e780b0d0->p0x7f95e780ab30 p0x7f95e780a8f0 7 p0x7f95e780b0d0->p0x7f95e780a8f0 p0x7f95e780a7d0 8 p0x7f95e780b0d0->p0x7f95e780a7d0 p0x7f95e780ac50->p0x7f95e780a8f0 p0x7f95e780ac50->p0x7f95e780a7d0 p0x7f95e780aa10 6 p0x7f95e780ac50->p0x7f95e780aa10 p0x7f95e780a6b0 9 p0x7f95e780ac50->p0x7f95e780a6b0 p0x7f95e780ab30->p0x7f95e780a8f0 p0x7f95e780ab30->p0x7f95e780a6b0 p0x7f95e780a470 11 p0x7f95e780ab30->p0x7f95e780a470 p0x7f95e780a110 14 p0x7f95e780ab30->p0x7f95e780a110 p0x7f95e780a8f0->p0x7f95e780a7d0 p0x7f95e780a8f0->p0x7f95e780a6b0 p0x7f95e780a590 10 p0x7f95e780a8f0->p0x7f95e780a590 p0x7f95e780a230 13 p0x7f95e780a8f0->p0x7f95e780a230 p0x7f95e780a7d0->p0x7f95e780a470 p0x7f95e780a7d0->p0x7f95e780a110 p0x7f95e780a350 12 p0x7f95e780a7d0->p0x7f95e780a350 p0x7f95e780a7d0->p0x7f95e780a230 p0x7f95e780afb0 1 p0x7f95e780afb0->p0x7f95e780ab30 p0x7f95e780afb0->p0x7f95e780a8f0 p0x7f95e780afb0->p0x7f95e780aa10 p0x7f95e780afb0->p0x7f95e780a6b0 p0x7f95e780aa10->p0x7f95e780a6b0 p0x7f95e780aa10->p0x7f95e780a110 p0x7f95e780aa10->p0x7f95e780a590 p0x7f95e780aa10->p0x7f95e780a350 p0x7f95e780a6b0->p0x7f95e780a470 p0x7f95e780a6b0->p0x7f95e780a110 p0x7f95e780a6b0->p0x7f95e780a350 p0x7f95e780a6b0->p0x7f95e780a230 p0x7f95e780ae90 2 p0x7f95e780ae90->p0x7f95e780ab30 p0x7f95e780ae90->p0x7f95e780a8f0 p0x7f95e780ae90->p0x7f95e780a7d0 p0x7f95e780ae90->p0x7f95e780aa10 p0x7f95e780ad70 3 p0x7f95e780ad70->p0x7f95e780ac50 p0x7f95e780ad70->p0x7f95e780ab30 p0x7f95e780ad70->p0x7f95e780a8f0 p0x7f95e780ad70->p0x7f95e780aa10 p0x7f95e780a470->p0x7f95e780a110 p0x7f95e780a470->p0x7f95e780a350 p0x7f95e780a470->p0x7f95e780a230 p0x7f95e7809ed0 16 p0x7f95e780a470->p0x7f95e7809ed0 p0x7f95e780a110->p0x7f95e7809ed0 p0x7f95e7809db0 17 p0x7f95e780a110->p0x7f95e7809db0 p0x7f95e7809810 22 p0x7f95e780a110->p0x7f95e7809810 p0x7f95e7809930 21 p0x7f95e780a110->p0x7f95e7809930 p0x7f95e780a590->p0x7f95e780a470 p0x7f95e780a590->p0x7f95e780a110 p0x7f95e780a590->p0x7f95e7809ed0 p0x7f95e780a590->p0x7f95e7809db0 p0x7f95e780a350->p0x7f95e780a230 p0x7f95e780a350->p0x7f95e7809db0 p0x7f95e7809a50 20 p0x7f95e780a350->p0x7f95e7809a50 p0x7f95e780a350->p0x7f95e7809810 p0x7f95e780a230->p0x7f95e780a110 p0x7f95e780a230->p0x7f95e7809ed0 p0x7f95e780a230->p0x7f95e7809db0 p0x7f95e7809c90 18 p0x7f95e780a230->p0x7f95e7809c90 p0x7f95e7809ed0->p0x7f95e7809db0 p0x7f95e7809ed0->p0x7f95e7809810 p0x7f95e7809b70 19 p0x7f95e7809ed0->p0x7f95e7809b70 p0x7f95e78096f0 23 p0x7f95e7809ed0->p0x7f95e78096f0 p0x7f95e7809db0->p0x7f95e7809a50 p0x7f95e7809db0->p0x7f95e7809b70 p0x7f95e7809db0->p0x7f95e78096f0 p0x7f95e7809150 28 p0x7f95e7809db0->p0x7f95e7809150 p0x7f95e7809a50->p0x7f95e7809810 p0x7f95e7809a50->p0x7f95e7809930 p0x7f95e7809a50->p0x7f95e78096f0 p0x7f95e78095d0 24 p0x7f95e7809a50->p0x7f95e78095d0 p0x7f95e7809810->p0x7f95e78096f0 p0x7f95e7809810->p0x7f95e7809150 p0x7f95e7809810->p0x7f95e78095d0 p0x7f95e7809390 26 p0x7f95e7809810->p0x7f95e7809390 p0x7f95e7809c90->p0x7f95e7809930 p0x7f95e7809c90->p0x7f95e7809b70 p0x7f95e7809c90->p0x7f95e78096f0 p0x7f95e7809c90->p0x7f95e78095d0 p0x7f95e7809930->p0x7f95e78095d0 p0x7f95e7809930->p0x7f95e7809390 p0x7f95e78094b0 25 p0x7f95e7809930->p0x7f95e78094b0 p0x7f95e7809030 29 p0x7f95e7809930->p0x7f95e7809030 p0x7f95e7809ff0 15 p0x7f95e7809ff0->p0x7f95e7809db0 p0x7f95e7809ff0->p0x7f95e7809810 p0x7f95e7809ff0->p0x7f95e7809c90 p0x7f95e7809ff0->p0x7f95e7809b70 p0x7f95e7809b70->p0x7f95e7809a50 p0x7f95e7809b70->p0x7f95e78096f0 p0x7f95e7809b70->p0x7f95e7809390 p0x7f95e7809270 27 p0x7f95e7809b70->p0x7f95e7809270 p0x7f95e78096f0->p0x7f95e7809150 p0x7f95e78096f0->p0x7f95e7809390 p0x7f95e78096f0->p0x7f95e78094b0 p0x7f95e7809150->p0x7f95e7809030 p0x7f95e78095d0->p0x7f95e7809390 p0x7f95e7809390->p0x7f95e7809150 p0x7f95e7809390->p0x7f95e7809270 p0x7f95e7809270->p0x7f95e7809150 p0x7f95e78094b0->p0x7f95e7809150 p0x7f95e78094b0->p0x7f95e7809030

With task parallelism, we flow computation naturally with the graph structure. The runtime autonomously distributes tasks across processor cores to obtain maximum task parallelism. You do not need to worry about details of scheduling.

Dynamic Traversal

We can traverse the graph dynamically using tf::Subflow (see Subflow Tasking). We start from the source nodes of zero incoming edges and recursively spawn subflows whenever the dependency of a node is meet. Since we are creating tasks from the execution context of another task, we need to store the task callable in advance.

tf::Taskflow taskflow;
tf::Executor executor;

// task callable of traversing a node using subflow
std::function<void(Node*, tf::Subflow&)> traverse;

traverse = [&] (Node* n, tf::Subflow& subflow) {
  assert(!n->visited);
  n->visited = true;
  for(size_t i=0; i<n->successors.size(); i++) {
    if(n->successors[i]->dependents.fetch_sub(1) == 1) {
      subflow.emplace([s=n->successors[i], &traverse](tf::Subflow &subflow){ 
        traverse(s, subflow); 
      }).name(n->name);
    }
  }
};

// create a graph
std::unique_ptr<Node[]> nodes = make_dag(100000, 4);

// find the source nodes (no incoming edges)
std::vector<Node*> src;
for(size_t i=0; i<num_nodes; i++) {
  if(nodes[i].dependents == 0) { 
    src.emplace_back(&(nodes[i]));
  }
}

// create only tasks for source nodes
for(size_t i=0; i<src.size(); i++) {
  taskflow.emplace([s=src[i], &traverse](tf::Subflow& subflow){ 
    traverse(s, subflow); 
  }).name(nodes[i].name);
}

executor.run(taskflow).wait();

// after the graph is traversed, all nodes must be visited with no dependents
for(size_t i=0; i<num_nodes; i++) {
  assert(nodes[i].visited);
  assert(nodes[i].dependents == 0);
}

A partial graph is shown as follows:

Taskflow cluster_p0x7fd36b804d90 Subflow: 3 cluster_p0x7fd36c005e90 Subflow: 3 cluster_p0x7fd36c005c50 Subflow: 4 cluster_p0x7fd36c005a10 Subflow: 6 cluster_p0x7fd36c005470 Subflow: 9 cluster_p0x7fd36c005590 Subflow: 11 cluster_p0x7fd36c0057d0 Subflow: 12 cluster_p0x7fd36c005350 Subflow: 13 cluster_p0x7fd36c005b30 Subflow: 4 p0x7fd36b804c70 0 p0x7fd36b804a30 1 p0x7fd36b804b50 2 p0x7fd36b804d90 3 p0x7fd36c005e90 3 p0x7fd36c005e90->p0x7fd36b804d90 p0x7fd36c005c50 4 p0x7fd36c005c50->p0x7fd36c005e90 p0x7fd36c005a10 6 p0x7fd36c005a10->p0x7fd36c005c50 p0x7fd36c005470 9 p0x7fd36c005470->p0x7fd36c005a10 p0x7fd36c005590 11 p0x7fd36c005590->p0x7fd36c005470 p0x7fd36c0057d0 12 p0x7fd36c0057d0->p0x7fd36c005590 p0x7fd36c005350 13 p0x7fd36c005350->p0x7fd36c0057d0 p0x7fd36c005230 14 p0x7fd36c005230->p0x7fd36c005350 p0x7fd36c0058f0 6 p0x7fd36c0058f0->p0x7fd36c005c50 p0x7fd36c005b30 4 p0x7fd36c005b30->p0x7fd36c005e90 p0x7fd36c0056b0 7 p0x7fd36c0056b0->p0x7fd36c005b30 p0x7fd36c005d70 3 p0x7fd36c005d70->p0x7fd36b804d90 p0x7fd36b804eb0 4

In general, the dynamic version of graph traversal is slower than the static version due to the overhead incurred by spawning subflows. However, it may be useful for the situation where the graph structure is unknown at once but being partially explored during the traversal.