Pregel: A System for Large-Scale Graph Processing

This week I will study Graph Processing Related papers. The Pregel: A System for Large-Scale Graph Processing was published on ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing 2009 by Google. It is the Large-scale graph computing at Google.

Large-Scale Graph

What is the scale of graph? The scale is the same as the scale of graph studied in FlashGraph, which is published in 2015.

The scale of these graphs in some cases billions of vertices, trillions of edges poses challenges to their efficient processing.

What is the challenge to do graph processing? Graph algorithms often exhibit:

  1. poor locality of memory access,
  2. very little work per vertex,
  3. and a changing degree of parallelism over the course of execution.

So, the Pregel aims to address these challenges? Not exactly. Pregel is targeted to provide a scalable general-purpose system for implementing arbitrary graph algorithms over arbitrary graph representations in a large-scale distributed environment. Pregel aims to:

  1. provide an easy to use programming interface
  2. and a optimizable graph processing engine.

Main Concepts of Pregel

The high-level organization of Pregel programs is inspired by Valiant’s Bulk Synchronous Parallel model.

The programming interface follows two rules:

  1. vertex-centric interface for user graph algorithms
  2. message passing interface for the graph processing engine

We can see that the design concepts of FlashGraph is very similar to Pregel.

In each iterations, called supersteps, the Pregel runs a user-specified operation on each vertex, then send messages to it’s adjecent vertex. in the architecture design of Pregel, the edges does not plays an important role as the vertex does. That is one of the reason why FlashGraph stores the edges on the SSDs while stores the vertex in Memory.

Vertex State Machine

vertex-state-machine

The Vertex State is a flag to identify whether a vertex need to be processed in an iteration. In the begin, every vertex is in active state. The vertex can turn itself to inactive state when there’s no additional computation to do. The inactive vertex can turn to active state only when externally triggered.

This sounds familiar if you also read the FlashGraph.

Optimization

The interface design of Pregel can support a lot of system architecture related optimization. FlashGraph is one example.