What is new in DGL v0.5 release?

The recent DGL 0.5 release is a major update to many aspects of the project, including documentation, APIs, system speed, and scalability. This article highlights some of the new features and enhancements.

More docs, less code

DGL has been through several releases with numerous new APIs and features. While the development pace is rapid, DGL’s documentation has been lagging behind. We have been aware of this issue and finally got a handle on it in this release. There are two major changes. The first is a new user guide with dedicated chapters on the core concepts of DGL and how they connect to the pipeline of training and testing GNNs. There are currently seven chapters:

  • Graph: The chapter explains the basics of the graph data structure, the usage of the core DGLGraph class, heterogeneous graphs, and so on.
  • Message Passing: The chapter starts from the mathematical definition of message passing neural networks and then explains how to express them in DGL.
  • Building GNN Modules: The chapter walks through the steps to define GNN layers/modules in DGL for both homogeneous and heterogeneous graphs.
  • Graph Data Pipeline: The chapter explains how datasets are organized in DGL and how to create your own.
  • Training Graph Neural Networks: The chapter provides guidance on training GNNs in DGL for node, edge and graph prediction tasks.
  • Stochastic Training on Large Graphs: The chapter introduces mini-batch training in the GNN domain and the DGL APIs designed for it.
  • Distributed Training: The chapter explains DGL’s components for training on graphs that scale beyond one machine.

Besides the user guide, the second major change is an extensively re-worked API reference, now organized by namespace. We also took this chance to prune the API set: rarely used and redundant APIs are deprecated, and related functionality is consolidated into fewer calls. For example, creating a graph in DGL now only involves dgl.graph and dgl.heterograph, for homogeneous and heterogeneous graphs respectively. Another noticeable simplification is that DGLGraph is now the only class for storing graph structure and features; it can represent a homogeneous or heterogeneous graph, a subgraph, or a batched graph.
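Below is a minimal sketch of the consolidated creation APIs; the node IDs, edge types, and feature sizes are made up for illustration.

    import torch
    import dgl

    # Homogeneous graph from a pair of (src, dst) node-ID tensors.
    src = torch.tensor([0, 1, 2])
    dst = torch.tensor([1, 2, 3])
    g = dgl.graph((src, dst))

    # Heterogeneous graph: one (src, dst) pair per canonical edge type.
    hg = dgl.heterograph({
        ('user', 'follows', 'user'): (torch.tensor([0, 1]), torch.tensor([1, 2])),
        ('user', 'plays', 'game'): (torch.tensor([0, 2]), torch.tensor([0, 1])),
    })

    # The same DGLGraph class also carries node/edge features.
    g.ndata['h'] = torch.randn(g.num_nodes(), 16)
    print(g)
    print(hg)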

More flexibility on DGLGraph

The 0.5 release brings more flexibility to the core graph structure. First, DGL now supports creating graphs with int32 node and edge IDs; this not only cuts memory consumption in half compared with int64, but also enables many fast operators that cuSPARSE provides only for int32. Second, previous versions of DGL only provided APIs to control the host device of node/edge features; the new version also allows changing the host device of the graph structure itself (via DGLGraph.to), and many structure-related operators, such as getting degrees and extracting subgraphs, are now implemented on CUDA. Third, to store giant graphs even more compactly, DGL adds the DGLGraph.formats API to control which internal sparse formats a graph materializes. This can reduce memory consumption by half or more, especially for graphs stored for sampling in mini-batch training. You can find explanations of all these new features in the dedicated user guide chapter.
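The following sketch exercises the three features above; the tiny toy graph and the choice of 'csc' as the restricted format are only for illustration.

    import torch
    import dgl

    src, dst = torch.tensor([0, 1, 2]), torch.tensor([1, 2, 0])

    # 1) int32 IDs halve the memory for the structure and enable cuSPARSE fast paths.
    g = dgl.graph((src, dst), idtype=torch.int32)

    # 2) The graph structure itself can now live on the GPU, so structural
    #    operators such as in_degrees() run on CUDA.
    if torch.cuda.is_available():
        g = g.to('cuda:0')
    print(g.in_degrees())

    # 3) Restrict the materialized sparse formats to save memory; a CSC-only
    #    graph is often enough for neighbor-sampling workloads.
    g_csc = g.formats('csc')
    print(g_csc.formats())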

Faster and deterministic kernels

Alongside the usability improvements, the DGL team always keeps system performance at heart. We conducted extensive code refactoring in this release to reduce the Python stack overhead and improve code readability. In addition, we upgraded the core CPU/GPU kernels for message passing computation. Specifically, we found that message passing in GNNs can be reduced to two general computational patterns: g-SpMM (generalized sparse-dense matrix multiplication) and g-SDDMM (generalized sampled dense-dense matrix multiplication). Each pattern admits a number of parallelization strategies, and DGL carefully chooses suitable ones based on the sparse format and operator type. Moreover, DGL now chooses deterministic implementations by default for better reproducibility. Read the updated white paper for more details about the new kernel design.
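To make the two patterns concrete, here is a small sketch of how typical message passing steps written with DGL's built-in functions map onto them; the feature names and sizes are placeholders.

    import torch
    import dgl
    import dgl.function as fn

    g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 0])))
    g.ndata['h'] = torch.randn(g.num_nodes(), 8)
    g.edata['w'] = torch.randn(g.num_edges(), 1)

    # "Multiply source features by edge weights, then sum over neighbors" is a
    # g-SpMM pattern: DGL lowers it to a fused sparse kernel instead of
    # materializing per-edge messages.
    g.update_all(fn.u_mul_e('h', 'w', 'm'), fn.sum('m', 'h_new'))

    # "Compute a per-edge score from the two endpoint features" is a g-SDDMM pattern.
    g.apply_edges(fn.u_dot_v('h', 'h', 'score'))

    print(g.ndata['h_new'].shape, g.edata['score'].shape)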

Scaling beyond one machine

Can DGL scale to giant graphs that cannot fit on one machine? This question has been on our radar since the genesis of the DGL project. Despite several attempts in previous releases, 0.5 is the first release that thoroughly defines the user-facing APIs and components for distributed training. The goal is a coherent mini-batch training experience that carries over from a single machine to multiple machines, ideally with few or no code changes. Specifically, this release includes the following new components (a rough usage sketch follows the list):

  • To split a graph for distributed computation, DGL integrates a lightweight version of the highly optimized METIS graph partitioning toolkit.
  • DistGraphServer stores the partitioned graph structure and node/edge features on each machine. These servers work together to serve the graph data to training processes. One can deploy multiple servers on one machine to boost the service throughput.
  • A new distributed sampler interacts with remote servers and supports sampling from the partitioned graph.
  • For training processes, DGL provides the DistGraph, DistTensor and DistEmbedding abstractions for accessing graph structures, node/edge features and embeddings stored remotely. There is also a convenient DistDataLoader to get mini-batches from the distributed sampler.
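The sketch below shows, under heavy assumptions, how these pieces might fit together in a training script: it assumes the graph has already been partitioned and the servers launched, the file names 'ip_config.txt' and 'graph.json' and the graph name 'my_graph' are placeholders, and the exact constructor arguments of the distributed classes have shifted slightly across DGL versions.

    import torch
    import dgl

    # Connect to the already-running DistGraphServer processes.
    dgl.distributed.initialize('ip_config.txt')  # argument list varies by DGL version
    g = dgl.distributed.DistGraph('my_graph', part_config='graph.json')

    # Node features are exposed as DistTensor handles; DistEmbedding holds
    # learnable embeddings sharded across machines.
    feat = g.ndata['feat']
    emb = dgl.distributed.DistEmbedding(g.num_nodes(), 16, name='node_emb')

    # Training nodes assigned to this trainer, then a DistDataLoader that
    # drives the distributed sampler.
    train_nids = dgl.distributed.node_split(g.ndata['train_mask'], g.get_partition_book())

    def sample(seeds):
        seeds = torch.LongTensor(seeds)
        frontier = dgl.distributed.sample_neighbors(g, seeds, fanout=10)
        return dgl.to_block(frontier, seeds)

    dataloader = dgl.distributed.DistDataLoader(
        dataset=train_nids.tolist(), batch_size=1024,
        collate_fn=sample, shuffle=True)

    for block in dataloader:
        pass  # forward/backward on the sampled block goes here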

DGL has performed several optimizations across the entire stack. For example, when the sampler and the target server are located on the same machine, they communicate through local shared memory instead of IPC or TCP/IP. More optimizations are coming in future releases. To get started, check out the user guide chapter on distributed training and the examples for training GraphSAGE and RGCN on the ogbn-paper100M dataset.

Further reading