Blog Details

Home / Blog Details
blog

What is new in DGL v0.4.3 release?

The DGL v0.4.3 release brings many new features for an enhanced usability and system efficiency. The article takes a peek at some of the major highlights.

TensorFlow support

DGL finally comes to the TensorFlow community starting from this release. Switching to TensorFlow is easy. If you are a first-time user, please install DGL and import dgl, and then follow the instructions to set the default backend. You can always switch back by changing the config.json file, which is under ~/.dgl folder. DGL keeps a coherent user experience regardless of which backend is currently in use. The following code demonstrates the basic steps to apply a graph convolution layer.

import tensorflow as tf
import dgl.nn as dglnn

# Random features for 10 nodes; each is of length 5.
x = tf.random.normal((10, 5))
# Random graph; 10 nodes and 20 edges.
g = dgl.rand_graph(10, 20)
# Pre-defined graph convolution module.
conv = dglnn.GraphConv(5, 3)
y = conv(g, x)  # Apply the graph convolution layer.

We have implemented and released 15 common GNN modules in TensorFlow (more are coming), all of which can be invoked in one line of codes.

Our preliminary benchmark shows strong performance improvement against other TF-based tools for GNNs in terms of both training speed (measured by epoch running time in seconds) and memory consumption.

Dateset Model DGL GraphNet tf_geometric
Core GCN 0.0148 0.0152 0.0192
Reddit GCN 0.1095 OOM OOM
PubMed GCN 0.0156 0.0553 0.0185
PPI GCN 0.09 0.16 0.21
Cora GAT 0.0442 n/a 0.058
PPI GAT 0.398 n/a 0.752

To get started, install DGL and check out the examples here.

DGL-KE: A light-speed package for learning knowledge graph embeddings

Previously incubated under the DGL main repository, DGL-KE now officially announces its 0.1 release as a standalone package. The key highlights are:

  • Effortlessly generate knowledge graph embedding with one line of code.
  • Support for giant graphs with millions of nodes and edges.
  • Distributed training with highly-optimized graph partitioning, negative sampling and communication, which can be deployed on both multi-GPU machines and multi-machine clusters.

DGL-KE can be installed with pip:

pip install dglke

The following command trains embeddings of the full FreeBase graph (over 86M nodes and 338M edges) with 8 GPUs.

dglke_train --model TransE_l2 --dataset Freebase --batch_size 1000 \
            --neg_sample_size 200 --hidden_dim 400 --gamma 10 --lr 0.1 \
            --regularization_coef 1e-9 -adv --gpu 0 1 2 3 4 5 6 7 \
            --max_step 320000 --log_interval 10000 --async_update \
            --rel_part --force_sync_interval 10000

DGL-KE is designed for learning at scale and speed. Our benchmark on the full FreeBase graph shows that DGL-KE can train embeddings under 100 minutes on an 8-GPU machine and under 30 minutes on a 4-machine cluster (48 cores/machine). These results represent a 2×∼5× speedup over the best competing approaches.

DGL-KE v.s. PyTorch-BigGraph on FreeBase
DGL-KE v.s. GraphVite on FB15k

Check out our new GitHub repository, examples and documentations under https://github.com/awslabs/dgl-ke

DGL-LifeSci: Bringing Graph Neural Networks to Chemistry and Biology

Previously incubated as a model zoo for chemistry, DGL-LifeSci is now spun off as a standalone package. The key highlights are:

  • Training scripts and pre-trained models for various applications — molecular property prediction, generative models, and reaction prediction.
  • Up to 5.5x model training speedup compared to previous implementations.
  • Well defined pipelines for data processing, model construction and evaluation.

DGL-LifeSci can be installed with pip or conda.

pip install dgllife
conda install -c dglteam dgllife

A summary of speedup in seconds per epoch of training:

Model Original Implementations DGL-LifeSci Implementations Speedup
GCN on Tox21 5.5 (DeepChem) 1 5.5x
AttentiveFP on Aromaticity 6 1.2 5x
JTNN on ZINC 1826 743 2.5x
WLN for reaction center prediction 11657 5095 2.3x

To get started, check out the examples and documentations under https://github.com/dmlc/dgl/tree/master/apps/life_sci.

Experimenting new APIs for sampling

Sampling is crucial to training GNNs on giant graphs. In this release, we re-design the APIs for sampling, aiming for a more intuitive programming experience and a better performance at the same time. The new APIs have several advantages:

  • Support a wide range of sampling-based GNN models, including PinSAGE, GraphSAGE, Graph Convolutional Matrix Completion (GCMC), and etc.
  • Support customization in Python.
  • Support heterogeneous graphs.
  • Leverage all pre-defined NN modules with no code change.
  • Utilize both multi-processing and multi-threading for maximum speed.

The code below defines a basic neighbor sampler:

class NeighborSampler(object):
    def __init__(self, g, fanouts):
        self.g = g  # The full graph structure
        self.fanouts = fanouts  # fan-out of each layer

    def sample_blocks(self, seeds):
        # `seeds` are the set of nodes to build one sample from.
        blocks = []
        for fanout in self.fanouts:
            # For each seed node, sample ``fanout`` neighbors.
            frontier = dgl.sampling.sample_neighbors(g, seeds, fanout, replace=True)
            # Then we compact the frontier into a bipartite graph for message passing.
            block = dgl.to_block(frontier, seeds)
            # Obtain the seed nodes for next layer.
            seeds = block.srcdata[dgl.NID]

            blocks.insert(0, block)
        return blocks

Although these APIs are still experimental, you can find their usages in many examples:

  • Train the GraphSAGE model by neighbor sampling and scale it to multiple GPUs (link).
  • Train the Relational GCN model on heterogeneous graphs by sampling for both node classification and link prediction (link).
  • Train the PinSAGE model by random walk sampling for item recommendation (link).
  • Train the GCMC model by sampling for MovieLens rating prediction (link).
  • Implement the variance reduction technique for neighbor sampling (link) proposed by Chen et al.

We will continue polishing these APIs, and the corresponding documentations and tutorials are coming.

Other Improvements

  • All GNN modules under dgl.nn now support both homogeneous graph and bipartite graph.
  • DGLHeteroGraph now has a faster pickling/unpickling implementation.
  • Add new APIs for saving and loading DGLHeteroGraph from checkpoints.
  • BatchedDGLGraph and DGLSubGraph classes have been merged to DGLGraph.
  • Constructing DGLGraph no longer requires a multigraph flag.

More details can be found in the full release note.