What is new in the DGL v0.4.3 release?
The DGL v0.4.3 release brings many new features that improve usability and system efficiency. This article takes a peek at some of the major highlights.
TensorFlow support
DGL finally comes to the TensorFlow community starting from this release.
Switching to TensorFlow is easy. If you are a first-time user, install DGL and `import dgl`, then follow the instructions to set the default backend. You can always switch back later by editing the `config.json` file under the `~/.dgl` folder.
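As a minimal sketch (assuming the standard DGL backend-selection mechanism, where the `DGLBACKEND` environment variable takes precedence over `~/.dgl/config.json`), the backend can also be selected programmatically:

import os
# Select the backend before dgl is imported (assumption: DGLBACKEND overrides config.json).
os.environ["DGLBACKEND"] = "tensorflow"
import dgl
# Alternatively, edit ~/.dgl/config.json; its content is a small JSON object,
# e.g. {"backend": "tensorflow"}.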
DGL keeps a coherent user experience regardless of which backend is currently in use. The following code demonstrates the basic steps to apply a graph convolution layer.
import tensorflow as tf
import dgl
import dgl.nn as dglnn
# Random features for 10 nodes; each is of length 5.
x = tf.random.normal((10, 5))
# Random graph; 10 nodes and 20 edges.
g = dgl.rand_graph(10, 20)
# Pre-defined graph convolution module.
conv = dglnn.GraphConv(5, 3)
y = conv(g, x) # Apply the graph convolution layer.
We have implemented and released 15 common GNN modules in TensorFlow (more are coming), all of which can be invoked in one line of code:
- `GraphConv` from the Graph Convolutional Networks paper.
- `GATConv` from the Graph Attention Networks paper.
- `SAGEConv` from the Inductive Representation Learning on Large Graphs paper (a.k.a. GraphSAGE).
- `GINConv` from the How Powerful are Graph Neural Networks paper.
- `RelGraphConv` from the Modeling Relational Data with Graph Convolutional Networks paper.
- `SGConv` from the Simplifying Graph Convolutional Networks paper.
- `APPNPConv` from the Predict then Propagate: Graph Neural Networks meet Personalized PageRank paper.
- An `edge_softmax` function for computing softmax over the neighboring edges of each vertex.
- Various pooling layers: `SumPooling`, `AvgPooling`, `MaxPooling`, `SortPooling`, `WeightAndSum`, and `GlobalAttentionPooling`.
- A `HeteroGraphConv` module for applying GNN modules to heterogeneous graphs (see the sketch below).
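As an illustration of the last item, here is a minimal sketch of applying `HeteroGraphConv` to a toy heterogeneous graph. It assumes the `dgl.heterograph` constructor and the `HeteroGraphConv(mods, aggregate=...)` interface described in the DGL documentation; the relation names, graph, and feature sizes are made up.

import tensorflow as tf
import dgl
import dgl.nn as dglnn

# A toy heterogeneous graph with two relation types (hypothetical example).
g = dgl.heterograph({
    ('user', 'follows', 'user'): ([0, 1], [1, 2]),
    ('user', 'clicks', 'item'): ([0, 1, 2], [0, 0, 1]),
})
# One GNN module per relation; results landing on the same node type are summed.
conv = dglnn.HeteroGraphConv({
    'follows': dglnn.GraphConv(5, 3),
    'clicks': dglnn.SAGEConv(5, 3, 'mean'),
}, aggregate='sum')
# Input features keyed by node type.
feats = {'user': tf.random.normal((3, 5)), 'item': tf.random.normal((2, 5))}
out = conv(g, feats)  # dict: node type -> updated features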
Our preliminary benchmark shows a strong performance improvement over other TensorFlow-based GNN tools, in terms of both training speed (measured as per-epoch running time in seconds) and memory consumption.
Dataset | Model | DGL | GraphNet | tf_geometric |
---|---|---|---|---|
Cora | GCN | 0.0148 | 0.0152 | 0.0192 |
  | GCN | 0.1095 | OOM | OOM |
PubMed | GCN | 0.0156 | 0.0553 | 0.0185 |
PPI | GCN | 0.09 | 0.16 | 0.21 |
Cora | GAT | 0.0442 | n/a | 0.058 |
PPI | GAT | 0.398 | n/a | 0.752 |
To get started, install DGL and check out the examples here.
DGL-KE: A light-speed package for learning knowledge graph embeddings
Previously incubated under the DGL main repository, DGL-KE now officially announces its 0.1 release as a standalone package. The key highlights are:
- Effortlessly generate knowledge graph embeddings with one line of code.
- Support for giant graphs with millions of nodes and edges.
- Distributed training with highly-optimized graph partitioning, negative sampling and communication, which can be deployed on both multi-GPU machines and multi-machine clusters.
DGL-KE can be installed with pip:
pip install dglke
The following command trains embeddings of the full Freebase graph (over 86M nodes and 338M edges) with 8 GPUs.
dglke_train --model TransE_l2 --dataset Freebase --batch_size 1000 \
--neg_sample_size 200 --hidden_dim 400 --gamma 10 --lr 0.1 \
--regularization_coef 1e-9 -adv --gpu 0 1 2 3 4 5 6 7 \
--max_step 320000 --log_interval 10000 --async_update \
--rel_part --force_sync_interval 10000
DGL-KE is designed for learning at scale and speed. Our benchmark on the full Freebase graph shows that DGL-KE can train embeddings in under 100 minutes on an 8-GPU machine and in under 30 minutes on a 4-machine cluster (48 cores/machine). These results represent a 2×–5× speedup over the best competing approaches.
Check out our new GitHub repository, examples, and documentation at https://github.com/awslabs/dgl-ke
DGL-LifeSci: Bringing Graph Neural Networks to Chemistry and Biology
Previously incubated as a model zoo for chemistry, DGL-LifeSci is now spun off as a standalone package. The key highlights are:
- Training scripts and pre-trained models for various applications, including molecular property prediction, generative models, and reaction prediction (see the sketch after the install commands below).
- Up to 5.5x model training speedup compared with previous implementations.
- Well-defined pipelines for data processing, model construction, and evaluation.
DGL-LifeSci can be installed with pip or conda:
pip install dgllife
conda install -c dglteam dgllife
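As a quick taste of the pre-trained models mentioned above, the sketch below loads one of the released property-prediction models. It assumes the `load_pretrained` utility and the `GCN_Tox21` model name from the dgllife model zoo; check the documentation for the exact names available in your version.

from dgllife.model import load_pretrained

# Load a GCN trained on the Tox21 dataset (model name assumed from the dgllife model zoo).
model = load_pretrained('GCN_Tox21')
model.eval()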
A summary of training time (seconds per epoch) and the resulting speedup:
Model | Original Implementation (s/epoch) | DGL-LifeSci Implementation (s/epoch) | Speedup |
---|---|---|---|
GCN on Tox21 | 5.5 (DeepChem) | 1 | 5.5x |
AttentiveFP on Aromaticity | 6 | 1.2 | 5x |
JTNN on ZINC | 1826 | 743 | 2.5x |
WLN for reaction center prediction | 11657 | 5095 | 2.3x |
To get started, check out the examples and documentation at https://github.com/awslabs/dgl-lifesci.
Experimenting with new APIs for sampling
Sampling is crucial to training GNNs on giant graphs. In this release, we re-designed the APIs for sampling, aiming for a more intuitive programming experience and better performance at the same time. The new APIs have several advantages:
- Support a wide range of sampling-based GNN models, including PinSAGE, GraphSAGE, and Graph Convolutional Matrix Completion (GCMC).
- Support customization in Python.
- Support heterogeneous graphs.
- Leverage all pre-defined NN modules with no code change.
- Utilize both multi-processing and multi-threading for maximum speed.
The code below defines a basic neighbor sampler:
import dgl

class NeighborSampler(object):
    def __init__(self, g, fanouts):
        self.g = g              # The full graph structure
        self.fanouts = fanouts  # fan-out of each layer

    def sample_blocks(self, seeds):
        # `seeds` are the set of nodes to build one sample from.
        blocks = []
        for fanout in self.fanouts:
            # For each seed node, sample `fanout` neighbors.
            frontier = dgl.sampling.sample_neighbors(self.g, seeds, fanout, replace=True)
            # Then compact the frontier into a bipartite block graph for message passing.
            block = dgl.to_block(frontier, seeds)
            # The source nodes of this block become the seed nodes for the next layer.
            seeds = block.srcdata[dgl.NID]
            blocks.insert(0, block)
        return blocks
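A hypothetical usage sketch follows; `seed_node_ids`, `node_features`, and `node_labels` are made-up placeholders, and the actual training loop depends on your framework of choice.

# Sample two layers of neighbors for one minibatch of seed (output) nodes.
sampler = NeighborSampler(g, fanouts=[10, 25])
blocks = sampler.sample_blocks(seed_node_ids)

# Input features are gathered for the source nodes of the outermost block;
# predictions and labels correspond to the destination nodes of the last block.
input_node_ids = blocks[0].srcdata[dgl.NID]
output_node_ids = blocks[-1].dstdata[dgl.NID]
batch_feats = node_features[input_node_ids]
batch_labels = node_labels[output_node_ids]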
Although these APIs are still experimental, you can find their usage in many examples:
- Train the GraphSAGE model by neighbor sampling and scale it to multiple GPUs (link).
- Train the Relational GCN model on heterogeneous graphs by sampling for both node classification and link prediction (link).
- Train the PinSAGE model by random walk sampling for item recommendation (link).
- Train the GCMC model by sampling for MovieLens rating prediction (link).
- Implement the variance reduction technique for neighbor sampling (link) proposed by Chen et al.
We will continue polishing these APIs; the corresponding documentation and tutorials are coming.
Other Improvements
- All GNN modules under `dgl.nn` now support both homogeneous and bipartite graphs.
- `DGLHeteroGraph` now has a faster pickling/unpickling implementation.
- New APIs for saving and loading `DGLHeteroGraph` from checkpoints (see the sketch below).
- The `BatchedDGLGraph` and `DGLSubGraph` classes have been merged into `DGLGraph`.
- Constructing a `DGLGraph` no longer requires a `multigraph` flag.
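A minimal sketch of checkpointing a heterogeneous graph, assuming the `save_graphs`/`load_graphs` utilities under `dgl.data.utils` are the APIs referred to above:

import dgl
from dgl.data.utils import save_graphs, load_graphs

g = dgl.heterograph({('user', 'follows', 'user'): ([0, 1], [1, 2])})

# Serialize the graph (including any node/edge features) to disk.
save_graphs('graph_checkpoint.bin', [g])

# Restore it later; load_graphs returns a list of graphs and a label dict.
graphs, _ = load_graphs('graph_checkpoint.bin')
g_restored = graphs[0]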
More details can be found in the full release note.
01 April