What is new in DGL v0.4.3 release?
The DGL v0.4.3 release brings many new features that enhance usability and system efficiency. This article takes a peek at some of the major highlights.
DGL finally comes to the TensorFlow community starting with this release.
Switching to TensorFlow is easy. If you are a first-time user, simply install DGL, `import dgl`, and then follow the instructions to set the default backend. You can always switch back by changing the `config.json` file in the `~/.dgl` folder. DGL keeps a coherent user experience regardless of which backend is currently in use. The following code demonstrates the basic steps to apply a graph convolution layer.
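Concretely, setting the default backend amounts to writing a small JSON file. The helper below is an illustrative sketch, not DGL's own function; the path and file name follow the `~/.dgl/config.json` location mentioned above:

```python
import json
import os

def set_default_backend(backend, config_dir=os.path.expanduser("~/.dgl")):
    """Sketch: persist the backend choice (e.g. "tensorflow", "pytorch",
    "mxnet") by writing {"backend": ...} into DGL's config.json."""
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, "config.json")
    with open(path, "w") as f:
        json.dump({"backend": backend}, f)
    return path
```

DGL reads this file at import time, so the change takes effect the next time you `import dgl`.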
```python
import tensorflow as tf
import dgl
import dgl.nn as dglnn

# Random features for 10 nodes; each is of length 5.
x = tf.random.normal((10, 5))
# Random graph; 10 nodes and 20 edges.
g = dgl.rand_graph(10, 20)
# Pre-defined graph convolution module.
conv = dglnn.GraphConv(5, 3)
# Apply the graph convolution layer.
y = conv(g, x)
```
We have implemented and released 15 common GNN modules in TensorFlow (more are coming), all of which can be invoked in one line of code:
- `GraphConv` from the Graph Convolutional Networks paper.
- `GATConv` from the Graph Attention Networks paper.
- `SAGEConv` from the Inductive Representation Learning on Large Graphs paper (a.k.a. GraphSAGE).
- `GINConv` from the How Powerful are Graph Neural Networks paper.
- `RelGraphConv` from the Modeling Relational Data with Graph Convolutional Networks paper.
- `SGConv` from the Simplifying Graph Convolutional Networks paper.
- `APPNPConv` from the Predict then Propagate: Graph Neural Networks meet Personalized PageRank paper.
- `edge_softmax` function for computing softmax over the neighboring edges of each vertex.
- Various pooling layers.
- `HeteroGraphConv` module for applying GNN modules to heterogeneous graphs.
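To make the `edge_softmax` semantics concrete, here is a plain NumPy sketch (illustrative only, not DGL's implementation): given a score per edge, it normalizes the scores over the edges that point into each destination node, so the attention weights of each node's incoming edges sum to one.

```python
import numpy as np

def edge_softmax_np(dst, scores, num_nodes):
    """Softmax of per-edge `scores`, grouped by destination node `dst`."""
    out = np.empty_like(scores)
    for v in range(num_nodes):
        mask = dst == v
        if mask.any():
            # Numerically stable softmax within this node's incoming edges.
            e = np.exp(scores[mask] - scores[mask].max())
            out[mask] = e / e.sum()
    return out

# Three edges (src -> dst): 0->2, 1->2, 0->1.
dst = np.array([2, 2, 1])
scores = np.array([1.0, 1.0, 3.0])
print(edge_softmax_np(dst, scores, 3))  # edges into node 2 each get 0.5
```

This is the normalization step used by attention-based modules such as `GATConv`.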
Our preliminary benchmark shows strong improvements over other TensorFlow-based GNN tools in both training speed (measured by epoch running time in seconds) and memory consumption.
DGL-KE: A light-speed package for learning knowledge graph embeddings
Previously incubated under the DGL main repository, DGL-KE now officially announces its 0.1 release as a standalone package. The key highlights are:
- Effortlessly generate knowledge graph embeddings with one line of code.
- Support for giant graphs with millions of nodes and edges.
- Distributed training with highly optimized graph partitioning, negative sampling, and communication, deployable on both multi-GPU machines and multi-machine clusters.
DGL-KE can be installed with pip:
```bash
pip install dglke
```
The following command trains embeddings of the full FreeBase graph (over 86M nodes and 338M edges) with 8 GPUs.
```bash
dglke_train --model TransE_l2 --dataset Freebase --batch_size 1000 \
    --neg_sample_size 200 --hidden_dim 400 --gamma 10 --lr 0.1 \
    --regularization_coef 1e-9 -adv --gpu 0 1 2 3 4 5 6 7 \
    --max_step 320000 --log_interval 10000 --async_update \
    --rel_part --force_sync_interval 10000
```
DGL-KE is designed for learning at scale and speed. Our benchmark on the full FreeBase graph shows that DGL-KE can train embeddings in under 100 minutes on an 8-GPU machine and in under 30 minutes on a 4-machine cluster (48 cores/machine). These results represent a 2x–5x speedup over the best competing approaches.
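For intuition about what the `TransE_l2` model above learns, here is a minimal NumPy sketch of its scoring function, following the TransE paper: a relation is modeled as a translation vector, and a triple scores highly when head + relation lands near tail. The `gamma` margin corresponds to the `--gamma` flag in the command above; the helper itself is illustrative, not DGL-KE's implementation.

```python
import numpy as np

def transe_l2_score(head, rel, tail, gamma=10.0):
    """TransE score with L2 distance: gamma - ||h + r - t||_2.

    Higher scores mean the triple (head, rel, tail) is more plausible.
    """
    return gamma - np.linalg.norm(head + rel - tail)

h = np.array([0.1, 0.2])
r = np.array([0.3, 0.1])
t = np.array([0.4, 0.3])
# h + r equals t, so the distance term vanishes and the score is gamma.
print(transe_l2_score(h, r, t))  # → 10.0
```

Training then pushes true triples toward high scores and negatively sampled (corrupted) triples toward low scores.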
Check out our new GitHub repository, examples, and documentation at https://github.com/awslabs/dgl-ke.
DGL-LifeSci: Bringing Graph Neural Networks to Chemistry and Biology
Previously incubated as a model zoo for chemistry, DGL-LifeSci is now spun off as a standalone package. The key highlights are:
- Training scripts and pre-trained models for various applications — molecular property prediction, generative models, and reaction prediction.
- Up to 5.5x model training speedup compared to previous implementations.
- Well-defined pipelines for data processing, model construction, and evaluation.
DGL-LifeSci can be installed with pip or conda.
```bash
pip install dgllife
# or
conda install -c dglteam dgllife
```
A summary of training speed, in seconds per epoch:

| Model | Original Implementation | DGL-LifeSci Implementation | Speedup |
| --- | --- | --- | --- |
| GCN on Tox21 | 5.5 (DeepChem) | 1 | 5.5x |
| AttentiveFP on Aromaticity | 6 | 1.2 | 5x |
| JTNN on ZINC | 1826 | 743 | 2.5x |
| WLN for reaction center prediction | 11657 | 5095 | 2.3x |
To get started, check out the examples and documentation at https://github.com/awslabs/dgl-lifesci.
Experimental new APIs for sampling
Sampling is crucial to training GNNs on giant graphs. In this release, we redesigned the APIs for sampling, aiming for a more intuitive programming experience and better performance at the same time. The new APIs have several advantages:
- Support a wide range of sampling-based GNN models, including PinSAGE, GraphSAGE, Graph Convolutional Matrix Completion (GCMC), and more.
- Support customization in Python.
- Support heterogeneous graphs.
- Leverage all pre-defined NN modules with no code change.
- Utilize both multi-processing and multi-threading for maximum speed.
The code below defines a basic neighbor sampler:
```python
class NeighborSampler(object):
    def __init__(self, g, fanouts):
        self.g = g              # The full graph structure
        self.fanouts = fanouts  # Fan-out of each layer

    def sample_blocks(self, seeds):
        # `seeds` are the set of nodes to build one sample from.
        blocks = []
        for fanout in self.fanouts:
            # For each seed node, sample ``fanout`` neighbors.
            frontier = dgl.sampling.sample_neighbors(
                self.g, seeds, fanout, replace=True)
            # Then we compact the frontier into a bipartite graph
            # for message passing.
            block = dgl.to_block(frontier, seeds)
            # Obtain the seed nodes for the next layer.
            seeds = block.srcdata[dgl.NID]
            blocks.insert(0, block)
        return blocks
```
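To see what the fan-outs do without pulling in DGL, here is a plain-Python sketch (illustrative only, not DGL's implementation) of layer-wise neighbor sampling over a simple adjacency list. Each layer samples a fixed number of neighbors per seed, and the sampled nodes become the seeds for the next layer, building the frontiers from the output layer inward:

```python
import random

def sample_layers(adj, seeds, fanouts, seed=0):
    """For each layer, sample `fanout` neighbors (with replacement) for
    every seed node, then use all touched nodes as the next seeds."""
    rng = random.Random(seed)
    layers = []
    for fanout in fanouts:
        # Map each seed node to its sampled neighbors.
        frontier = {v: [rng.choice(adj[v]) for _ in range(fanout)]
                    for v in seeds}
        # Store frontiers outermost-first, like `blocks` above.
        layers.insert(0, frontier)
        # Next layer's seeds: every node seen so far.
        seeds = sorted({u for vs in frontier.values() for u in vs} | set(seeds))
    return layers

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0], 3: [1]}
layers = sample_layers(adj, seeds=[3], fanouts=[2, 2])
print(len(layers))  # → 2, one frontier per layer
```

DGL's `sample_neighbors` plays the role of the sampling step here, while `to_block` compacts each frontier into the bipartite graph a GNN layer consumes.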
Although these APIs are still experimental, you can find their usage in many examples:
- Train the GraphSAGE model by neighbor sampling and scale it to multiple GPUs (link).
- Train the Relational GCN model on heterogeneous graphs by sampling for both node classification and link prediction (link).
- Train the PinSAGE model by random walk sampling for item recommendation (link).
- Train the GCMC model by sampling for MovieLens rating prediction (link).
- Implement the variance reduction technique for neighbor sampling (link) proposed by Chen et al.
We will continue polishing these APIs; the corresponding documentation and tutorials are coming.
- All GNN modules under `dgl.nn` now support both homogeneous and bipartite graphs.
- `DGLHeteroGraph` now has a faster pickling/unpickling implementation.
- Add new APIs for saving and loading graphs.
- `DGLSubGraph` classes have been merged into `DGLGraph`.
- `DGLGraph` no longer requires a …
More details can be found in the full release note.