DGL v0.4 Release (heterogeneous graph update)
We are thrilled to announce the 0.4 release! This includes:
Heterogeneous Graph Support
What is a heterogeneous graph?
A heterogeneous graph is a graph whose nodes and edges are typed. For example, a graph for an online game store may contain 'user', 'game', and 'developer' nodes, connected by 'plays', 'follows', and 'develops' edges.
What models work on heterogeneous graphs?
The following models have been implemented with the new heterogeneous graph API:
- Graph Convolutional Matrix Completion (GCMC)

Dataset | RMSE (DGL) | RMSE (Official) | Speed (DGL) | Speed (Official) | Speedup |
---|---|---|---|---|---|
MovieLens-100K | 0.9077 | 0.910 | 0.0246s/epoch | 0.1008s/epoch | 5x |
MovieLens-1M | 0.8377 | 0.832 | 0.0695s/epoch | 1.538s/epoch | 22x |
MovieLens-10M (full-graph training) | 0.7875 | 0.777 | 0.6480s/epoch | OOM | - |

- R-GCN [Code in PyTorch]
- We provide an R-GCN model that takes a heterograph as input. The new code can train on the AM dataset (>5M edges) on a single GPU, whereas the original implementation could only run on CPU and consumed 32GB of memory.
- The original implementation takes 51.88s per epoch on CPU; the heterograph-based R-GCN takes only 0.1781s per epoch on a V100 GPU (291x faster!). A minimal sketch of such a relation-wise layer is given after this list.
- Heterogeneous Graph Attention Network (HAN) [Code in PyTorch]
- Metapath2vec [Code in PyTorch]
- The metapath sampler is twice as fast as the original implementation.
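For readers curious how such relation-wise models map onto the new API, below is a minimal sketch. It is not the shipped R-GCN code: the class name, feature names, and toy graph are illustrative. The layer keeps one linear projection per edge type and combines the per-relation messages with multi_update_all.
import torch
import torch.nn as nn
import dgl
import dgl.function as fn

class RelGraphLayer(nn.Module):
    """One linear projection per edge type; messages are summed per relation
    and then across relations (illustrative sketch, no basis decomposition)."""
    def __init__(self, in_feats, out_feats, etypes):
        super().__init__()
        self.weight = nn.ModuleDict(
            {etype: nn.Linear(in_feats, out_feats) for etype in etypes})

    def forward(self, g, feats):
        # feats: dict mapping node type -> input feature tensor
        funcs = {}
        for srctype, etype, dsttype in g.canonical_etypes:
            # project the source features for this relation, then copy and sum
            g.nodes[srctype].data['Wh_' + etype] = self.weight[etype](feats[srctype])
            funcs[etype] = (fn.copy_u('Wh_' + etype, 'm'), fn.sum('m', 'h_new'))
        g.multi_update_all(funcs, 'sum')
        # only node types that received messages appear in the output
        return {ntype: g.nodes[ntype].data['h_new']
                for ntype in g.ntypes if 'h_new' in g.nodes[ntype].data}

# Toy usage on a two-relation graph.
g = dgl.heterograph({
    ('user', 'follows', 'user'): [(0, 1), (1, 2)],
    ('user', 'plays', 'game'): [(0, 0), (1, 0), (2, 1)],
})
feats = {'user': torch.randn(3, 8), 'game': torch.randn(2, 8)}
layer = RelGraphLayer(8, 16, g.etypes)
out = layer(g, feats)  # {'user': (3, 16), 'game': (2, 16)} tensors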
How can I play with a heterogeneous graph?
Here is an example of creating and manipulating a heterogeneous graph:
import dgl
import torch
import dgl.function as fn
g = dgl.heterograph({
    ('user', 'follows', 'user'): [(0, 1), (1, 2)],
    ('user', 'plays', 'game'): [(0, 0), (1, 0), (1, 1), (2, 1)],
    ('game', 'attracts', 'user'): [(0, 0), (0, 1), (1, 1), (1, 2)],
    ('developer', 'develops', 'game'): [(0, 0), (1, 1)],
})
# The user nodes have a single feature named 'x' and the game nodes a single feature named 'y'
# (both 5-dimensional here so that the cross-type aggregation below can sum them)
x = torch.randn(3, 5)
y = torch.randn(2, 5)
g.nodes['user'].data['x'] = x
g.nodes['game'].data['y'] = y
# Edge features are similar
a = torch.randn(2, 5)
b = torch.randn(4, 7)
g.edges['follows'].data['a'] = a
g.edges['plays'].data['b'] = b
# One can also perform message passing.
# The following code passes messages along the "plays" edges and sums them on the game nodes.
g['plays'].update_all(fn.copy_u('x', 'm'), fn.sum('m', 'z'))
z = g.nodes['game'].data['z']
# Game 0 is played by users 0 and 1; game 1 is played by users 1 and 2.
assert torch.allclose(z[0], x[0] + x[1])
assert torch.allclose(z[1], x[1] + x[2])
# Moreover, one can perform message passing on several edge types at once and
# aggregate the per-type results on the shared destination type ('user' here).
g.multi_update_all({
    'follows': (fn.copy_u('x', 'm'), fn.sum('m', 'w')),
    'attracts': (fn.copy_u('y', 'm'), fn.sum('m', 'w')),
}, 'sum')
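After this call, every user node carries a feature 'w' that sums the contributions of both relations. With the tensors defined above, the result can be checked as follows (users 1 and 2 receive 'follows' messages from users 0 and 1, and every user receives 'attracts' messages from the games attracting them):
w = g.nodes['user'].data['w']
assert torch.allclose(w[0], y[0])
assert torch.allclose(w[1], x[0] + y[0] + y[1])
assert torch.allclose(w[2], x[1] + y[1])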
Check out our heterograph tutorial: Working with Heterogeneous Graphs in DGL.
Check out the full API reference.
Knowledge Graph Models
We also released DGL-KE, a subpackage of DGL that trains embeddings on knowledge graphs. This package is adapted from the KnowledgeGraphEmbedding package. We made it fast and scalable while still maintaining the flexibility of the original package. Using a single NVIDIA V100 GPU, DGL-KE can train TransE on FB15k in 6.85 mins, substantially outperforming existing tools such as GraphVite. For graphs with hundreds of millions of edges (such as the full Freebase graph), it takes a couple of hours on one EC2 x1.32xlarge machine.
Currently, the following models are supported (their scoring functions are sketched below):
- TransE
- DistMult
- ComplEx
In addition, the following training modes are supported:
- CPU training
- GPU training
- Joint CPU & GPU training
- Multiprocessing training on CPUs
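For reference, here is a rough sketch of the scoring functions behind these three models (a simplified PyTorch view; the function names and margin value are illustrative, and DGL-KE's actual training adds negative sampling, regularization, and more):
import torch

def transe_score(h, r, t, gamma=12.0):
    # higher score = more plausible (head, relation, tail) triple
    return gamma - torch.norm(h + r - t, p=1, dim=-1)

def distmult_score(h, r, t):
    return (h * r * t).sum(dim=-1)

def complex_score(h, r, t):
    # embeddings are split into real and imaginary halves
    h_re, h_im = h.chunk(2, dim=-1)
    r_re, r_im = r.chunk(2, dim=-1)
    t_re, t_im = t.chunk(2, dim=-1)
    return (h_re * r_re * t_re + h_im * r_re * t_im
            + h_re * r_im * t_im - h_im * r_im * t_re).sum(dim=-1)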
Training results on FB15k using one NVIDIA V100 GPU
Training Speed:
Models | TransE | DistMult | ComplEx |
---|---|---|---|
MAX_STEPS | 20000 | 100000 | 100000 |
TIME | 411s | 690s | 806s |
Training accuracy:
Models | MR | MRR | HITS@1 | HITS@3 | HITS@10 |
---|---|---|---|---|---|
TransE | 69.12 | 0.656 | 0.567 | 0.718 | 0.802 |
DistMult | 43.35 | 0.783 | 0.713 | 0.837 | 0.897 |
ComplEx | 51.99 | 0.785 | 0.720 | 0.832 | 0.889 |
In comparison, GraphVite takes 14 minutes using 4 GPUs. DGL-KE thus trains TransE on FB15k roughly 2x faster than GraphVite while using far fewer resources.
For more information, please refer to this directory.
Miscellaneous
- New builtin message functions: dot product (u_dot_v etc., #831 @classicsong); see the short example after this list
- More efficient data format and serialization (#728 @VoVAllen)
- ClusterGCN (#877, @Zardinality)
- CoraFull, Amazon, KarateClub, Coauthor datasets (#855, @VoVAllen)
- More performance improvements
- More bugfixes
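As a quick illustration of the new dot-product builtin (the graph and feature names below are made up), it produces one score per edge from the features of the edge's endpoints:
import dgl
import torch
import dgl.function as fn

g = dgl.DGLGraph()
g.add_nodes(3)
g.add_edges([0, 1], [1, 2])
g.ndata['h'] = torch.randn(3, 4)
# u_dot_v takes the dot product of source and destination node features
g.apply_edges(fn.u_dot_v('h', 'h', 'score'))
print(g.edata['score'].shape)  # torch.Size([2, 1]): one scalar per edge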
08 October