
DGL v0.4 Release (heterogeneous graph update)

We are thrilled to announce the 0.4 release! This includes:

Heterogeneous Graph Support

What is a heterogeneous graph?

A heterogeneous graph is a graph whose nodes and edges are typed. For example, the graph built below has node types "user", "game", and "developer", connected by edge types such as "follows", "plays", and "develops".
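As a minimal illustration (plain Python only, independent of DGL's actual API), a heterogeneous graph can be thought of as a set of edge lists keyed by (source type, edge type, destination type) triples:

```python
# Illustrative sketch only: a heterogeneous graph as a dict of typed
# edge lists, keyed by (source type, edge type, destination type).
graph = {
    ('user', 'follows', 'user'): [(0, 1), (1, 2)],
    ('user', 'plays', 'game'): [(0, 0), (1, 0)],
}

# The node types are whatever appears as a source or destination type.
node_types = {src for src, _, _ in graph} | {dst for _, _, dst in graph}
assert node_types == {'user', 'game'}

# Each relation keeps its own edge list, so edges of different types
# never mix even when their endpoint IDs overlap.
assert len(graph[('user', 'follows', 'user')]) == 2
```

Because every relation is stored separately, node IDs are scoped per type: user 0 and game 0 are different nodes.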

Which models work on heterogeneous graphs?

Models using Heterogeneous Graph API:

  • GCMC [Code in MXNet]

    Dataset                              RMSE (DGL)  RMSE (Official)  Speed (DGL)    Speed (Official)  Speedup
    MovieLens-100K                       0.9077      0.910            0.0246s/epoch  0.1008s/epoch     5x
    MovieLens-1M                         0.8377      0.832            0.0695s/epoch  1.538s/epoch      22x
    MovieLens-10M (full-graph training)  0.7875      0.777            0.6480s/epoch  OOM               -
  • R-GCN [Code in PyTorch]
    • We provide an R-GCN model that takes a heterograph as input. The new code can train the model on the AM dataset (>5M edges) using one GPU, while the original implementation can only run on CPU and consumes 32GB of memory.
    • The original implementation takes 51.88s to train one epoch on CPU. The new heterograph-based R-GCN takes only 0.1781s per epoch on a V100 GPU (291x faster!).
  • Heterogeneous Attention Networks [Code in PyTorch]
  • Metapath2vec [Code in PyTorch]
    • The metapath sampler is twice as fast as the original implementation.

How can I play with a heterogeneous graph?

Here is an example for creating and manipulating a heterogeneous graph:

import dgl
import torch
import dgl.function as fn

g = dgl.heterograph({
    ('user', 'follows', 'user'): [(0, 1), (1, 2)],
    ('user', 'plays', 'game'): [(0, 0), (1, 0), (1, 1), (2, 1)],
    ('game', 'attracts', 'user'): [(0, 0), (0, 1), (1, 1), (1, 2)],
    ('developer', 'develops', 'game'): [(0, 0), (1, 1)],
    })

# Here the user nodes have a single feature named x, and game nodes have a
# single feature named y (given the same size as x so that messages from
# different edge types can be summed together below)
x = torch.randn(3, 5)
y = torch.randn(2, 5)
g.nodes['user'].data['x'] = x
g.nodes['game'].data['y'] = y

# Edge features are similar
a = torch.randn(2, 5)
b = torch.randn(4, 7)
g.edges['follows'].data['a'] = a
g.edges['plays'].data['b'] = b

# One can also perform message passing.
# The following code performs a full message passing on the "plays" edges,
# summing the features of the users that play each game.
g['plays'].update_all(fn.copy_u('x', 'm'), fn.sum('m', 'z'))
z = g.nodes['game'].data['z']
assert torch.allclose(z[0], x[0] + x[1])
assert torch.allclose(z[1], x[1] + x[2])

# Moreover, one can also perform message passing on multiple edge types at
# the same time. Here both "follows" and "attracts" send messages to user
# nodes; the per-type results are then aggregated with a "sum".
g.multi_update_all({
    'follows': (fn.copy_u('x', 'm'), fn.sum('m', 'w')),
    'attracts': (fn.copy_u('y', 'm'), fn.sum('m', 'w')),
    }, 'sum')
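For intuition, here is a plain-Python sketch (a hypothetical helper, not part of DGL) of what a per-relation update_all with a copy_u message and a sum reducer computes:

```python
# Plain-Python sketch (no DGL needed) of update_all(copy_u('x', 'm'),
# sum('m', 'z')) on one relation: for every edge (u, v), copy the source
# feature x[u] as the message and sum incoming messages at each
# destination node.
def update_all_copy_u_sum(edges, src_feat, num_dst):
    """edges: list of (src, dst) pairs for a single relation."""
    out = [0.0] * num_dst
    for u, v in edges:
        out[v] += src_feat[u]
    return out

# The "plays" relation from the example above: user -> game edges.
plays = [(0, 0), (1, 0), (1, 1), (2, 1)]
x = [1.0, 2.0, 3.0]             # one scalar feature per user
z = update_all_copy_u_sum(plays, x, num_dst=2)
assert z == [3.0, 5.0]          # game 0: x0+x1, game 1: x1+x2
```

multi_update_all simply runs one such pass per relation and then combines the per-relation results at each destination node with the chosen cross-type reducer.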

Check out our heterograph tutorial: Working with Heterogeneous Graphs in DGL

Check out the full API reference.

Knowledge Graph Models

We also released DGL-KE, a subpackage of DGL that trains embeddings on knowledge graphs. This package is adapted from the KnowledgeGraphEmbedding package. We made it fast and scalable while still maintaining the flexibility of the original package. Using a single NVIDIA V100 GPU, DGL-KE can train TransE on FB15k in 6.85 mins, substantially outperforming existing tools such as GraphVite. For graphs with hundreds of millions of edges (such as the full Freebase graph), it takes a couple of hours on one EC2 x1.32xlarge machine.

Currently, the following models are supported:

  • TransE
  • DistMult
  • ComplEx
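For reference, the score functions of these three models can be sketched in plain Python on toy embeddings (conceptual sketches following the standard definitions, not DGL-KE's actual implementation):

```python
import math

# Conceptual score functions (higher = more plausible triple (h, r, t));
# these are standard definitions, not DGL-KE's code.
def transe_score(h, r, t):
    # TransE: negative L2 distance between h + r and t.
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def distmult_score(h, r, t):
    # DistMult: trilinear product <h, r, t>.
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

def complex_score(h, r, t):
    # ComplEx: Re(<h, r, conj(t)>) with complex-valued embeddings.
    return sum((hk * rk * tk.conjugate()).real for hk, rk, tk in zip(h, r, t))

# A triple scores highest under TransE when h + r lands exactly on t.
assert transe_score([0.0, 0.0], [1.0, 1.0], [1.0, 1.0]) == 0.0
# With purely real embeddings, ComplEx reduces to DistMult.
assert complex_score([1 + 0j, 2 + 0j], [1 + 0j, 1 + 0j], [3 + 0j, 4 + 0j]) == \
       distmult_score([1.0, 2.0], [1.0, 1.0], [3.0, 4.0])
```

Training then pushes true triples toward high scores and corrupted (negative-sampled) triples toward low scores.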

And the following training modes are supported:

  • CPU training
  • GPU training
  • Joint CPU & GPU training
  • Multiprocessing training on CPUs

Training results on FB15k using one NVIDIA V100 GPU

Training Speed:

Model      TransE  DistMult  ComplEx
Max steps  20,000  100,000   100,000
Time       411s    690s      806s

Training accuracy:

Model     MR     MRR    HITS@1  HITS@3  HITS@10
TransE    69.12  0.656  0.567   0.718   0.802
DistMult  43.35  0.783  0.713   0.837   0.897
ComplEx   51.99  0.785  0.720   0.832   0.889
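For context, the metrics above are standard ranking metrics computed from the rank of the correct entity among all candidates. A plain-Python sketch (for illustration, not DGL-KE's code):

```python
# Standard link-prediction ranking metrics, from the rank of each true
# entity among all candidates (rank 1 = best). Illustrative sketch only.
def summarize(ranks):
    n = len(ranks)
    mr = sum(ranks) / n                        # Mean Rank (lower is better)
    mrr = sum(1.0 / r for r in ranks) / n      # Mean Reciprocal Rank (higher is better)
    hits = {k: sum(r <= k for r in ranks) / n  # HITS@k: fraction ranked in top k
            for k in (1, 3, 10)}
    return mr, mrr, hits

mr, mrr, hits = summarize([1, 2, 5, 12])
assert mr == 5.0
assert hits[1] == 0.25 and hits[3] == 0.5 and hits[10] == 0.75
```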

In comparison, GraphVite uses 4 GPUs and takes 14 minutes. Thus, DGL-KE trains TransE on FB15k 2x faster than GraphVite while using far fewer resources.

For more information, please refer to this directory.

Miscellaneous

  • New built-in message function: dot product (u_dot_v etc., #831, @classicsong)
  • More efficient data format and serialization (#728, @VoVAllen)
  • ClusterGCN (#877, @Zardinality)
  • CoraFull, Amazon, KarateClub, Coauthor datasets (#855, @VoVAllen)
  • More performance improvements
  • More bugfixes