#### What is new in DGL v0.4.3 release?

The DGL v0.4.3 release brings many new features for an enhanced usability and system efficiency. The article takes a peek at some of the major highlights.

## TensorFlow support

DGL finally comes to the TensorFlow community starting from this release.
Switching to TensorFlow is easy. If you are a first-time user, please install
DGL and `import dgl`

, and then follow the instructions to set the default
backend. You can always switch back by changing the `config.json`

file, which is
under `~/.dgl`

folder. DGL keeps a coherent user experience regardless of which
backend is currently in use. The following code demonstrates the basic steps to
apply a graph convolution layer.

```
import tensorflow as tf
import dgl.nn as dglnn
# Random features for 10 nodes; each is of length 5.
x = tf.random.normal((10, 5))
# Random graph; 10 nodes and 20 edges.
g = dgl.rand_graph(10, 20)
# Pre-defined graph convolution module.
conv = dglnn.GraphConv(5, 3)
y = conv(g, x) # Apply the graph convolution layer.
```

We have implemented and released 15 common GNN modules in TensorFlow (more are coming), all of which can be invoked in one line of codes.

`GraphConv`

from the Graph Convolutional Networks paper.`GATConv`

from the Graph Attention Networks paper.`SAGEConv`

from the Inductive Representation Learning on Large Graphs paper (a.k.a. GraphSAGE).`GINConv`

from the How Powerful are Graph Neural Networks paper.`RelGraphConv`

from the Modeling Relational Data with Graph Convolutional Networks paper.`SGConv`

from the Simplifying Graph Convolutional Networks paper.`APPNPConv`

from the Predict then Propagate: Graph Neural Networks meet Personalized PageRank paper.- An
`edge_softmax`

function for computing softmax over the neighboring edges of each vertex. - Various pooling layers:
`SumPooling`

,`AvgPooling`

,`MaxPooling`

,`SortPooling`

,`WeightAndSum`

, and`GlobalAttentionPooling`

. - A
`HeteroGraphConv`

module for applying GNN modules to heterogeneous graphs.

Our preliminary benchmark shows strong performance improvement against other TF-based tools for GNNs in terms of both training speed (measured by epoch running time in seconds) and memory consumption.

Dateset | Model | DGL | GraphNet | tf_geometric |
---|---|---|---|---|

Core | GCN | 0.0148 | 0.0152 | 0.0192 |

GCN | 0.1095 | OOM | OOM | |

PubMed | GCN | 0.0156 | 0.0553 | 0.0185 |

PPI | GCN | 0.09 | 0.16 | 0.21 |

Cora | GAT | 0.0442 | n/a | 0.058 |

PPI | GAT | 0.398 | n/a | 0.752 |

To get started, install DGL and check out the examples here.

## DGL-KE: A light-speed package for learning knowledge graph embeddings

Previously incubated under the DGL main repository, DGL-KE now officially announces its 0.1 release as a standalone package. The key highlights are:

- Effortlessly generate knowledge graph embedding with one line of code.
- Support for giant graphs with millions of nodes and edges.
- Distributed training with highly-optimized graph partitioning, negative sampling and communication, which can be deployed on both multi-GPU machines and multi-machine clusters.

DGL-KE can be installed with pip:

```
pip install dglke
```

The following command trains embeddings of the full FreeBase graph (over 86M nodes and 338M edges) with 8 GPUs.

```
dglke_train --model TransE_l2 --dataset Freebase --batch_size 1000 \
--neg_sample_size 200 --hidden_dim 400 --gamma 10 --lr 0.1 \
--regularization_coef 1e-9 -adv --gpu 0 1 2 3 4 5 6 7 \
--max_step 320000 --log_interval 10000 --async_update \
--rel_part --force_sync_interval 10000
```

DGL-KE is designed for learning at scale and speed. Our benchmark on the full FreeBase graph shows that DGL-KE can train embeddings under 100 minutes on an 8-GPU machine and under 30 minutes on a 4-machine cluster (48 cores/machine). These results represent a 2×∼5× speedup over the best competing approaches.

Check out our new GitHub repository, examples and documentations under https://github.com/awslabs/dgl-ke

## DGL-LifeSci: Bringing Graph Neural Networks to Chemistry and Biology

Previously incubated as a model zoo for chemistry, DGL-LifeSci is now spun off as a standalone package. The key highlights are:

- Training scripts and pre-trained models for various applications — molecular property prediction, generative models, and reaction prediction.
- Up to 5.5x model training speedup compared to previous implementations.
- Well defined pipelines for data processing, model construction and evaluation.

DGL-LifeSci can be installed with pip or conda.

```
pip install dgllife
conda install -c dglteam dgllife
```

A summary of speedup in seconds per epoch of training:

Model | Original Implementations | DGL-LifeSci Implementations | Speedup |
---|---|---|---|

GCN on Tox21 | 5.5 (DeepChem) | 1 | 5.5x |

AttentiveFP on Aromaticity | 6 | 1.2 | 5x |

JTNN on ZINC | 1826 | 743 | 2.5x |

WLN for reaction center prediction | 11657 | 5095 | 2.3x |

To get started, check out the examples and documentations under https://github.com/dmlc/dgl/tree/master/apps/life_sci.

## Experimenting new APIs for sampling

Sampling is crucial to training GNNs on giant graphs. In this release, we re-design the APIs for sampling, aiming for a more intuitive programming experience and a better performance at the same time. The new APIs have several advantages:

- Support a wide range of sampling-based GNN models, including PinSAGE, GraphSAGE, Graph Convolutional Matrix Completion (GCMC), and etc.
- Support customization in Python.
- Support heterogeneous graphs.
- Leverage all pre-defined NN modules with no code change.
- Utilize both multi-processing and multi-threading for maximum speed.

The code below defines a basic neighbor sampler:

```
class NeighborSampler(object):
def __init__(self, g, fanouts):
self.g = g # The full graph structure
self.fanouts = fanouts # fan-out of each layer
def sample_blocks(self, seeds):
# `seeds` are the set of nodes to build one sample from.
blocks = []
for fanout in self.fanouts:
# For each seed node, sample ``fanout`` neighbors.
frontier = dgl.sampling.sample_neighbors(g, seeds, fanout, replace=True)
# Then we compact the frontier into a bipartite graph for message passing.
block = dgl.to_block(frontier, seeds)
# Obtain the seed nodes for next layer.
seeds = block.srcdata[dgl.NID]
blocks.insert(0, block)
return blocks
```

Although these APIs are still experimental, you can find their usages in many examples:

- Train the GraphSAGE model by neighbor sampling and scale it to multiple GPUs (link).
- Train the Relational GCN model on heterogeneous graphs by sampling for both node classification and link prediction (link).
- Train the PinSAGE model by random walk sampling for item recommendation (link).
- Train the GCMC model by sampling for MovieLens rating prediction (link).
- Implement the variance reduction technique for neighbor sampling (link) proposed by Chen et al.

We will continue polishing these APIs, and the corresponding documentations and tutorials are coming.

## Other Improvements

- All GNN modules under
`dgl.nn`

now support both homogeneous graph and bipartite graph. `DGLHeteroGraph`

now has a faster pickling/unpickling implementation.- Add new APIs for saving and loading
`DGLHeteroGraph`

from checkpoints. `BatchedDGLGraph`

and`DGLSubGraph`

classes have been merged to`DGLGraph`

.- Constructing
`DGLGraph`

no longer requires a`multigraph`

flag.

More details can be found in the full release note.

01 April