.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/models/1_gnn/4_rgcn.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorials_models_1_gnn_4_rgcn.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_models_1_gnn_4_rgcn.py:

.. _model-rgcn:

Relational Graph Convolutional Network
================================================
**Author:** Lingfan Yu, Mufei Li, Zheng Zhang

.. warning::

    The tutorial aims at gaining insights into the paper, with code as a
    means of explanation. The implementation is therefore NOT optimized for
    running efficiency. For a recommended implementation, please refer to the
    `official examples <https://github.com/dmlc/dgl/tree/master/examples/pytorch/rgcn>`_.

In this tutorial, you learn how to implement a relational graph convolutional
network (R-GCN). This type of network is one effort to generalize GCN to
handle different relationships between entities in a knowledge base. To learn
more about the research behind R-GCN, see `Modeling Relational Data with Graph
Convolutional Networks <https://arxiv.org/abs/1703.06103>`_.

The straightforward graph convolutional network (GCN) exploits the structural
information of a dataset (that is, the graph connectivity) to improve the
extraction of node representations; graph edges are left untyped. A knowledge
graph, in contrast, is made up of a collection of triples of the form
(subject, relation, object). Edges thus encode important information and have
their own embeddings to be learned. Furthermore, there may exist multiple
edges between any given pair of nodes.

.. GENERATED FROM PYTHON SOURCE LINES 34-138

A brief introduction to R-GCN
-----------------------------
In *statistical relational learning* (SRL), there are two fundamental
tasks:

- **Entity classification** - Where you assign types and categorical
  properties to entities.
- **Link prediction** - Where you recover missing triples.

In both cases, missing information is expected to be recovered from the
neighborhood structure of the graph. For example, the R-GCN paper cited
earlier gives the following illustration. Knowing that Mikhail Baryshnikov
was educated at the Vaganova Academy implies both that Mikhail Baryshnikov
should have the label person, and that the triple (Mikhail Baryshnikov,
lived in, Russia) must belong to the knowledge graph.

R-GCN solves these two problems using a common graph convolutional network
that is extended with multi-edge encoding to compute the embeddings of the
entities, but with different downstream processing.

- Entity classification is done by attaching a softmax classifier to the
  final embedding of an entity (node). Training uses a standard
  cross-entropy loss.
- Link prediction is done by reconstructing an edge with an autoencoder
  architecture, using a parameterized score function. Training uses negative
  sampling.

This tutorial focuses on the first task, entity classification, to show how
to generate entity representations. `Complete code
<https://github.com/dmlc/dgl/tree/master/examples/pytorch/rgcn>`_ for both
tasks can be found in the DGL Github repository.

Key ideas of R-GCN
-------------------
Recall that in GCN, the hidden representation for each node :math:`i` at the
:math:`(l+1)^{th}` layer is computed by:

.. math:: h_i^{(l+1)} = \sigma\left(\sum_{j\in N_i}\frac{1}{c_i} W^{(l)} h_j^{(l)}\right)~~~~~~~~~~(1)

where :math:`c_i` is a normalization constant.
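To make equation :math:`(1)` concrete, here is a minimal dense-matrix sketch
of one GCN layer; the sizes and random data below are assumptions made purely
for illustration:

.. code-block:: Python

    import torch

    # toy sizes and random data, assumed purely for illustration
    N, d_in, d_out = 4, 8, 16
    A = (torch.rand(N, N) < 0.5).float()  # adjacency: A[i, j] = 1 if j is a neighbor of i
    H = torch.randn(N, d_in)              # node representations h_j^{(l)}
    W = torch.randn(d_in, d_out)          # the single shared weight W^{(l)}

    c = A.sum(dim=1, keepdim=True).clamp(min=1)  # c_i = |N_i|, one common choice
    H_next = torch.relu(A @ (H @ W) / c)         # equation (1) for all nodes at once
    print(H_next.shape)  # torch.Size([4, 16])

A real implementation passes messages over a sparse graph instead of
multiplying by a dense adjacency matrix, which is what the DGL code later in
this tutorial does.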
The key difference between R-GCN and GCN is that in R-GCN, edges can
represent different relations. In GCN, the weight :math:`W^{(l)}` in equation
:math:`(1)` is shared by all edges in layer :math:`l`. In contrast, in R-GCN,
different edge types use different weights, and only edges of the same
relation type :math:`r` are associated with the same projection weight
:math:`W_r^{(l)}`.

So the hidden representation of entities in the :math:`(l+1)^{th}` layer in
R-GCN can be formulated as the following equation:

.. math:: h_i^{(l+1)} = \sigma\left(W_0^{(l)}h_i^{(l)}+\sum_{r\in R}\sum_{j\in N_i^r}\frac{1}{c_{i,r}}W_r^{(l)}h_j^{(l)}\right)~~~~~~~~~~(2)

where :math:`N_i^r` denotes the set of neighbor indices of node :math:`i`
under relation :math:`r\in R` and :math:`c_{i,r}` is a normalization
constant. In entity classification, the R-GCN paper uses
:math:`c_{i,r}=|N_i^r|`.

The problem with applying the above equation directly is the rapid growth of
the number of parameters, especially with highly multi-relational data. In
order to reduce the model parameter size and prevent overfitting, the
original paper proposes to use basis decomposition.

.. math:: W_r^{(l)}=\sum\limits_{b=1}^B a_{rb}^{(l)}V_b^{(l)}~~~~~~~~~~(3)

Therefore, the weight :math:`W_r^{(l)}` is a linear combination of basis
transformations :math:`V_b^{(l)}` with coefficients :math:`a_{rb}^{(l)}`. The
number of bases :math:`B` is much smaller than the number of relations in the
knowledge base.

.. note::
   Another weight regularization, block-decomposition, is implemented in the
   `link prediction <https://github.com/dmlc/dgl/tree/master/examples/pytorch/rgcn>`_
   example.

Implement R-GCN in DGL
----------------------
An R-GCN model is composed of several R-GCN layers. The first R-GCN layer
also serves as the input layer and takes in features (for example,
description texts) that are associated with the node entities and projects
them to a hidden space. In this tutorial, we only use the entity ID as an
entity feature.

R-GCN layers
~~~~~~~~~~~~
For each node, an R-GCN layer performs the following steps:

- Compute an outgoing message using the node representation and the weight
  matrix associated with the edge type (message function)
- Aggregate incoming messages and generate new node representations (reduce
  and apply function)

After a short standalone sketch of the basis trick, the code that follows
defines an R-GCN hidden layer.

.. note::
   Each relation type is associated with a different weight. Therefore,
   the full weight matrix has three dimensions: relation, input_feature,
   output_feature.

.. note::

   This shows how to implement an R-GCN from scratch. DGL provides a more
   efficient :class:`builtin R-GCN layer module <dgl.nn.pytorch.conv.RelGraphConv>`.
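First, the standalone sketch: it shows how equation :math:`(3)` materializes
all per-relation weights from a small set of bases with a single matrix
product. The sizes here are toy assumptions, not values from the tutorial:

.. code-block:: Python

    import torch

    num_rels, num_bases, in_feat, out_feat = 10, 2, 8, 8  # toy sizes (assumed)
    V = torch.randn(num_bases, in_feat, out_feat)  # basis transformations V_b
    a = torch.randn(num_rels, num_bases)           # coefficients a_rb, one row per relation

    # W_r = sum_b a_rb * V_b, computed for all relations in one matrix product
    W = torch.matmul(a, V.reshape(num_bases, -1)).view(num_rels, in_feat, out_feat)
    print(W.shape)  # torch.Size([10, 8, 8])
    # learnable parameters: B*d_in*d_out + R*B = 148, versus R*d_in*d_out = 640

The ``forward`` method of the ``RGCNLayer`` below performs the same
combination, reshaping its weight tensor so that the product runs in one
shot.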
.. GENERATED FROM PYTHON SOURCE LINES 138-239

.. code-block:: Python


    import os

    # the DGL backend must be selected before dgl is imported
    os.environ["DGLBACKEND"] = "pytorch"
    from functools import partial

    import dgl
    import dgl.function as fn
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class RGCNLayer(nn.Module):
        def __init__(
            self,
            in_feat,
            out_feat,
            num_rels,
            num_bases=-1,
            bias=None,
            activation=None,
            is_input_layer=False,
        ):
            super(RGCNLayer, self).__init__()
            self.in_feat = in_feat
            self.out_feat = out_feat
            self.num_rels = num_rels
            self.num_bases = num_bases
            self.bias = bias
            self.activation = activation
            self.is_input_layer = is_input_layer

            # sanity check: fall back to one basis per relation
            if self.num_bases <= 0 or self.num_bases > self.num_rels:
                self.num_bases = self.num_rels

            # weight bases in equation (3)
            self.weight = nn.Parameter(
                torch.Tensor(self.num_bases, self.in_feat, self.out_feat)
            )
            if self.num_bases < self.num_rels:
                # linear combination coefficients in equation (3)
                self.w_comp = nn.Parameter(
                    torch.Tensor(self.num_rels, self.num_bases)
                )
            # add bias; keep the boolean flag (self.bias) separate from the
            # learnable parameter (self.h_bias) so that truth-testing the
            # flag stays unambiguous
            if self.bias:
                self.h_bias = nn.Parameter(torch.Tensor(out_feat))

            # init trainable parameters
            nn.init.xavier_uniform_(
                self.weight, gain=nn.init.calculate_gain("relu")
            )
            if self.num_bases < self.num_rels:
                nn.init.xavier_uniform_(
                    self.w_comp, gain=nn.init.calculate_gain("relu")
                )
            if self.bias:
                # xavier init requires at least a 2-D tensor; the 1-D bias
                # is simply zero-initialized
                nn.init.zeros_(self.h_bias)

        def forward(self, g):
            if self.num_bases < self.num_rels:
                # generate all weights from bases (equation (3))
                weight = self.weight.view(
                    self.in_feat, self.num_bases, self.out_feat
                )
                weight = torch.matmul(self.w_comp, weight).view(
                    self.num_rels, self.in_feat, self.out_feat
                )
            else:
                weight = self.weight

            if self.is_input_layer:

                def message_func(edges):
                    # for input layer, matrix multiply can be converted to
                    # an embedding lookup using the source node id
                    embed = weight.view(-1, self.out_feat)
                    index = edges.data[dgl.ETYPE] * self.in_feat + edges.src["id"]
                    return {"msg": embed[index] * edges.data["norm"]}

            else:

                def message_func(edges):
                    w = weight[edges.data[dgl.ETYPE]]
                    # (E, 1, in) x (E, in, out) -> (E, out)
                    msg = torch.bmm(edges.src["h"].unsqueeze(1), w).squeeze(1)
                    msg = msg * edges.data["norm"]
                    return {"msg": msg}

            def apply_func(nodes):
                h = nodes.data["h"]
                if self.bias:
                    h = h + self.h_bias
                if self.activation:
                    h = self.activation(h)
                return {"h": h}

            g.update_all(message_func, fn.sum(msg="msg", out="h"), apply_func)
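As a quick sanity check, the hidden-layer path of ``RGCNLayer`` can be
exercised on a toy graph. The graph, sizes, and random features below are
assumptions for illustration only; the input-layer path instead expects
integer node IDs in ``g.ndata["id"]``:

.. code-block:: Python

    # toy smoke test of the hidden-layer path (assumed sizes and random data)
    g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 0])))
    g.ndata["h"] = torch.randn(3, 4)              # current node representations
    g.edata[dgl.ETYPE] = torch.tensor([0, 1, 0])  # relation type of each edge
    g.edata["norm"] = torch.ones(3, 1)            # trivial normalization constants
    layer = RGCNLayer(in_feat=4, out_feat=2, num_rels=2, activation=F.relu)
    layer(g)
    print(g.ndata["h"].shape)  # torch.Size([3, 2])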
.. GENERATED FROM PYTHON SOURCE LINES 240-242

Full R-GCN model defined
~~~~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 242-322

.. code-block:: Python


    class Model(nn.Module):
        def __init__(
            self,
            num_nodes,
            h_dim,
            out_dim,
            num_rels,
            num_bases=-1,
            num_hidden_layers=1,
        ):
            super(Model, self).__init__()
            self.num_nodes = num_nodes
            self.h_dim = h_dim
            self.out_dim = out_dim
            self.num_rels = num_rels
            self.num_bases = num_bases
            self.num_hidden_layers = num_hidden_layers

            # create rgcn layers
            self.build_model()

            # create initial features
            self.features = self.create_features()

        def build_model(self):
            self.layers = nn.ModuleList()
            # input to hidden
            i2h = self.build_input_layer()
            self.layers.append(i2h)
            # hidden to hidden
            for _ in range(self.num_hidden_layers):
                h2h = self.build_hidden_layer()
                self.layers.append(h2h)
            # hidden to output
            h2o = self.build_output_layer()
            self.layers.append(h2o)

        # initialize feature for each node
        def create_features(self):
            features = torch.arange(self.num_nodes)
            return features

        def build_input_layer(self):
            return RGCNLayer(
                self.num_nodes,
                self.h_dim,
                self.num_rels,
                self.num_bases,
                activation=F.relu,
                is_input_layer=True,
            )

        def build_hidden_layer(self):
            return RGCNLayer(
                self.h_dim,
                self.h_dim,
                self.num_rels,
                self.num_bases,
                activation=F.relu,
            )

        def build_output_layer(self):
            return RGCNLayer(
                self.h_dim,
                self.out_dim,
                self.num_rels,
                self.num_bases,
                activation=partial(F.softmax, dim=1),
            )

        def forward(self, g):
            if self.features is not None:
                g.ndata["id"] = self.features
            for layer in self.layers:
                layer(g)
            return g.ndata.pop("h")

.. GENERATED FROM PYTHON SOURCE LINES 323-326

Handle dataset
~~~~~~~~~~~~~~
This tutorial uses the Institute for Applied Informatics and Formal
Description Methods (AIFB) dataset from the R-GCN paper.

.. GENERATED FROM PYTHON SOURCE LINES 326-343

.. code-block:: Python


    # load graph data
    dataset = dgl.data.rdf.AIFBDataset()
    g = dataset[0]
    category = dataset.predict_category
    train_mask = g.nodes[category].data.pop("train_mask")
    test_mask = g.nodes[category].data.pop("test_mask")
    train_idx = torch.nonzero(train_mask, as_tuple=False).squeeze()
    test_idx = torch.nonzero(test_mask, as_tuple=False).squeeze()
    labels = g.nodes[category].data.pop("label")
    num_rels = len(g.canonical_etypes)
    num_classes = dataset.num_classes
    # normalization factor
    for cetype in g.canonical_etypes:
        g.edges[cetype].data["norm"] = dgl.norm_by_dst(g, cetype).unsqueeze(1)
    category_id = g.ntypes.index(category)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Downloading /root/.dgl/aifb-hetero.zip from https://data.dgl.ai/dataset/rdf/aifb-hetero.zip...

Create graph and model
~~~~~~~~~~~~~~~~~~~~~~
Convert the heterograph into a homogeneous graph, since the layers above read
the relation type from ``edata[dgl.ETYPE]``, and keep track of which
homogeneous node IDs belong to the category being classified.

.. code-block:: Python


    # configurations
    n_hidden = 16  # number of hidden units
    n_bases = -1  # use number of relations as number of bases
    n_hidden_layers = 0  # use 1 input layer, 1 output layer, no hidden layer
    n_epochs = 25  # epochs to train
    lr = 0.01  # learning rate
    l2norm = 0  # L2 norm coefficient

    # create graph
    g = dgl.to_homogeneous(g, edata=["norm"])
    node_ids = torch.arange(g.num_nodes())
    target_idx = node_ids[g.ndata[dgl.NTYPE] == category_id]

    # create model
    model = Model(
        g.num_nodes(),
        n_hidden,
        num_classes,
        num_rels,
        num_bases=n_bases,
        num_hidden_layers=n_hidden_layers,
    )

Training loop
~~~~~~~~~~~~~

.. code-block:: Python


    # optimizer
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=l2norm)

    print("start training...")
    model.train()
    for epoch in range(n_epochs):
        optimizer.zero_grad()
        logits = model(g)
        logits = logits[target_idx]
        loss = F.cross_entropy(logits[train_idx], labels[train_idx])
        loss.backward()
        optimizer.step()

        train_acc = torch.sum(logits[train_idx].argmax(dim=1) == labels[train_idx])
        train_acc = train_acc.item() / len(train_idx)
        val_loss = F.cross_entropy(logits[test_idx], labels[test_idx])
        val_acc = torch.sum(logits[test_idx].argmax(dim=1) == labels[test_idx])
        val_acc = val_acc.item() / len(test_idx)
        print(
            "Epoch {:05d} | Train Acc {:.4f} | Train Loss {:.4f} | "
            "Valid Acc {:.4f} | Valid Loss {:.4f}".format(
                epoch, train_acc, loss.item(), val_acc, val_loss.item()
            )
        )

The second task, link prediction
--------------------------------
So far, you have seen how to use DGL to implement entity classification with
an R-GCN model. In the knowledge base setting, the representation generated
by R-GCN can be used to uncover potential relationships between nodes. In the
R-GCN paper, the authors pack the entity representations generated by R-GCN
into a `DistMult <https://arxiv.org/abs/1412.6575>`_ prediction model to
predict possible relationships.

The implementation is similar to that presented here, but with an extra
DistMult layer stacked on top of the R-GCN layers. You can find the complete
implementation of link prediction with R-GCN in our `Github Python code
example <https://github.com/dmlc/dgl/tree/master/examples/pytorch/rgcn>`_.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 3.805 seconds)


.. _sphx_glr_download_tutorials_models_1_gnn_4_rgcn.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 4_rgcn.ipynb <4_rgcn.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 4_rgcn.py <4_rgcn.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: 4_rgcn.zip <4_rgcn.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_