.. _guide-minibatch-link-classification-sampler:

6.3 Training GNN for Link Prediction with Neighborhood Sampling
--------------------------------------------------------------------

:ref:`(中文版) `

Define a neighborhood sampler and data loader with negative sampling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can use the same neighborhood sampler as in node/edge classification.

.. code:: python

    sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)

:func:`~dgl.dataloading.as_edge_prediction_sampler` in DGL also supports
generating negative samples for link prediction. To do so, you need to
provide a negative sampling function.
:class:`~dgl.dataloading.negative_sampler.Uniform` is a negative sampler
that performs uniform sampling: for each source node of an edge, it samples
``k`` negative destination nodes. The following data loader picks 5 negative
destination nodes uniformly for each source node of an edge.

.. code:: python

    sampler = dgl.dataloading.as_edge_prediction_sampler(
        sampler, negative_sampler=dgl.dataloading.negative_sampler.Uniform(5))
    dataloader = dgl.dataloading.DataLoader(
        g, train_seeds, sampler,
        batch_size=args.batch_size,
        shuffle=True,
        drop_last=False,
        pin_memory=True,
        num_workers=args.num_workers)

For the built-in negative samplers please see
:ref:`api-dataloading-negative-sampling`.

You can also provide your own negative sampler function, as long as it takes
in the original graph ``g`` and the minibatch edge ID array ``eid``, and
returns a pair of source and destination ID arrays.

The following gives an example of a custom negative sampler that samples
negative destination nodes according to a probability distribution
proportional to a power of the node degrees.

.. code:: python

    class NegativeSampler(object):
        def __init__(self, g, k):
            # caches the probability distribution
            self.weights = g.in_degrees().float() ** 0.75
            self.k = k

        def __call__(self, g, eids):
            src, _ = g.find_edges(eids)
            src = src.repeat_interleave(self.k)
            dst = self.weights.multinomial(len(src), replacement=True)
            return src, dst

    sampler = dgl.dataloading.as_edge_prediction_sampler(
        sampler, negative_sampler=NegativeSampler(g, 5))
    dataloader = dgl.dataloading.DataLoader(
        g, train_seeds, sampler,
        batch_size=args.batch_size,
        shuffle=True,
        drop_last=False,
        pin_memory=True,
        num_workers=args.num_workers)

Adapt your model for minibatch training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As explained in :ref:`guide-training-link-prediction`, link prediction is
trained by comparing the score of an existing edge (positive example)
against that of a non-existent edge (negative example). To compute the
scores of edges you can reuse the node representation model you have seen
in edge classification/regression.

.. code:: python

    class StochasticTwoLayerGCN(nn.Module):
        def __init__(self, in_features, hidden_features, out_features):
            super().__init__()
            self.conv1 = dgl.nn.GraphConv(in_features, hidden_features)
            self.conv2 = dgl.nn.GraphConv(hidden_features, out_features)

        def forward(self, blocks, x):
            x = F.relu(self.conv1(blocks[0], x))
            x = F.relu(self.conv2(blocks[1], x))
            return x
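As a quick illustration of how this module consumes a minibatch (a minimal
sketch only; the feature name ``'features'`` and the dimension variables are
assumptions, not part of the example above), the representations it returns
line up with the destination nodes of the last MFG, which are exactly the
nodes incident to the sampled positive and negative edges:

.. code:: python

    # Hypothetical sanity check on a single minibatch.
    gcn = StochasticTwoLayerGCN(in_features, hidden_features, out_features)
    input_nodes, positive_graph, negative_graph, blocks = next(iter(dataloader))
    x = blocks[0].srcdata['features']   # input features of all sampled input nodes
    h = gcn(blocks, x)                  # one row per destination node of the last MFG
    assert h.shape[0] == blocks[-1].num_dst_nodes() == positive_graph.num_nodes()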
For score prediction, since you only need to predict a scalar score for each
edge instead of a probability distribution, this example computes the score
as a dot product of the incident node representations.

.. code:: python

    class ScorePredictor(nn.Module):
        def forward(self, edge_subgraph, x):
            with edge_subgraph.local_scope():
                edge_subgraph.ndata['x'] = x
                edge_subgraph.apply_edges(dgl.function.u_dot_v('x', 'x', 'score'))
                return edge_subgraph.edata['score']

When a negative sampler is provided, DGL's data loader will generate three
items per minibatch:

- A positive graph containing all the edges sampled in the minibatch.
- A negative graph containing all the non-existent edges generated by the
  negative sampler.
- A list of *message flow graphs* (MFGs) generated by the neighborhood
  sampler.

One can therefore define the link prediction model as follows, taking in the
three items as well as the input features.

.. code:: python

    class Model(nn.Module):
        def __init__(self, in_features, hidden_features, out_features):
            super().__init__()
            self.gcn = StochasticTwoLayerGCN(
                in_features, hidden_features, out_features)
            self.predictor = ScorePredictor()

        def forward(self, positive_graph, negative_graph, blocks, x):
            x = self.gcn(blocks, x)
            pos_score = self.predictor(positive_graph, x)
            neg_score = self.predictor(negative_graph, x)
            return pos_score, neg_score

Training loop
~~~~~~~~~~~~~

The training loop simply involves iterating over the data loader and feeding
the graphs as well as the input features to the model defined above.

.. code:: python

    def compute_loss(pos_score, neg_score):
        # an example hinge (margin) loss
        n = pos_score.shape[0]
        return (neg_score.view(n, -1) - pos_score.view(n, -1) + 1).clamp(min=0).mean()

    model = Model(in_features, hidden_features, out_features)
    model = model.cuda()
    opt = torch.optim.Adam(model.parameters())

    for input_nodes, positive_graph, negative_graph, blocks in dataloader:
        blocks = [b.to(torch.device('cuda')) for b in blocks]
        positive_graph = positive_graph.to(torch.device('cuda'))
        negative_graph = negative_graph.to(torch.device('cuda'))
        input_features = blocks[0].srcdata['features']
        pos_score, neg_score = model(positive_graph, negative_graph, blocks, input_features)
        loss = compute_loss(pos_score, neg_score)
        opt.zero_grad()
        loss.backward()
        opt.step()

DGL provides an unsupervised learning GraphSAGE example that demonstrates
link prediction on homogeneous graphs.

For heterogeneous graphs
~~~~~~~~~~~~~~~~~~~~~~~~

The models that compute node representations on heterogeneous graphs can
also be used for computing the incident node representations needed for link
prediction.

.. code:: python

    class StochasticTwoLayerRGCN(nn.Module):
        def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
            super().__init__()
            self.conv1 = dglnn.HeteroGraphConv({
                rel: dglnn.GraphConv(in_feat, hidden_feat, norm='right')
                for rel in rel_names})
            self.conv2 = dglnn.HeteroGraphConv({
                rel: dglnn.GraphConv(hidden_feat, out_feat, norm='right')
                for rel in rel_names})

        def forward(self, blocks, x):
            x = self.conv1(blocks[0], x)
            x = self.conv2(blocks[1], x)
            return x
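Unlike the homogeneous case, the input features and the output
representations of this module are dictionaries keyed by node type. A
minimal sketch of the calling convention (the node type names ``'user'`` and
``'item'`` and the feature tensors are hypothetical, for illustration only):

.. code:: python

    # Hypothetical input: one feature tensor per node type. In the training
    # loop below this dictionary comes from blocks[0].srcdata['features'].
    rgcn = StochasticTwoLayerRGCN(in_feat, hidden_feat, out_feat, g.etypes)
    x = {'user': user_feats, 'item': item_feats}
    h = rgcn(blocks, x)    # also a dictionary: node type -> representations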
For score prediction, the only implementation difference between the
homogeneous graph and the heterogeneous graph is that we loop over the edge
types when calling :meth:`dgl.DGLGraph.apply_edges`.

.. code:: python

    class ScorePredictor(nn.Module):
        def forward(self, edge_subgraph, x):
            with edge_subgraph.local_scope():
                edge_subgraph.ndata['x'] = x
                for etype in edge_subgraph.canonical_etypes:
                    edge_subgraph.apply_edges(
                        dgl.function.u_dot_v('x', 'x', 'score'), etype=etype)
                return edge_subgraph.edata['score']

    class Model(nn.Module):
        def __init__(self, in_features, hidden_features, out_features, etypes):
            super().__init__()
            self.rgcn = StochasticTwoLayerRGCN(
                in_features, hidden_features, out_features, etypes)
            self.pred = ScorePredictor()

        def forward(self, positive_graph, negative_graph, blocks, x):
            x = self.rgcn(blocks, x)
            pos_score = self.pred(positive_graph, x)
            neg_score = self.pred(negative_graph, x)
            return pos_score, neg_score

Data loader definition is also very similar to that of edge
classification/regression. The only difference is that you need to give the
negative sampler, and you supply a dictionary of edge types and edge ID
tensors instead of a dictionary of node types and node ID tensors.

.. code:: python

    sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
    sampler = dgl.dataloading.as_edge_prediction_sampler(
        sampler, negative_sampler=dgl.dataloading.negative_sampler.Uniform(5))
    dataloader = dgl.dataloading.DataLoader(
        g, train_eid_dict, sampler,
        batch_size=1024,
        shuffle=True,
        drop_last=False,
        num_workers=4)

If you want to provide your own negative sampling function, the function
should take in the original graph and the dictionary of edge types and edge
ID tensors. It should return a dictionary mapping edge types to
source-destination array pairs. An example is given as follows:

.. code:: python

    class NegativeSampler(object):
        def __init__(self, g, k):
            # caches the probability distribution per edge type
            self.weights = {
                etype: g.in_degrees(etype=etype).float() ** 0.75
                for etype in g.canonical_etypes}
            self.k = k

        def __call__(self, g, eids_dict):
            result_dict = {}
            for etype, eids in eids_dict.items():
                src, _ = g.find_edges(eids, etype=etype)
                src = src.repeat_interleave(self.k)
                dst = self.weights[etype].multinomial(len(src), replacement=True)
                result_dict[etype] = (src, dst)
            return result_dict

Then you can give the data loader a dictionary of edge types and edge IDs as
well as the negative sampler. For instance, the following iterates over all
edges of the heterogeneous graph.

.. code:: python

    train_eid_dict = {
        etype: g.edges(etype=etype, form='eid')
        for etype in g.canonical_etypes}

    sampler = dgl.dataloading.as_edge_prediction_sampler(
        sampler, negative_sampler=NegativeSampler(g, 5))
    dataloader = dgl.dataloading.DataLoader(
        g, train_eid_dict, sampler,
        batch_size=1024,
        shuffle=True,
        drop_last=False,
        num_workers=4)

The training loop is again almost the same as that on a homogeneous graph,
except for the implementation of ``compute_loss``, which here takes in two
dictionaries mapping edge types to prediction scores (a sketch of such a
loss is given after the training loop below).

.. code:: python

    model = Model(in_features, hidden_features, out_features, etypes)
    model = model.cuda()
    opt = torch.optim.Adam(model.parameters())

    for input_nodes, positive_graph, negative_graph, blocks in dataloader:
        blocks = [b.to(torch.device('cuda')) for b in blocks]
        positive_graph = positive_graph.to(torch.device('cuda'))
        negative_graph = negative_graph.to(torch.device('cuda'))
        input_features = blocks[0].srcdata['features']
        pos_score, neg_score = model(positive_graph, negative_graph, blocks, input_features)
        loss = compute_loss(pos_score, neg_score)
        opt.zero_grad()
        loss.backward()
        opt.step()
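The homogeneous ``compute_loss`` above can be adapted to the heterogeneous
case by computing the margin loss per edge type and combining the results.
Below is a minimal sketch under these assumptions: the scores are the
dictionaries produced by ``ScorePredictor`` (keyed by canonical edge type),
and averaging the per-type losses is a design choice rather than something
required by DGL.

.. code:: python

    def compute_loss(pos_score, neg_score):
        # Sketch of a margin loss over all edge types; pos_score and
        # neg_score map canonical edge types to score tensors.
        losses = []
        for etype, pos in pos_score.items():
            if pos.shape[0] == 0:
                continue   # no positive edges of this type in the minibatch
            neg = neg_score[etype]
            n = pos.shape[0]
            losses.append(
                (neg.view(n, -1) - pos.view(n, -1) + 1).clamp(min=0).mean())
        return torch.stack(losses).mean()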