.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/large/L2_large_link_prediction.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorials_large_L2_large_link_prediction.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_large_L2_large_link_prediction.py:

Stochastic Training of GNN for Link Prediction
==============================================

This tutorial will show how to train a multi-layer GraphSAGE for link
prediction on ``ogbn-arxiv`` provided by `Open Graph Benchmark (OGB)
<https://ogb.stanford.edu/>`__. The dataset contains around 170 thousand
nodes and 1 million edges.

By the end of this tutorial, you will be able to

-  Train a GNN model for link prediction on a single GPU with DGL's
   neighbor sampling components.

This tutorial assumes that you have read the :doc:`Introduction of Neighbor
Sampling for GNN Training <L0_neighbor_sampling_overview>` and
:doc:`Neighbor Sampling for Node Classification <L1_large_node_classification>`.

.. GENERATED FROM PYTHON SOURCE LINES 23-45

Link Prediction Overview
------------------------

Link prediction requires the model to predict the probability of
existence of an edge. This tutorial does so by computing a dot product
between the representations of both incident nodes.

.. math::

   \hat{y}_{u\sim v} = \sigma(h_u^T h_v)

It then minimizes the following binary cross entropy loss.

.. math::

   \mathcal{L} = -\sum_{u\sim v\in \mathcal{D}}\left( y_{u\sim v}\log(\hat{y}_{u\sim v}) + (1-y_{u\sim v})\log(1-\hat{y}_{u\sim v}) \right)

This is identical to the link prediction formulation in :doc:`the previous
tutorial on link prediction <../blitz/4_link_predict>`.

.. GENERATED FROM PYTHON SOURCE LINES 48-54

Loading Dataset
---------------

This tutorial loads the dataset from the ``ogb`` package as in the
:doc:`previous tutorial <L1_large_node_classification>`.

.. GENERATED FROM PYTHON SOURCE LINES 54-84

.. code-block:: Python

    import os

    os.environ["DGLBACKEND"] = "pytorch"
    import dgl
    import numpy as np
    import torch
    from ogb.nodeproppred import DglNodePropPredDataset

    dataset = DglNodePropPredDataset("ogbn-arxiv")
    device = "cpu"  # change to 'cuda' for GPU

    graph, node_labels = dataset[0]
    # Add reverse edges since ogbn-arxiv is unidirectional.
    graph = dgl.add_reverse_edges(graph)
    print(graph)
    print(node_labels)

    node_features = graph.ndata["feat"]
    node_labels = node_labels[:, 0]
    num_features = node_features.shape[1]
    num_classes = (node_labels.max() + 1).item()
    print("Number of classes:", num_classes)

    idx_split = dataset.get_idx_split()
    train_nids = idx_split["train"]
    valid_nids = idx_split["valid"]
    test_nids = idx_split["test"]

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Graph(num_nodes=169343, num_edges=2332486,
          ndata_schemes={'year': Scheme(shape=(1,), dtype=torch.int64), 'feat': Scheme(shape=(128,), dtype=torch.float32)}
          edata_schemes={})
    tensor([[ 4],
            [ 5],
            [28],
            ...,
            [10],
            [ 4],
            [ 1]])
    Number of classes: 40
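Since ``add_reverse_edges`` is what makes the citation graph usable for
message passing in both directions, it is worth a quick sanity check. The
following sketch (not part of the original script; it assumes the loading
code above has just run) confirms that the edge count doubled from the
1,166,243 raw edges of ``ogbn-arxiv`` and that the edge set is now
symmetric:

.. code-block:: Python

    # Hypothetical sanity check, assuming ``graph`` from the loading code above.
    src, dst = graph.edges()
    print(graph.num_edges())  # 2332486, i.e. twice the raw edge count
    # Every edge should now have a reverse counterpart.
    print(graph.has_edges_between(dst, src).all())  # tensor(True)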
.. GENERATED FROM PYTHON SOURCE LINES 85-105

Defining Neighbor Sampler and Data Loader in DGL
------------------------------------------------

Different from the :doc:`link prediction tutorial for full graph
<../blitz/4_link_predict>`, a common practice for training GNNs on large
graphs is to iterate over the edges in minibatches, since computing the
probability of all edges is usually infeasible. For each minibatch of
edges, you compute the output representations of their incident nodes
using neighbor sampling and a GNN, in a fashion similar to that introduced
in the :doc:`large-scale node classification tutorial
<L1_large_node_classification>`.

DGL provides ``dgl.dataloading.as_edge_prediction_sampler`` to iterate
over edges for edge classification or link prediction tasks.

To perform link prediction, you need to specify a negative sampler. DGL
provides built-in negative samplers such as
``dgl.dataloading.negative_sampler.Uniform``. Here this tutorial uniformly
draws 5 negative examples per positive example.

.. GENERATED FROM PYTHON SOURCE LINES 105-109

.. code-block:: Python

    negative_sampler = dgl.dataloading.negative_sampler.Uniform(5)

.. GENERATED FROM PYTHON SOURCE LINES 110-115

After defining the negative sampler, one can then define the edge data
loader with neighbor sampling. To create a ``DataLoader`` for link
prediction, provide a neighbor sampler object as well as the negative
sampler object created above.

.. GENERATED FROM PYTHON SOURCE LINES 115-134

.. code-block:: Python

    sampler = dgl.dataloading.NeighborSampler([4, 4])
    sampler = dgl.dataloading.as_edge_prediction_sampler(
        sampler, negative_sampler=negative_sampler
    )
    train_dataloader = dgl.dataloading.DataLoader(
        # The following arguments are specific to DataLoader.
        graph,  # The graph
        torch.arange(graph.num_edges()),  # The edges to iterate over
        sampler,  # The neighbor sampler
        device=device,  # Put the MFGs on CPU or GPU
        # The following arguments are inherited from PyTorch DataLoader.
        batch_size=1024,  # Batch size
        shuffle=True,  # Whether to shuffle the edges for every epoch
        drop_last=False,  # Whether to drop the last incomplete batch
        num_workers=0,  # Number of sampler processes
    )

.. GENERATED FROM PYTHON SOURCE LINES 135-138

You can peek one minibatch from ``train_dataloader`` and see what it will
give you.

.. GENERATED FROM PYTHON SOURCE LINES 138-156

.. code-block:: Python

    input_nodes, pos_graph, neg_graph, mfgs = next(iter(train_dataloader))
    print("Number of input nodes:", len(input_nodes))
    print(
        "Positive graph # nodes:",
        pos_graph.num_nodes(),
        "# edges:",
        pos_graph.num_edges(),
    )
    print(
        "Negative graph # nodes:",
        neg_graph.num_nodes(),
        "# edges:",
        neg_graph.num_edges(),
    )
    print(mfgs)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Number of input nodes: 56813
    Positive graph # nodes: 6903 # edges: 1024
    Negative graph # nodes: 6903 # edges: 5120
    [Block(num_src_nodes=56813, num_dst_nodes=23745, num_edges=88294), Block(num_src_nodes=23745, num_dst_nodes=6903, num_edges=24112)]

.. GENERATED FROM PYTHON SOURCE LINES 157-175

The example minibatch consists of four elements.

The first element is an ID tensor for the input nodes, i.e., nodes whose
input features are needed on the first GNN layer for this minibatch.

The second element and the third element are the positive graph and the
negative graph for this minibatch. The concept of positive and negative
graphs has been introduced in the :doc:`full-graph link prediction
tutorial <../blitz/4_link_predict>`. In minibatch training, the positive
graph and the negative graph only contain nodes necessary for computing
the pair-wise scores of positive and negative examples in the current
minibatch.

The last element is a list of :doc:`MFGs <L0_neighbor_sampling_overview>`
storing the computation dependencies for each GNN layer. The MFGs are used
to compute the GNN outputs of the nodes involved in the positive/negative
graphs.
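A small consistency check (a hedged sketch, not in the original tutorial)
makes the relationship between these elements concrete: the positive and
negative graphs are defined over the same compacted node set as the
destination nodes of the last MFG, so the GNN output computed from the
MFGs can be used directly to score both kinds of pairs:

.. code-block:: Python

    # Sanity checks on the minibatch above (assumes the variables from the
    # ``next(iter(train_dataloader))`` call are still in scope).
    assert pos_graph.num_nodes() == mfgs[-1].num_dst_nodes()
    assert neg_graph.num_nodes() == pos_graph.num_nodes()
    # Uniform(5) draws 5 negative edges per positive edge.
    assert neg_graph.num_edges() == 5 * pos_graph.num_edges()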
.. GENERATED FROM PYTHON SOURCE LINES 178-186

Defining Model for Node Representation
--------------------------------------

The model is almost identical to the one in the :doc:`node classification
tutorial <L1_large_node_classification>`. The only difference is that
since you are doing link prediction, the output dimension will not be the
number of classes in the dataset.

.. GENERATED FROM PYTHON SOURCE LINES 186-211

.. code-block:: Python

    import torch.nn as nn
    import torch.nn.functional as F
    from dgl.nn import SAGEConv


    class Model(nn.Module):
        def __init__(self, in_feats, h_feats):
            super(Model, self).__init__()
            self.conv1 = SAGEConv(in_feats, h_feats, aggregator_type="mean")
            self.conv2 = SAGEConv(h_feats, h_feats, aggregator_type="mean")
            self.h_feats = h_feats

        def forward(self, mfgs, x):
            h_dst = x[: mfgs[0].num_dst_nodes()]
            h = self.conv1(mfgs[0], (x, h_dst))
            h = F.relu(h)
            h_dst = h[: mfgs[1].num_dst_nodes()]
            h = self.conv2(mfgs[1], (h, h_dst))
            return h


    model = Model(num_features, 128).to(device)

.. GENERATED FROM PYTHON SOURCE LINES 212-223

Defining the Score Predictor for Edges
--------------------------------------

After getting the node representations necessary for the minibatch, the
last thing to do is to predict the scores of the edges and non-existent
edges in the sampled minibatch.

The following score predictor, copied from the :doc:`link prediction
tutorial <../blitz/4_link_predict>`, takes a dot product between the
incident nodes' representations.

.. GENERATED FROM PYTHON SOURCE LINES 223-238

.. code-block:: Python

    import dgl.function as fn


    class DotPredictor(nn.Module):
        def forward(self, g, h):
            with g.local_scope():
                g.ndata["h"] = h
                # Compute a new edge feature named 'score' by a dot-product
                # between the source node feature 'h' and destination node
                # feature 'h'.
                g.apply_edges(fn.u_dot_v("h", "h", "score"))
                # u_dot_v returns a 1-element vector for each edge so you
                # need to squeeze it.
                return g.edata["score"][:, 0]
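The dot product predictor has no learnable parameters. If you would like a
trainable scoring function instead, the :doc:`full-graph link prediction
tutorial <../blitz/4_link_predict>` also defines an MLP-based predictor,
which works unchanged in this minibatch setting; a sketch for reference:

.. code-block:: Python

    class MLPPredictor(nn.Module):
        def __init__(self, h_feats):
            super().__init__()
            self.W1 = nn.Linear(h_feats * 2, h_feats)
            self.W2 = nn.Linear(h_feats, 1)

        def apply_edges(self, edges):
            # Concatenate the incident node representations and score the
            # pair with a two-layer MLP.
            h = torch.cat([edges.src["h"], edges.dst["h"]], dim=1)
            return {"score": self.W2(F.relu(self.W1(h))).squeeze(1)}

        def forward(self, g, h):
            with g.local_scope():
                g.ndata["h"] = h
                g.apply_edges(self.apply_edges)
                return g.edata["score"]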
.. GENERATED FROM PYTHON SOURCE LINES 239-250

Evaluating Performance with Unsupervised Learning (Optional)
------------------------------------------------------------

There are various ways to evaluate the performance of link prediction.
This tutorial follows the practice of the `GraphSAGE paper
<https://arxiv.org/abs/1706.02216>`__: it first trains a GNN via link
prediction to get an embedding for each node, then trains a downstream
classifier on top of the embeddings and computes the accuracy as an
assessment of the embedding quality.

.. GENERATED FROM PYTHON SOURCE LINES 253-263

To obtain the representations of all the nodes, this tutorial uses
neighbor sampling as introduced in the :doc:`node classification
tutorial <L1_large_node_classification>`.

.. note::

   If you would like to obtain node representations without neighbor
   sampling during inference, please refer to this :ref:`user guide
   <guide-minibatch-inference>`.

.. GENERATED FROM PYTHON SOURCE LINES 263-331

.. code-block:: Python

    def inference(model, graph, node_features):
        with torch.no_grad():
            sampler = dgl.dataloading.NeighborSampler([4, 4])
            dataloader = dgl.dataloading.DataLoader(
                graph,
                torch.arange(graph.num_nodes()),
                sampler,
                batch_size=1024,
                shuffle=False,
                drop_last=False,
                num_workers=4,
                device=device,
            )

            result = []
            for input_nodes, output_nodes, mfgs in dataloader:
                # feature copy from CPU to GPU takes place here
                inputs = mfgs[0].srcdata["feat"]
                result.append(model(mfgs, inputs))

            return torch.cat(result)


    import sklearn.metrics


    def evaluate(emb, label, train_nids, valid_nids, test_nids):
        classifier = nn.Linear(emb.shape[1], num_classes).to(device)
        opt = torch.optim.LBFGS(classifier.parameters())

        def compute_loss():
            pred = classifier(emb[train_nids].to(device))
            loss = F.cross_entropy(pred, label[train_nids].to(device))
            return loss

        def closure():
            loss = compute_loss()
            opt.zero_grad()
            loss.backward()
            return loss

        prev_loss = float("inf")
        for i in range(1000):
            opt.step(closure)
            with torch.no_grad():
                loss = compute_loss().item()
                if np.abs(loss - prev_loss) < 1e-4:
                    print("Converges at iteration", i)
                    break
                else:
                    prev_loss = loss

        with torch.no_grad():
            pred = classifier(emb.to(device)).cpu()
            valid_acc = sklearn.metrics.accuracy_score(
                label[valid_nids].numpy(), pred[valid_nids].numpy().argmax(1)
            )
            test_acc = sklearn.metrics.accuracy_score(
                label[test_nids].numpy(), pred[test_nids].numpy().argmax(1)
            )
            return valid_acc, test_acc

.. GENERATED FROM PYTHON SOURCE LINES 332-337

Defining Training Loop
----------------------

The following initializes the model and defines the optimizer.

.. GENERATED FROM PYTHON SOURCE LINES 337-345

.. code-block:: Python

    model = Model(node_features.shape[1], 128).to(device)
    predictor = DotPredictor().to(device)
    opt = torch.optim.Adam(list(model.parameters()) + list(predictor.parameters()))

.. GENERATED FROM PYTHON SOURCE LINES 346-350

The following is the training loop for link prediction and evaluation,
and also saves the model that performs the best on the validation set:

.. GENERATED FROM PYTHON SOURCE LINES 350-397

.. code-block:: Python

    import tqdm

    best_accuracy = 0
    best_model_path = "model.pt"
    for epoch in range(1):
        with tqdm.tqdm(train_dataloader) as tq:
            for step, (input_nodes, pos_graph, neg_graph, mfgs) in enumerate(tq):
                # feature copy from CPU to GPU takes place here
                inputs = mfgs[0].srcdata["feat"]

                outputs = model(mfgs, inputs)
                pos_score = predictor(pos_graph, outputs)
                neg_score = predictor(neg_graph, outputs)

                score = torch.cat([pos_score, neg_score])
                label = torch.cat(
                    [torch.ones_like(pos_score), torch.zeros_like(neg_score)]
                )
                loss = F.binary_cross_entropy_with_logits(score, label)

                opt.zero_grad()
                loss.backward()
                opt.step()

                tq.set_postfix({"loss": "%.03f" % loss.item()}, refresh=False)

                if (step + 1) % 500 == 0:
                    model.eval()
                    emb = inference(model, graph, node_features)
                    valid_acc, test_acc = evaluate(
                        emb, node_labels, train_nids, valid_nids, test_nids
                    )
                    print(
                        "Epoch {} Validation Accuracy {} Test Accuracy {}".format(
                            epoch, valid_acc, test_acc
                        )
                    )
                    if best_accuracy < valid_acc:
                        best_accuracy = valid_acc
                        torch.save(model.state_dict(), best_model_path)
                    model.train()

                # Note that this tutorial does not train the whole model
                # to the end.
                break

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

      0%|          | 0/2278 [00:00
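Because the loop above checkpoints the best-performing weights to
``model.pt`` every 500 steps, you would typically reload that checkpoint
before any final evaluation. A minimal sketch (assuming the loop ran long
enough to save at least one checkpoint):

.. code-block:: Python

    import os.path

    # Restore the weights with the best validation accuracy, if a
    # checkpoint was saved by the training loop above.
    if os.path.exists(best_model_path):
        model.load_state_dict(torch.load(best_model_path))
    model.eval()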
Evaluating Performance with Link Prediction (Optional)
------------------------------------------------------

You can also evaluate the trained model directly on the link prediction
task itself, with metrics such as AUC or `various metrics from information
retrieval <https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)>`__.
Ultimately, they require the model to predict one scalar score given a
node pair among a set of node pairs.

Assume that you have the following test set with labels, where
``test_pos_src`` and ``test_pos_dst`` are ground truth node pairs with
edges in between (or *positive* pairs), and ``test_neg_src`` and
``test_neg_dst`` are ground truth node pairs without edges in between
(or *negative* pairs).

.. GENERATED FROM PYTHON SOURCE LINES 415-430

.. code-block:: Python

    # Positive pairs
    # These are randomly generated as an example. You will need to
    # replace them with your own ground truth.
    n_test_pos = 1000
    test_pos_src, test_pos_dst = (
        torch.randint(0, graph.num_nodes(), (n_test_pos,)),
        torch.randint(0, graph.num_nodes(), (n_test_pos,)),
    )
    # Negative pairs. Likewise, you will need to replace them with your
    # own ground truth.
    test_neg_src = test_pos_src
    test_neg_dst = torch.randint(0, graph.num_nodes(), (n_test_pos,))

.. GENERATED FROM PYTHON SOURCE LINES 431-434

First you need to compute the node representations for all the nodes
with the ``inference`` method above:

.. GENERATED FROM PYTHON SOURCE LINES 434-437

.. code-block:: Python

    node_reprs = inference(model, graph, node_features)

.. GENERATED FROM PYTHON SOURCE LINES 438-442

Since the predictor is a dot product, you can now easily compute the
scores of positive and negative test pairs to compute metrics such as
AUC:

.. GENERATED FROM PYTHON SOURCE LINES 442-460

.. code-block:: Python

    h_pos_src = node_reprs[test_pos_src]
    h_pos_dst = node_reprs[test_pos_dst]
    h_neg_src = node_reprs[test_neg_src]
    h_neg_dst = node_reprs[test_neg_dst]
    score_pos = (h_pos_src * h_pos_dst).sum(1)
    score_neg = (h_neg_src * h_neg_dst).sum(1)

    test_preds = torch.cat([score_pos, score_neg]).cpu().numpy()
    test_labels = (
        torch.cat([torch.ones_like(score_pos), torch.zeros_like(score_neg)])
        .cpu()
        .numpy()
    )

    auc = sklearn.metrics.roc_auc_score(test_labels, test_preds)
    print("Link Prediction AUC:", auc)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Link Prediction AUC: 0.49766

Note that the AUC here is close to 0.5: the test pairs above are random
placeholders and the model was only trained for a single step, so there is
no signal to recover. With real ground-truth pairs and full training you
should expect a meaningfully higher score.

.. GENERATED FROM PYTHON SOURCE LINES 461-467

Conclusion
----------

In this tutorial, you have learned how to train a multi-layer GraphSAGE
for link prediction with neighbor sampling.

.. GENERATED FROM PYTHON SOURCE LINES 467-471

.. code-block:: Python

    # Thumbnail credits: Link Prediction with Neo4j, Mark Needham
    # sphinx_gallery_thumbnail_path = '_static/blitz_4_link_predict.png'

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 21.700 seconds)

.. _sphx_glr_download_tutorials_large_L2_large_link_prediction.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: L2_large_link_prediction.ipynb <L2_large_link_prediction.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: L2_large_link_prediction.py <L2_large_link_prediction.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: L2_large_link_prediction.zip <L2_large_link_prediction.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_