GraphormerLayer

class dgl.nn.pytorch.gt.GraphormerLayer(feat_size, hidden_size, num_heads, attn_bias_type='add', norm_first=False, dropout=0.1, attn_dropout=0.1, activation=ReLU())[source]

Bases: Module

Graphormer Layer with Dense Multi-Head Attention, as introduced in Do Transformers Really Perform Bad for Graph Representation?

Parameters:
  • feat_size (int) – Feature size.

  • hidden_size (int) – Hidden size of the feedforward layers.

  • num_heads (int) – Number of attention heads; feat_size must be divisible by num_heads.

  • attn_bias_type (str, optional) –

    The type of attention bias used for modifying attention, selected from 'add' or 'mul'. Default: 'add'.

    • 'add' is for additive attention bias.

    • 'mul' is for multiplicative attention bias.

  • norm_first (bool, optional) – If True, layer normalization is applied before the attention and feedforward operations. Otherwise, it is applied afterwards. Default: False.

  • dropout (float, optional) – Dropout probability. Default: 0.1.

  • attn_dropout (float, optional) – Attention dropout probability. Default: 0.1.

  • activation (callable activation layer, optional) – Activation function. Default: nn.ReLU().

Examples

>>> import torch as th
>>> from dgl.nn import GraphormerLayer
>>> batch_size = 16
>>> num_nodes = 100
>>> feat_size = 512
>>> num_heads = 8
>>> nfeat = th.rand(batch_size, num_nodes, feat_size)
>>> bias = th.rand(batch_size, num_nodes, num_nodes, num_heads)
>>> net = GraphormerLayer(
...     feat_size=feat_size,
...     hidden_size=2048,
...     num_heads=num_heads,
... )
>>> out = net(nfeat, bias)
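
The layer can also be constructed with non-default options. The following is a minimal sketch (not taken from the library's own examples) illustrating the attn_bias_type and norm_first parameters documented above, reusing nfeat and bias from the example:

>>> # Hedged sketch: pre-normalization with a multiplicative attention bias.
>>> net_pre = GraphormerLayer(
...     feat_size=feat_size,
...     hidden_size=2048,
...     num_heads=num_heads,
...     attn_bias_type="mul",
...     norm_first=True,
... )
>>> out = net_pre(nfeat, bias)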
forward(nfeat, attn_bias=None, attn_mask=None)[source]

Forward computation.

Parameters:
  • nfeat (torch.Tensor) – A 3D input tensor. Shape: (batch_size, N, feat_size), where N is the maximum number of nodes.

  • attn_bias (torch.Tensor, optional) – The attention bias used for attention modification. Shape: (batch_size, N, N, num_heads).

  • attn_mask (torch.Tensor, optional) – The attention mask used to avoid computation on invalid positions, where invalid positions are indicated by True values. Shape: (batch_size, N, N). Note: For rows corresponding to nonexistent (padding) nodes, make sure at least one entry is set to False to prevent softmax from producing NaNs.

Returns:

y โ€“ The output tensor. Shape: (batch_size, N, feat_size)

Return type:

torch.Tensor
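
As a minimal sketch (not part of the official documentation), the mask below assumes a padded batch in which a hypothetical num_valid tensor holds the real node count of each graph. True marks padded positions, and the diagonal is kept False so rows belonging to padding nodes still have one valid entry and do not produce NaNs under softmax, as noted above. It reuses net, nfeat, bias, batch_size, and num_nodes from the example:

>>> # Hedged sketch: build a boolean attention mask for a padded batch.
>>> # `num_valid` is a hypothetical per-graph node count, not a DGL API.
>>> num_valid = th.randint(low=1, high=num_nodes + 1, size=(batch_size,))
>>> node_idx = th.arange(num_nodes)
>>> pad = node_idx.unsqueeze(0) >= num_valid.unsqueeze(1)   # (batch_size, N)
>>> attn_mask = pad.unsqueeze(1) | pad.unsqueeze(2)         # (batch_size, N, N)
>>> # Keep at least one False per row (the diagonal) so padding rows
>>> # do not yield NaNs under softmax.
>>> attn_mask[:, node_idx, node_idx] = False
>>> out = net(nfeat, attn_bias=bias, attn_mask=attn_mask)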