GraphormerLayer

class dgl.nn.pytorch.gt.GraphormerLayer(feat_size, hidden_size, num_heads, attn_bias_type='add', norm_first=False, dropout=0.1, attn_dropout=0.1, activation=ReLU())[source]

Bases: Module

Graphormer Layer with Dense Multi-Head Attention, as introduced in Do Transformers Really Perform Bad for Graph Representation?

Parameters:
  • feat_size (int) – Feature size.

  • hidden_size (int) – Hidden size of the feedforward layers.

  • num_heads (int) – Number of attention heads; feat_size must be divisible by num_heads.

  • attn_bias_type (str, optional) –

    The type of attention bias used for modifying attention, selected from 'add' or 'mul'. Default: 'add'.

    • 'add' is for additive attention bias.

    • 'mul' is for multiplicative attention bias.

  • norm_first (bool, optional) – If True, layer normalization is applied before the attention and feedforward operations. Otherwise, it is applied afterwards. Default: False.

  • dropout (float, optional) – Dropout probability. Default: 0.1.

  • attn_dropout (float, optional) – Attention dropout probability. Default: 0.1.

  • activation (callable activation layer, optional) – Activation function. Default: nn.ReLU().

Examples

>>> import torch as th
>>> from dgl.nn import GraphormerLayer
>>> batch_size = 16
>>> num_nodes = 100
>>> feat_size = 512
>>> num_heads = 8
>>> nfeat = th.rand(batch_size, num_nodes, feat_size)
>>> bias = th.rand(batch_size, num_nodes, num_nodes, num_heads)
>>> net = GraphormerLayer(
...     feat_size=feat_size,
...     hidden_size=2048,
...     num_heads=num_heads,
... )
>>> out = net(nfeat, bias)
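
The layer can also be constructed with non-default options. The following is a minimal sketch (not taken from the library's own examples) illustrating the attn_bias_type and norm_first parameters documented above, reusing nfeat and bias from the example:

>>> # Hedged sketch: pre-normalization with a multiplicative attention bias.
>>> net_pre = GraphormerLayer(
...     feat_size=feat_size,
...     hidden_size=2048,
...     num_heads=num_heads,
...     attn_bias_type="mul",
...     norm_first=True,
... )
>>> out = net_pre(nfeat, bias)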
forward(nfeat, attn_bias=None, attn_mask=None)[source]

Forward computation.

Parameters:
  • nfeat (torch.Tensor) – A 3D input tensor. Shape: (batch_size, N, feat_size), where N is the maximum number of nodes.

  • attn_bias (torch.Tensor, optional) – The attention bias used for attention modification. Shape: (batch_size, N, N, num_heads).

  • attn_mask (torch.Tensor, optional) – The attention mask used to avoid computation on invalid positions, where invalid positions are indicated by True values. Shape: (batch_size, N, N). Note: For rows corresponding to nonexistent (padding) nodes, make sure at least one entry is set to False to prevent softmax from producing NaNs.

Returns:

y โ€“ The output tensor. Shape: (batch_size, N, feat_size)

Return type:

torch.Tensor
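
As a minimal sketch (not part of the official documentation), the mask below assumes a padded batch in which a hypothetical num_valid tensor holds the real node count of each graph. True marks padded positions, and the diagonal is kept False so rows belonging to padding nodes still have one valid entry and do not produce NaNs under softmax, as noted above. It reuses net, nfeat, bias, batch_size, and num_nodes from the example:

>>> # Hedged sketch: build a boolean attention mask for a padded batch.
>>> # `num_valid` is a hypothetical per-graph node count, not a DGL API.
>>> num_valid = th.randint(low=1, high=num_nodes + 1, size=(batch_size,))
>>> node_idx = th.arange(num_nodes)
>>> pad = node_idx.unsqueeze(0) >= num_valid.unsqueeze(1)   # (batch_size, N)
>>> attn_mask = pad.unsqueeze(1) | pad.unsqueeze(2)         # (batch_size, N, N)
>>> # Keep at least one False per row (the diagonal) so padding rows
>>> # do not yield NaNs under softmax.
>>> attn_mask[:, node_idx, node_idx] = False
>>> out = net(nfeat, attn_bias=bias, attn_mask=attn_mask)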