dgl.edge_label_informativeness
- dgl.edge_label_informativeness(graph, y, eps=1e-08)[source]
Label informativeness ($\mathrm{LI}$) is a characteristic of labeled graphs proposed in the paper Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond.
Label informativeness shows how much information about a node’s label we get from knowing its neighbor’s label. Formally, assume that we sample an edge $(\xi, \eta) \in E$. The class labels of nodes $\xi$ and $\eta$ are then random variables $y_\xi$ and $y_\eta$. We want to measure the amount of knowledge the label $y_\eta$ gives for predicting $y_\xi$. The entropy $H(y_\xi)$ measures the ‘hardness’ of predicting the label of $\xi$ without knowing $y_\eta$. Given $y_\eta$, this value is reduced to the conditional entropy $H(y_\xi \mid y_\eta)$. In other words, $y_\eta$ reveals $I(y_\xi, y_\eta) = H(y_\xi) - H(y_\xi \mid y_\eta)$ information about the label. To make the obtained quantity comparable across different datasets, label informativeness is defined as the normalized mutual information of $y_\xi$ and $y_\eta$:
$$\mathrm{LI} = \frac{I(y_\xi, y_\eta)}{H(y_\xi)}$$
Depending on the distribution used for sampling an edge $(\xi, \eta)$, several variants of label informativeness can be obtained. Two of them are particularly intuitive: in edge label informativeness ($\mathrm{LI}_{edge}$), edges are sampled uniformly at random; in node label informativeness ($\mathrm{LI}_{node}$), first a node is sampled uniformly at random and then an edge incident to it is sampled uniformly at random. The two versions differ in how they weight high- and low-degree nodes: edge label informativeness averages over edges, so high-degree nodes are given more weight, while node label informativeness averages over nodes, so all nodes are weighted equally.
This function computes edge label informativeness; a reference computation from this definition is sketched after the examples below.
- Parameters:
  - graph (DGLGraph) – The graph.
  - y (torch.Tensor) – The node labels, a tensor of shape $(|V|)$.
  - eps (float, optional) – A small constant for numerical stability. Default: 1e-08.
- Returns:
The edge label informativeness value.
- Return type:
float
Examples
>>> import dgl
>>> import torch
>>> graph = dgl.graph(([0, 1, 2, 2, 3, 4], [1, 2, 0, 3, 4, 5]))
>>> y = torch.tensor([0, 0, 0, 0, 1, 1])
>>> dgl.edge_label_informativeness(graph, y)
0.25177597999572754
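As a sanity check on the definition, the snippet below is a minimal reference sketch, not DGL’s implementation; the helper name li_edge_reference is hypothetical. It reuses graph, y, and torch from the example above and assumes the graph is treated as undirected, i.e. every edge is counted in both directions, so that $y_\xi$ and $y_\eta$ share the same degree-weighted marginal distribution and $\mathrm{LI}_{edge} = I(y_\xi, y_\eta)/H(y_\xi) = 2 - H(y_\xi, y_\eta)/H(y_\xi)$.

>>> # Hypothetical helper, not part of DGL: computes LI_edge from the definition.
>>> def li_edge_reference(graph, y, eps=1e-8):
...     src, dst = graph.edges()
...     # Count each edge in both directions so y_xi and y_eta share one
...     # degree-weighted marginal distribution.
...     u = torch.cat([src, dst])
...     v = torch.cat([dst, src])
...     num_classes = int(y.max()) + 1
...     # Joint distribution p(c1, c2) of the label pair of a uniformly sampled edge.
...     joint = torch.zeros(num_classes, num_classes)
...     for a, b in zip(y[u].tolist(), y[v].tolist()):
...         joint[a, b] += 1
...     joint = joint / joint.sum()
...     marginal = joint.sum(dim=1)
...     h_joint = -(joint * torch.log(joint + eps)).sum()
...     h_marginal = -(marginal * torch.log(marginal + eps)).sum()
...     # LI_edge = I(y_xi, y_eta) / H(y_xi) = 2 - H(y_xi, y_eta) / H(y_xi),
...     # since the symmetrized marginals of y_xi and y_eta coincide.
...     return (2 - h_joint / h_marginal).item()
>>> round(li_edge_reference(graph, y), 4)
0.2518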