dgl.graphbolt.unique_and_compact

dgl.graphbolt.unique_and_compact(nodes: List[Tensor] | Dict[str, List[Tensor]], rank: int = 0, world_size: int = 1, async_op: bool = False)[source]

Compact a list of nodes tensor. The rank and world_size parameters are relevant when using Cooperative Minibatching, which was initially proposed in `Deep Graph Library PR#4337<https://github.com/dmlc/dgl/pull/4337>`__ and was later first fully described in Cooperative Minibatching in Graph Neural Networks. Cooperation between the GPUs eliminates duplicate work performed across the GPUs due to the overlapping sampled k-hop neighborhoods of seed nodes when performing GNN minibatching.

When world_size is greater than 1, then the given ids are partitioned between the available ranks. The ids corresponding to the given rank are guaranteed to come before the ids of other ranks. To do this, the partitioned ids are rotated backwards by the given rank so that the ids are ordered as: [rank, rank + 1, world_size, 0, …, rank - 1]. This is supported only for Volta and later generation NVIDIA GPUs.

Parameters:

nodes (List[torch.Tensor] or Dict[str, List[torch.Tensor]]) – List of nodes for compacting. the unique_and_compact will be done per type - If nodes is a list of tensor: All the tensors will do unique and compact together, usually it is used for homogeneous graph. - If nodes is a list of dictionary: The keys should be node type and the values should be corresponding nodes, the unique and compact will be done per type, usually it is used for heterogeneous graph.
rank (int) – The rank of the current process.
world_size (int) – The number of processes.
async_op (bool) – Boolean indicating whether the call is asynchronous. If so, the result can be obtained by calling wait on the returned future.

Returns:

The Unique nodes (per type) of all nodes in the input. And the compacted nodes list, where IDs inside are replaced with compacted node IDs. “Compacted node list” indicates that the node IDs in the input node list are replaced with mapped node IDs, where each type of node is mapped to a contiguous space of IDs ranging from 0 to N. The unique nodes offsets tensor partitions the unique_nodes tensor. Has size world_size + 1 and unique_nodes[offsets[i]: offsets[i + 1]] belongs to the rank (rank + i) % world_size.

Return type:

Tuple[unique_nodes, compacted_node_list, unique_nodes_offsets]