TorchBasedFeature

class dgl.graphbolt.TorchBasedFeature(torch_feature: Tensor, metadata: Dict | None = None)[source]

Bases: Feature

A wrapper for a PyTorch-based feature.

Initialize a torch-based feature store from a torch feature. The feature can reside either in memory or on disk.

Parameters:: torch_feature (torch.Tensor) – The torch feature. The tensor must have more than one dimension.

Examples

>>> import torch
>>> from dgl import graphbolt as gb

1. The feature is in memory.

>>> torch_feat = torch.arange(10).reshape(2, -1)
>>> feature = gb.TorchBasedFeature(torch_feat)
>>> feature.read()
tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])
>>> feature.read(torch.tensor([0]))
tensor([[0, 1, 2, 3, 4]])
>>> feature.update(torch.tensor([[1 for _ in range(5)]]),
...                torch.tensor([1]))
>>> feature.read(torch.tensor([0, 1]))
tensor([[0, 1, 2, 3, 4],
        [1, 1, 1, 1, 1]])
>>> feature.size()
torch.Size([5])

2. The feature is on disk. Note that you can use gb.numpy_save_aligned as a replacement for np.save to potentially get increased performance.

>>> import numpy as np
>>> arr = np.array([[1, 2], [3, 4]])
>>> np.save("/tmp/arr.npy", arr)
>>> torch_feat = torch.from_numpy(np.load("/tmp/arr.npy", mmap_mode="r+"))
>>> feature = gb.TorchBasedFeature(torch_feat)
>>> feature.read()
tensor([[1, 2],
        [3, 4]])
>>> feature.read(torch.tensor([0]))
tensor([[1, 2]])

3. Pinned CPU feature.

>>> torch_feat = torch.arange(10).reshape(2, -1).pin_memory()
>>> feature = gb.TorchBasedFeature(torch_feat)
>>> feature.read().device
device(type='cuda', index=0)
>>> feature.read(torch.tensor([0]).cuda()).device
device(type='cuda', index=0)

count()[source]

Get the count of the feature.

Returns:: The count of the feature.
Return type:: int

is_pinned()[source]: Returns True if the stored feature is pinned.

metadata()[source]

Get the metadata of the feature.

Returns:: The metadata of the feature.
Return type:: Dict

pin_memory_()[source]: In-place operation to copy the feature to pinned memory. Returns the same object modified in-place.

read(ids: Tensor | None = None)[source]

Read the feature by index.

If the feature is in pinned CPU memory and ids is on the GPU or in pinned CPU memory, the feature is read by the GPU and the returned tensor is on the GPU. Otherwise, the returned tensor is on the CPU.

Parameters:: ids (torch.Tensor, optional) – The index of the feature. If specified, only the specified indices of the feature are read. If None, the entire feature is returned.
Returns:: The read feature.
Return type:: torch.Tensor
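The device rule described above can be restated as a small decision function. This is only a plain-Python model of the rule as worded here for a CPU-resident feature, not GraphBolt code; the helper name expected_read_device and its boolean arguments are hypothetical.

```python
def expected_read_device(feature_pinned: bool, ids_on_gpu: bool, ids_pinned: bool) -> str:
    """Model of the documented rule; hypothetical helper, not part of GraphBolt."""
    # A GPU read needs a pinned feature AND ids that are on the GPU or pinned.
    if feature_pinned and (ids_on_gpu or ids_pinned):
        return "cuda"
    # Every other combination falls under "otherwise" in the rule.
    return "cpu"


# Pinned feature with GPU ids: read by the GPU.
print(expected_read_device(feature_pinned=True, ids_on_gpu=True, ids_pinned=False))
# Pinned feature with ordinary (pageable) CPU ids: stays on the CPU.
print(expected_read_device(feature_pinned=True, ids_on_gpu=False, ids_pinned=False))
```

The function deliberately ignores features that have already been moved to the GPU with to(device), since the rule above does not describe that case.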

read_async(ids: Tensor)[source]

Read the feature by index asynchronously.

Parameters:: ids (torch.Tensor) – The index of the feature. Only the specified indices of the feature are read.
Returns:: A generator object that yields a future on its read_async_num_stages(ids.device)th invocation. The result can be accessed by calling .wait() on the returned future object. It is undefined behavior to call .wait() more than once.
Return type:: A generator object.

Examples

>>> import torch
>>> import dgl.graphbolt as gb
>>> feature = gb.Feature(...)
>>> ids = torch.tensor([0, 2])
>>> for stage, future in enumerate(feature.read_async(ids)):
...     pass
>>> assert stage + 1 == feature.read_async_num_stages(ids.device)
>>> result = future.wait()  # result contains the read values.
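To make the staged protocol concrete, the following sketch imitates it with a toy generator in plain Python. Nothing here is GraphBolt code: ToyFuture, toy_read_async, and read_sync are hypothetical names, the feature is a list of rows, and yielding None at the intermediate stages is an assumption made only for illustration.

```python
from typing import Any, Callable, Iterator


class ToyFuture:
    """Hypothetical stand-in for the future yielded at the last stage."""

    def __init__(self, compute: Callable[[], Any]) -> None:
        self._compute = compute
        self._waited = False

    def wait(self) -> Any:
        # The documentation calls a second wait() undefined behavior;
        # this toy makes the mistake visible by raising instead.
        if self._waited:
            raise RuntimeError("wait() called more than once")
        self._waited = True
        return self._compute()


def toy_read_async(rows: list, ids: list, num_stages: int) -> Iterator[Any]:
    """Toy generator: yields num_stages times; the last yield is a ToyFuture."""
    for _ in range(num_stages - 1):
        yield None  # assumption: intermediate stages carry nothing useful
    yield ToyFuture(lambda: [rows[i] for i in ids])


def read_sync(gen: Iterator[Any]) -> Any:
    """Exhaust the generator, then wait once on the final yielded object."""
    last = None
    for last in gen:
        pass
    return last.wait()


rows = [[0, 1], [2, 3], [4, 5]]
result = read_sync(toy_read_async(rows, [0, 2], num_stages=3))
print(result)  # [[0, 1], [4, 5]]
```

Because each stage is a separate yield, a caller could interleave the stages of several generators instead of exhausting one at a time, which is the kind of overlap a staged design allows.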

read_async_num_stages(ids_device: device)[source]

The number of stages of the read_async operation. See the read_async function for directions on its use. The returned value must equal the number of yield operations that read_async performs when it is given a tensor residing on ids_device.

Parameters:: ids_device (torch.device) – The device of the ids parameter passed into read_async.
Returns:: The number of stages of the read_async operation.
Return type:: int

size()[source]

Get the size of the feature.

Returns:: The size of the feature.
Return type:: torch.Size
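The class example reports torch.Size([5]) for a feature built from a (2, 5) tensor, which suggests that size() describes a single feature row rather than the whole tensor. The sketch below models that relationship on a plain shape tuple. The helper names are hypothetical, and treating count() as the length of the first dimension is an assumption, not something this page states.

```python
def toy_size(shape: tuple) -> tuple:
    # Matches the class example: a (2, 5) tensor reports size [5],
    # i.e. the shape with the leading (row) dimension dropped.
    return shape[1:]


def toy_count(shape: tuple) -> int:
    # Assumption: count is the length of the leading (row) dimension.
    return shape[0]


shape = (2, 5)
print(toy_size(shape))   # (5,)
print(toy_count(shape))  # 2
```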

to(device)[source]: Copy TorchBasedFeature to the specified device.

update(value: Tensor, ids: Tensor | None = None)[source]

Update the feature store.

Parameters:

value (torch.Tensor) – The updated value of the feature.
ids (torch.Tensor, optional) – The indices of the feature to update. If specified, only the specified indices of the feature are updated: row ids[i] of the feature is set to value[i], so ids and value must have the same length. If None, the entire feature is updated.
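The row-wise semantics of update can be illustrated on a plain list of rows. This is a sketch of the behavior described above, not the library implementation; toy_update is a hypothetical helper, and raising ValueError on a length mismatch is an assumption about how such a mismatch might be reported.

```python
from typing import List, Optional


def toy_update(rows: List[list], value: List[list], ids: Optional[List[int]] = None) -> None:
    """Update rows in place, mirroring the documented ids/value pairing."""
    if ids is None:
        # No ids: the entire feature is replaced by value.
        rows[:] = [list(v) for v in value]
        return
    if len(ids) != len(value):
        # Assumption: a mismatch is rejected; the real error may differ.
        raise ValueError("ids and value must have the same length")
    for i, row_id in enumerate(ids):
        # Row ids[i] of the feature is set to value[i].
        rows[row_id] = list(value[i])


# Same data as the in-memory class example: overwrite row 1 with ones.
rows = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
toy_update(rows, [[1, 1, 1, 1, 1]], ids=[1])
print(rows)  # [[0, 1, 2, 3, 4], [1, 1, 1, 1, 1]]
```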