QM9Dataset
- class dgl.data.QM9Dataset(label_keys, cutoff=5.0, raw_dir=None, force_reload=False, verbose=False, transform=None)
Bases: DGLDataset
QM9 dataset for graph property prediction (regression)
This dataset consists of 130,831 molecules with 12 regression targets. Nodes correspond to atoms and edges correspond to close atom pairs.
- This dataset differs from QM9EdgeDataset in the following aspects:
  1. Edges in this dataset are purely distance-based.
  2. It only provides atoms’ coordinates and atomic numbers as node features.
  3. It only provides 12 regression targets.
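The distance-based connectivity can be illustrated with a minimal pure-Python sketch of the rule stated for the `cutoff` parameter: two distinct atoms are joined when the distance between them is no larger than the cutoff. This is not DGL's implementation; the function name `cutoff_edges` is hypothetical, and skipping self-loops and emitting both edge directions are assumptions of the sketch.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def cutoff_edges(coords: Sequence[Coord], cutoff: float = 5.0) -> List[Tuple[int, int]]:
    """Return directed edges (src, dst) between distinct atoms whose
    Euclidean distance is no larger than `cutoff` (same unit as coords)."""
    edges = []
    n = len(coords)
    for i in range(n):
        for j in range(n):
            # Self-loops are skipped; this is an assumption of the sketch.
            if i != j and math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


# Toy three-atom "molecule" with made-up coordinates in Angstrom.
R = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
print(cutoff_edges(R, cutoff=5.0))  # [(0, 1), (1, 0)]
```

With the default cutoff only atoms 0 and 1 (1.0 apart) are connected; atom 2 is 6.0 and 7.0 away from the others, so it stays isolated.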
Reference:
Statistics:
Number of graphs: 130,831
Number of regression targets: 12
Keys    Description
mu      Dipole moment
alpha   Isotropic polarizability
homo    Highest occupied molecular orbital (HOMO) energy
lumo    Lowest unoccupied molecular orbital (LUMO) energy
gap     Gap between the HOMO and LUMO energies
r2      Electronic spatial extent
zpve    Zero point vibrational energy
U0      Internal energy at 0 K
U       Internal energy at 298.15 K
H       Enthalpy at 298.15 K
G       Free energy at 298.15 K
Cv      Heat capacity at 298.15 K
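Because `label_keys` must be a subset of the keys listed above, the hypothetical helper below sketches how such a subset could be checked and turned into one label vector per molecule. `select_labels` and `QM9_KEYS` are illustrative names, not DGL API; the sketch assumes that label columns follow the order given in `label_keys` and that an unknown key is an error (DGL's own error type may differ). The record values are made up.

```python
from typing import Dict, List, Sequence

# The 12 keys from the table above (case-sensitive).
QM9_KEYS = ("mu", "alpha", "homo", "lumo", "gap", "r2",
            "zpve", "U0", "U", "H", "G", "Cv")


def select_labels(record: Dict[str, float], label_keys: Sequence[str]) -> List[float]:
    """Pick the requested targets from one molecule's record, in label_keys order.

    Raises ValueError if any requested key is not one of the 12 table keys."""
    unknown = [k for k in label_keys if k not in QM9_KEYS]
    if unknown:
        raise ValueError(f"unknown label keys: {unknown}")
    return [record[k] for k in label_keys]


record = {"mu": 2.7, "gap": 6.8}  # made-up values for one molecule
print(select_labels(record, ["mu", "gap"]))  # [2.7, 6.8]
```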
- Parameters:
label_keys (list) – Names of the regression properties to use as targets; each must be one of the keys in the table above.
cutoff (float) – Cutoff distance for interatomic interactions, i.e. two atoms are connected in the corresponding graph if the distance between them is no larger than this. Default: 5.0 Angstrom
raw_dir (str) – Directory to which the raw data is downloaded, or which already contains the input data directory. Default: ~/.dgl/
force_reload (bool) – Whether to reload the dataset. Default: False
verbose (bool) – Whether to print out progress information. Default: False
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
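The "transformed before every access" behaviour can be pictured with the toy class below, a minimal sketch in which a plain dict stands in for a DGLGraph. `ToyGraphDataset` and `mark_transformed` are hypothetical names, not part of DGL. The sketch shows the transform being called on each indexing operation rather than once when the dataset is built.

```python
from typing import Any, Callable, List, Optional, Tuple


class ToyGraphDataset:
    """Hypothetical stand-in for a DGL dataset (not DGL API): stores
    (graph, label) pairs and applies an optional transform on each access."""

    def __init__(self, items: List[Tuple[Any, Any]],
                 transform: Optional[Callable[[Any], Any]] = None) -> None:
        self._items = items
        self._transform = transform

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, idx: int) -> Tuple[Any, Any]:
        g, label = self._items[idx]
        # The transform is called on every access and its result is returned;
        # the stored item is never replaced by the result.
        if self._transform is not None:
            g = self._transform(g)
        return g, label


calls: List[int] = []


def mark_transformed(g: dict) -> dict:
    """Toy transform: record the call and return a new, flagged copy."""
    calls.append(1)
    return {**g, "transformed": True}


ds = ToyGraphDataset([({"num_nodes": 3}, 0.5)], transform=mark_transformed)
ds[0]
ds[0]
print(len(calls))  # 2, one transform call per access
```

Under this design an expensive transform is paid again on every access, which is the trade-off for never storing a modified copy.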
- Raises:
UserWarning – If the raw data on the remote server has been changed by the author.
Examples
>>> from dgl.data import QM9Dataset
>>> data = QM9Dataset(label_keys=['mu', 'gap'], cutoff=5.0)
>>> data.num_tasks
2
>>>
>>> # iterate over the dataset
>>> for g, label in data:
...     R = g.ndata['R']  # get coordinates of each atom
...     Z = g.ndata['Z']  # get atomic numbers of each atom
...     # your code here...
>>>