Dataset Information
When working with metatrain, you will most likely need to interact with a few core
classes that store information about datasets. All of these classes belong to the
metatrain.utils.data module, which is documented in the Data section of the
developer documentation.
These classes are:
- metatrain.utils.data.DatasetInfo: This class is responsible for storing information about a dataset. It contains the length unit used in the dataset, the atomic types present, and information about the dataset’s targets as a Dict[str, TargetInfo] object. The keys of this dictionary are the names of the targets in the dataset (e.g., energy, mtt::dipole).
- metatrain.utils.data.TargetInfo: This class is responsible for storing information about a target in a dataset. It contains the target’s physical quantity, the unit in which the target is expressed, and the layout of the target. The layout is a TensorMap object with zero samples, which is used to exemplify the metadata of each target.
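The relationship between the two classes can be pictured with a minimal sketch. The dataclasses below are hypothetical stand-ins written with the standard library only; they are not the metatrain classes, and the field names, the integer atomic types, and the unit strings are assumptions chosen to mirror the description above. The layout is left as None here, whereas in metatrain it would be a TensorMap with zero samples.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TargetInfoSketch:
    """Hypothetical stand-in for metatrain.utils.data.TargetInfo."""

    quantity: str  # physical quantity of the target, e.g. "energy"
    unit: str  # unit in which the target is expressed
    layout: Any  # placeholder; in metatrain this is a zero-sample TensorMap


@dataclass
class DatasetInfoSketch:
    """Hypothetical stand-in for metatrain.utils.data.DatasetInfo."""

    length_unit: str  # length unit used in the dataset
    atomic_types: List[int]  # assumed here to be atomic numbers
    # Maps target names to their description, like Dict[str, TargetInfo]
    targets: Dict[str, TargetInfoSketch] = field(default_factory=dict)


# Illustrative values only; none of them come from a real dataset.
info = DatasetInfoSketch(
    length_unit="angstrom",
    atomic_types=[1, 6, 8],
    targets={
        "energy": TargetInfoSketch(quantity="energy", unit="eV", layout=None),
        "mtt::dipole": TargetInfoSketch(quantity="dipole", unit="", layout=None),
    },
)

# The dictionary keys are the target names.
target_names = sorted(info.targets)
```

The point of the sketch is only the nesting: one dataset-level object holds a dictionary keyed by target name, and each value describes a single target.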
At the moment, only three types of layouts are supported:
- Scalar: This type of layout is used when the target is a scalar quantity. The
layout TensorMap object corresponding to a scalar must have one TensorBlock and no components.
- Cartesian tensor: This type of layout is used when the target is a Cartesian tensor.
The layout TensorMap object corresponding to a Cartesian tensor must have one TensorBlock and as many components as the tensor’s rank. These components are named xyz for a tensor of rank 1 and xyz_1, xyz_2, and so on for higher ranks.
- Spherical tensor: This type of layout is used when the target is a spherical tensor.
The layout TensorMap object corresponding to a spherical tensor can have multiple blocks corresponding to different irreps (irreducible representations) of the target. The keys of the TensorMap object must have the o3_lambda and o3_sigma names, and each TensorBlock must have a single component named o3_mu.
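The three layout rules above can be summarized as a small check. The sketch below is not metatrain's own validation code: classify_layout and cartesian_component_names are hypothetical helpers, and a layout is reduced to plain Python lists (the key names, plus the component names of each block) rather than an actual TensorMap. The "_" key name used for the scalar and Cartesian cases is also an assumption.

```python
from typing import List, Sequence


def cartesian_component_names(rank: int) -> List[str]:
    """Component names described above for a Cartesian tensor of this rank."""
    if rank < 1:
        raise ValueError("a Cartesian tensor layout needs rank >= 1")
    if rank == 1:
        return ["xyz"]
    return [f"xyz_{i}" for i in range(1, rank + 1)]


def classify_layout(
    key_names: Sequence[str], blocks: Sequence[Sequence[str]]
) -> str:
    """Return "scalar", "cartesian" or "spherical" for a simplified layout.

    key_names: names of the keys of the layout.
    blocks: one list of component names per block.
    """
    keys = list(key_names)
    block_components = [list(names) for names in blocks]

    # Spherical: o3_lambda / o3_sigma keys, a single o3_mu component per block.
    if "o3_lambda" in keys and "o3_sigma" in keys:
        if block_components and all(c == ["o3_mu"] for c in block_components):
            return "spherical"
        raise ValueError("spherical blocks need a single 'o3_mu' component")

    # Scalar and Cartesian layouts both require exactly one block.
    if len(block_components) == 1:
        components = block_components[0]
        if not components:
            return "scalar"
        if components == cartesian_component_names(len(components)):
            return "cartesian"

    raise ValueError("layout matches none of the three supported types")


scalar_kind = classify_layout(["_"], [[]])
rank2_kind = classify_layout(["_"], [["xyz_1", "xyz_2"]])
spherical_kind = classify_layout(
    ["o3_lambda", "o3_sigma"], [["o3_mu"], ["o3_mu"]]
)
```

Anything that fits none of the three descriptions raises a ValueError in this sketch, which mirrors the statement that only these layouts are supported at the moment.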