ChemProp

ChemProp featurizer implementation.

class openadmet.models.features.chemprop.ChemPropFeaturizer(*, normalize_targets: bool = True, n_jobs: int = 4, batch_size: int = 128, shuffle: bool = False)

Bases: DeepLearningFeaturizer

Featurizer for molecules, built on the chemprop library.

Parameters:

normalize_targets (bool, optional) – Whether to normalize the targets using StandardScaler, by default True
n_jobs (int, optional) – Number of parallel workers to use, by default 4
batch_size (int, optional) – Batch size for the DataLoader, by default 128
shuffle (bool, optional) – Whether to shuffle the data in the DataLoader, by default False
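
As an illustration of what target normalization with a StandardScaler amounts to, the sketch below standardizes a list of targets to zero mean and unit variance using only the standard library. The helper names standardize and inverse_standardize are hypothetical and are not part of openadmet or chemprop; how ChemPropFeaturizer applies the scaler internally is not shown in this reference.

```python
from statistics import fmean, pstdev


def standardize(targets):
    """Scale targets to zero mean and unit variance.

    Hypothetical helper: uses the population standard deviation,
    which is the convention scikit-learn's StandardScaler follows.
    """
    mean = fmean(targets)
    std = pstdev(targets)
    # Constant targets have zero spread; fall back to a scale of 1.
    scale = std if std > 0 else 1.0
    scaled = [(t - mean) / scale for t in targets]
    return scaled, mean, scale


def inverse_standardize(scaled, mean, scale):
    """Map standardized values back to the original target units."""
    return [s * scale + mean for s in scaled]


targets = [1.0, 2.0, 3.0, 4.0]
scaled, mean, scale = standardize(targets)
restored = inverse_standardize(scaled, mean, scale)
```

Keeping the mean and scale alongside the scaled values is what allows predictions made in normalized units to be mapped back to the original units.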

batch_size: int

static dataset_to_dataloader(dataset: MoleculeDataset, batch_size: int = 128, shuffle: bool = False, sampler=None, **kwargs) → DataLoader

Convert a MoleculeDataset to a PyTorch DataLoader.

Parameters:

dataset (MoleculeDataset) – The dataset containing the molecules to load.
batch_size (int, optional) – Number of samples per batch to load (default is 128).
shuffle (bool, optional) – Whether to shuffle the data at every epoch (default is False).
sampler (torch.utils.data.Sampler, optional) – Custom sampler to use for loading data (default is None).
**kwargs – Additional keyword arguments passed to the DataLoader.

Returns:

A PyTorch DataLoader for the given MoleculeDataset.

Return type:

DataLoader
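
To make the effect of batch_size and shuffle concrete, here is a minimal pure-Python sketch of batching. It is not the DataLoader implementation; iter_batches, its seed argument, and the placeholder items are hypothetical. It assumes the final partial batch is kept, which is the PyTorch DataLoader default; whether dataset_to_dataloader overrides that is not documented here.

```python
import math
import random


def iter_batches(items, batch_size=128, shuffle=False, seed=None):
    """Yield consecutive batches of at most batch_size items.

    Hypothetical helper that mimics batching with the last,
    possibly smaller, batch kept.
    """
    order = list(range(len(items)))
    if shuffle:
        # Reorder the indices once per pass over the data.
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [items[i] for i in order[start:start + batch_size]]


# Placeholder items standing in for dataset entries.
smiles = [f"mol_{i}" for i in range(300)]
batches = list(iter_batches(smiles, batch_size=128))
batch_sizes = [len(b) for b in batches]
```

With 300 items and a batch size of 128, this produces ceil(300 / 128) = 3 batches, the last holding the remaining 44 items. Leaving shuffle at False keeps batches in input order.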

featurize(smiles: Iterable[str], y: Iterable[Any] = None) → tuple[DataLoader, np.ndarray, StandardScaler, MoleculeDataset | ReactionDataset | MulticomponentDataset]

Featurize a list of SMILES strings.

Parameters:

smiles (Iterable[str]) – List or iterable of SMILES strings to featurize.
y (Iterable[Any], optional) – Target values corresponding to the SMILES strings, by default None

Returns:

Tuple containing:

DataLoader – PyTorch DataLoader for the dataset.
np.ndarray – Array of indices corresponding to the original input.
StandardScaler – Scaler used for any scaling during featurization.
MoleculeDataset | ReactionDataset | MulticomponentDataset – PyTorch Dataset containing the features and targets.

Return type:

tuple

make_new() → ChemPropFeaturizer

Copy parameters to a new ChemPropFeaturizer instance.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

n_jobs: int

normalize_targets: bool

shuffle: bool