ChemProp

ChemProp featurizer implementation.

class openadmet.models.features.chemprop.ChemPropFeaturizer(*, normalize_targets: bool = True, n_jobs: int = 4, batch_size: int = 128, shuffle: bool = False)

Bases: DeepLearningFeaturizer

Featurizer for molecules, built on the chemprop library.

Parameters:
  • normalize_targets (bool, optional) – Whether to normalize the targets using StandardScaler, by default True

  • n_jobs (int, optional) – Number of parallel workers to use, by default 4

  • batch_size (int, optional) – Batch size for the DataLoader, by default 128

  • shuffle (bool, optional) – Whether to shuffle the data in the DataLoader, by default False
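When normalize_targets is True, targets are standardized with a StandardScaler. The sketch below illustrates that transformation for a single target column using only the Python standard library; it is not the featurizer's own code. It assumes per-column z-scoring with the population standard deviation and a scale of 1 for constant columns, which is how scikit-learn's StandardScaler behaves. The helper names are hypothetical.

```python
from statistics import fmean, pstdev


def fit_standard_scaler(values: list[float]) -> tuple[float, float]:
    """Return (mean, scale) for one target column, StandardScaler-style (hypothetical helper)."""
    mean = fmean(values)
    scale = pstdev(values)  # population std (ddof=0), as StandardScaler uses
    if scale == 0.0:
        scale = 1.0  # constant column: avoid division by zero
    return mean, scale


def transform(values: list[float], mean: float, scale: float) -> list[float]:
    # Z-score each value: subtract the mean, divide by the scale
    return [(v - mean) / scale for v in values]


def inverse_transform(values: list[float], mean: float, scale: float) -> list[float]:
    # Undo the z-scoring to get back to the original units
    return [v * scale + mean for v in values]


targets = [1.0, 2.0, 3.0, 4.0]
mean, scale = fit_standard_scaler(targets)
scaled = transform(targets, mean, scale)
restored = inverse_transform(scaled, mean, scale)
```

Predictions made in the scaled space are mapped back to the original units with the inverse transform, which is why the fitted scaler needs to be kept alongside the model.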

batch_size: int
static dataset_to_dataloader(dataset: MoleculeDataset, batch_size: int = 128, shuffle: bool = False, sampler=None, **kwargs) → DataLoader

Convert a MoleculeDataset to a PyTorch DataLoader.

Parameters:
  • dataset (MoleculeDataset) – The dataset containing the molecules to load.

  • batch_size (int, optional) – Number of samples per batch to load (default is 128).

  • shuffle (bool, optional) – Whether to shuffle the data at every epoch (default is False).

  • sampler (torch.utils.data.Sampler, optional) – Custom sampler to use for loading data (default is None).

  • **kwargs – Additional keyword arguments passed to the DataLoader.

Returns:

A PyTorch DataLoader for the given MoleculeDataset.

Return type:

DataLoader
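The batch_size and shuffle arguments control how the DataLoader groups samples. As a rough illustration only, the hypothetical iter_batches helper below reproduces that behaviour with the standard library: it permutes the index order when shuffle is True and keeps a final, smaller batch, matching DataLoader's default drop_last=False. It does not model samplers, worker processes, or collation.

```python
import random
from typing import Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(
    items: Sequence[T],
    batch_size: int = 128,
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> Iterator[list[T]]:
    """Yield successive batches of items (hypothetical stand-in for a DataLoader)."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    order = list(range(len(items)))
    if shuffle:
        # Permute the index order; a fixed seed makes the permutation reproducible
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        # The last batch may be smaller than batch_size (no dropping)
        yield [items[i] for i in order[start:start + batch_size]]


smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "O"]
batches = list(iter_batches(smiles, batch_size=2))
```

With shuffle left at False the batches follow input order, which keeps predictions aligned with the dataset; shuffling is typically only wanted for training.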

featurize(smiles: Iterable[str], y: Iterable[Any] = None) → tuple[DataLoader, np.ndarray, StandardScaler, MoleculeDataset | ReactionDataset | MulticomponentDataset]

Featurize a list of SMILES strings.

Parameters:
  • smiles (Iterable[str]) – List or iterable of SMILES strings to featurize.

  • y (Iterable[Any], optional) – Target values corresponding to the SMILES strings, by default None

Returns:

A tuple containing:

  • DataLoader – PyTorch DataLoader for the dataset.

  • np.ndarray – Array of indices corresponding to the original input.

  • StandardScaler – Scaler used for any scaling during featurization.

  • MoleculeDataset | ReactionDataset | MulticomponentDataset – PyTorch Dataset containing the features and targets.

Return type:

tuple
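The second element of the returned tuple is an index array tied to the original input. One common use for such an array is realigning per-molecule results with the input SMILES. The sketch below assumes, without confirmation from this reference, that indices[k] gives the input position of the k-th dataset entry and that some inputs may have no entry; scatter_to_input_order and the example values are hypothetical.

```python
from typing import Optional, Sequence


def scatter_to_input_order(
    predictions: Sequence[float],
    indices: Sequence[int],
    n_inputs: int,
) -> list[Optional[float]]:
    """Place each prediction at the input position given by `indices` (hypothetical helper).

    Assumes indices[k] is the position in the original SMILES input of the
    k-th dataset entry; input positions without an entry are left as None.
    """
    if len(predictions) != len(indices):
        raise ValueError("predictions and indices must have the same length")
    out: list[Optional[float]] = [None] * n_inputs
    for pred, idx in zip(predictions, indices):
        out[idx] = pred
    return out


smiles = ["CCO", "not-a-smiles", "c1ccccc1"]
indices = [0, 2]  # hypothetical: input 1 has no dataset entry
predictions = [0.5, 1.5]
aligned = scatter_to_input_order(predictions, indices, len(smiles))
```

Keeping None placeholders, rather than silently dropping positions, makes any input without a result easy to spot downstream.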

make_new() → ChemPropFeaturizer

Create a new ChemPropFeaturizer instance with the same parameters as this one.
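make_new returns a fresh featurizer carrying the same settings. The sketch below shows the idea with a hypothetical dataclass stand-in for the pydantic model, using dataclasses.replace to build a new instance from the current field values; the real class may implement this differently.

```python
from dataclasses import dataclass, replace


@dataclass
class FeaturizerParams:
    """Hypothetical stand-in holding the same fields as ChemPropFeaturizer."""

    normalize_targets: bool = True
    n_jobs: int = 4
    batch_size: int = 128
    shuffle: bool = False

    def make_new(self) -> "FeaturizerParams":
        # replace() with no overrides builds a new instance from the current field values
        return replace(self)


original = FeaturizerParams(batch_size=64, shuffle=True)
fresh = original.make_new()
```

Because the new instance is a separate object, changing a field on it leaves the original untouched.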

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

n_jobs: int
normalize_targets: bool
shuffle: bool