Feature Base Classes

Base classes and utilities for molecular featurizers.

class openadmet.models.features.feature_base.DeepLearningFeaturizer[source]

Bases: FeaturizerBase

Base class for deep learning featurizers.

This class extends FeaturizerBase and standardizes the output for deep learning workflows. Subclasses should implement the featurize method to return a DataLoader, indices, a StandardScaler, and a PyTorch Dataset.

abstract featurize(smiles: Iterable[str], y: Iterable[float] = None) → tuple[torch.utils.data.DataLoader, numpy.ndarray, sklearn.preprocessing._data.StandardScaler, torch.utils.data.Dataset][source]

Featurize a list of SMILES strings for deep learning models.

Parameters:
  • smiles (Iterable[str]) – List or iterable of SMILES strings to featurize.

  • y (Iterable[float], optional) – Target values corresponding to the SMILES strings.

Returns:

Tuple containing:
  • DataLoader – PyTorch DataLoader for the dataset.

  • np.ndarray – Array of indices corresponding to the original input.

  • StandardScaler – Scaler used for any scaling during featurization.

  • Dataset – PyTorch Dataset containing the features and targets.

Return type:

tuple

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

class openadmet.models.features.feature_base.FeaturizerBase[source]

Bases: BaseModel, ABC

Base class for featurizers, allowing for arbitrary featurization of molecules.

This class defines the interface for all featurizers. Subclasses should implement the featurize method to convert a list of SMILES strings into features suitable for machine learning models.

abstract featurize(smiles: Iterable[str], *args, **kwargs)[source]

Featurize a list of SMILES strings.

Parameters:
  • smiles (Iterable[str]) – List or iterable of SMILES strings to featurize.

  • *args – Additional positional arguments.

  • **kwargs – Additional keyword arguments.

Returns:

Features in a format appropriate for the model (e.g., NumPy arrays or dataloaders), plus optional processing information.

Return type:

Any
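The featurize contract described above can be illustrated with a minimal sketch. It assumes a plain abc.ABC stand-in instead of the real FeaturizerBase (which also derives from pydantic's BaseModel), so ToyFeaturizerBase and SymbolCountFeaturizer are hypothetical names. The "features" are naive character counts chosen only to keep the example dependency-free; they are not chemically meaningful (for instance, "Cl" would be counted as a carbon).

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable


class ToyFeaturizerBase(ABC):
    """Hypothetical stand-in for FeaturizerBase (no pydantic dependency)."""

    @abstractmethod
    def featurize(self, smiles: Iterable[str], *args, **kwargs) -> Any:
        """Convert SMILES strings into features."""


class SymbolCountFeaturizer(ToyFeaturizerBase):
    """Toy featurizer: counts a few characters in each SMILES string."""

    symbols = ("C", "N", "O")

    def featurize(self, smiles: Iterable[str], *args, **kwargs) -> list[list[int]]:
        # One feature row per input string, one column per symbol.
        return [[s.count(sym) for sym in self.symbols] for s in smiles]


# Ethanol and acetamide as example inputs.
features = SymbolCountFeaturizer().featurize(["CCO", "CC(=O)N"])
```

Because featurize is abstract, instantiating ToyFeaturizerBase directly raises TypeError; only subclasses that implement the method can be created.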

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

class openadmet.models.features.feature_base.MolfeatFeaturizer(*args)[source]

Bases: FeaturizerBase

Featurizer using molfeat.

This class provides a base for featurizers that use the molfeat library. It manages a MoleculeTransformer instance for feature extraction.

Variables:

_transformer (MoleculeTransformer) – The underlying molfeat transformer used for featurization.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context (Any) – The context.

property transformer

Return the underlying molfeat transformer, for use in scikit-learn pipelines and similar workflows.

openadmet.models.features.feature_base.get_featurizer_class(feat_type)[source]

Retrieve a featurizer class from the registry by type.
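The registry itself is not documented in this section, so the following is only a sketch of how a type-keyed class registry is commonly built. Every name in it (_FEATURIZERS, register_featurizer, lookup_featurizer_class, ToyFeaturizer) is hypothetical, and raising ValueError for an unknown type is an assumption rather than the library's documented behavior.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a type string to a featurizer class.
_FEATURIZERS: Dict[str, type] = {}


def register_featurizer(feat_type: str) -> Callable[[type], type]:
    """Class decorator that stores the class under the given type string."""

    def decorator(cls: type) -> type:
        _FEATURIZERS[feat_type] = cls
        return cls

    return decorator


def lookup_featurizer_class(feat_type: str) -> type:
    """Return the registered class, or raise ValueError if unknown (assumed behavior)."""
    try:
        return _FEATURIZERS[feat_type]
    except KeyError:
        raise ValueError(f"Unknown featurizer type: {feat_type!r}") from None


@register_featurizer("toy")
class ToyFeaturizer:
    """Placeholder class used only to demonstrate registration."""


featurizer_cls = lookup_featurizer_class("toy")
```

Looking up a class by string in this way lets configuration files name a featurizer without importing it directly.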