Anvil Workflow Specifications

Specification models for Anvil workflows.

class openadmet.models.anvil.specification.AnvilSection(*, type: str | None = None, params: dict = {})[source]

Bases: SpecBase

Anvil specification section base class.

Variables:

type (Optional[str]) – The type of the section.
params (dict) – The parameters for the section.
section_name (ClassVar[str]) – The name of the section.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

params: dict

section_name: ClassVar[str] = 'INVALID'

to_class()[source]

Convert the specification to the corresponding class instance.

Returns: instance – An instance of the class corresponding to the section type.
Return type: object

type: str | None
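
As a rough illustration of how a section's type and params could be turned into an object by to_class(), the sketch below resolves the type through a lookup table and passes params as keyword arguments. The REGISTRY table, the register helper, the ExampleFeaturizer class, and the SectionSketch dataclass are all assumptions made for this example; they are not openadmet's actual lookup mechanism, and SectionSketch is a plain dataclass rather than a pydantic model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical registry mapping a section "type" string to a class.
REGISTRY: dict[str, Callable[..., Any]] = {}


def register(name: str):
    """Decorator that records a class under the given type name."""
    def wrap(cls):
        REGISTRY[name] = cls
        return cls
    return wrap


@register("ExampleFeaturizer")
class ExampleFeaturizer:
    """Stand-in for a class that a section might resolve to."""

    def __init__(self, radius: int = 2):
        self.radius = radius


@dataclass
class SectionSketch:
    """Simplified stand-in for AnvilSection (not a pydantic model)."""

    type: Optional[str] = None
    params: dict = field(default_factory=dict)

    def to_class(self) -> object:
        # Look up the class by type, then build it from params.
        if self.type not in REGISTRY:
            raise ValueError(f"unknown section type: {self.type!r}")
        return REGISTRY[self.type](**self.params)


section = SectionSketch(type="ExampleFeaturizer", params={"radius": 3})
instance = section.to_class()
```

Here instance is an ExampleFeaturizer built with radius=3; a section whose type is missing from the table raises ValueError instead of returning an object.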

class openadmet.models.anvil.specification.AnvilSpecification(*, metadata: Metadata, data: DataSpec, procedure: ProcedureSpec, report: ReportSpec)[source]

Bases: BaseModel

Full specification for Anvil workflow.

data: DataSpec

metadata: Metadata

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

procedure: ProcedureSpec

report: ReportSpec

run(output_dir: PathLike = 'anvil_training', debug: bool = False, tag: str = None)[source]: Run the Anvil workflow from this specification.

to_multi_yaml(metadata_yaml='metadata.yaml', procedure_yaml='procedure.yaml', data_yaml='data.yaml', report_yaml='eval.yaml', **storage_options)[source]

Write specification to multiple YAML files.

Parameters:

metadata_yaml (str or PathLike, optional) – The file path for the metadata YAML file. Default is ‘metadata.yaml’.
procedure_yaml (str or PathLike, optional) – The file path for the procedure YAML file. Default is ‘procedure.yaml’.
data_yaml (str or PathLike, optional) – The file path for the data YAML file. Default is ‘data.yaml’.
report_yaml (str or PathLike, optional) – The file path for the report YAML file. Default is ‘eval.yaml’.
storage_options (dict, optional) – Additional options to pass to the file system (e.g., for S3, GCS).
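
The following sketch shows one way the four file arguments could map onto the four parts of a specification. It is an assumption that each file holds exactly one top-level part (metadata, procedure, data, report); the split_for_multi_yaml helper and the sample values are hypothetical, and no YAML is actually written here.

```python
# Hypothetical in-memory form of a specification with its four top-level parts.
spec = {
    "metadata": {"name": "example", "version": "v1"},
    "data": {"type": "csv", "resource": "data.csv"},
    "procedure": {"model": {"type": "ExampleModel", "params": {}}},
    "report": {"eval": []},
}


def split_for_multi_yaml(
    spec: dict,
    metadata_yaml: str = "metadata.yaml",
    procedure_yaml: str = "procedure.yaml",
    data_yaml: str = "data.yaml",
    report_yaml: str = "eval.yaml",
) -> dict:
    """Map each output file path to the part of the spec it would hold."""
    return {
        metadata_yaml: spec["metadata"],
        procedure_yaml: spec["procedure"],
        data_yaml: spec["data"],
        report_yaml: spec["report"],
    }


# Override one default path and keep the other three.
files = split_for_multi_yaml(spec, report_yaml="report.yaml")
```

The defaults mirror the documented signature, so a caller only needs to name the files that should differ.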

to_recipe(path, **storage_options)[source]: Write specification to YAML recipe file.

to_workflow()[source]: Convert the specification to a workflow object.

class openadmet.models.anvil.specification.DataSpec

Bases: BaseModel

Data specification for the workflow.

Variables:

type (str) – The type of data source (e.g., ‘csv’, ‘yaml’).
resource (str) – The path or URL to the data resource.
cat_entry (Optional[str]) – The catalog entry name if the resource is a YAML catalog.
target_cols (Union[str, list[str]]) – The target column(s) in the dataset.
input_col (str) – The input column in the dataset.
anvil_dir (Optional[str]) – The base directory for relative paths.
dropna (Optional[bool]) – Whether to drop rows with NaN values.
train_resource (Optional[str]) – The path or URL to the training data resource (if using separate train/test).
test_resource (Optional[str]) – The path or URL to the testing data resource (if using separate train/test).
val_resource (Optional[str]) – The path or URL to the validation data resource (if using separate train/test).
_catalog (Optional[intake.catalog.Catalog]) – The intake catalog object if the resource is a YAML file.
_using_train_test (bool) – Whether using separate train and test resources.

anvil_dir: str | None

cat_entry: str | None

property catalog: Get the intake catalog if the resource is a YAML file.

check_resource_test_train()[source]: Ensure that either resource or train/test/val resources are provided, not both.
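
The either/or rule can be sketched as a small standalone function. This is not the library's validator: treating "neither given" as an error and accepting any subset of train/test/val are assumptions, and the file names are placeholders.

```python
from typing import Optional


def check_resource_test_train(
    resource: Optional[str] = None,
    train_resource: Optional[str] = None,
    test_resource: Optional[str] = None,
    val_resource: Optional[str] = None,
) -> bool:
    """Return True when separate split resources are used, False otherwise.

    Raises ValueError if both styles are mixed or neither is given.
    """
    split_given = any(
        r is not None for r in (train_resource, test_resource, val_resource)
    )
    if resource is not None and split_given:
        raise ValueError("provide either resource or train/test/val resources, not both")
    if resource is None and not split_given:
        raise ValueError("provide resource or train/test/val resources")
    return split_given


using_single = check_resource_test_train(resource="data.csv")
using_split = check_resource_test_train(
    train_resource="train.csv", test_resource="test.csv"
)

try:
    check_resource_test_train(resource="data.csv", train_resource="train.csv")
    mixed_rejected = False
except ValueError:
    mixed_rejected = True
```

The boolean returned here plays the role that the using_train_test property plays on the real class.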

dropna: bool | None

input_col: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self – The BaseModel instance.
context – The context.

read() → tuple[pandas.Series, pandas.Series][source]

Read the data from the resource.

Returns:

input (pd.Series) – The input data (e.g., SMILES strings)
targets (pd.Series) – The target data (e.g., properties to predict)

resource: str | None

target_cols: str | list[str]

template_anvil_dir(anvil_dir: Path)[source]: Template all resources with ANVIL_DIR if present.

template_resource()[source]

Template the resource with ANVIL_DIR if present.

Returns: self – The DataSpec instance with the templated resource.
Return type: DataSpec
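
A minimal sketch of what templating a resource with ANVIL_DIR might look like is given below. The placeholder token "{{ ANVIL_DIR }}" is an assumption (the marker the library actually recognizes is not stated here), and the template function and paths are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed placeholder token; the real marker used by the library may differ.
PLACEHOLDER = "{{ ANVIL_DIR }}"


def template(resource: Optional[str], anvil_dir: Path) -> Optional[str]:
    """Substitute the placeholder with anvil_dir; leave other values untouched."""
    if resource is None or PLACEHOLDER not in resource:
        return resource
    return resource.replace(PLACEHOLDER, anvil_dir.as_posix())


templated = template("{{ ANVIL_DIR }}/data/train.csv", Path("/work/run1"))
unchanged = template("s3://bucket/train.csv", Path("/work/run1"))
```

Resources without the placeholder, such as remote URLs, pass through unchanged, which matches the "if present" wording in the description.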

test_resource: str | None

to_yaml(path, **storage_options)[source]

Write specification to YAML file.

Parameters:

path (str or PathLike) – The file path to write the YAML content to.
storage_options (dict, optional) – Additional options to pass to the file system (e.g., for S3, GCS).

train_resource: str | None

type: str

property using_train_test: Whether using separate train and test resources.

val_resource: str | None

class openadmet.models.anvil.specification.EnsembleSpec(*, type: str | None = None, params: dict = {}, n_models: int, calibration_method: str | None = None, use_bagging: bool = False, param_paths: list[str] | None = None, serial_paths: list[str] | None = None)[source]

Bases: AnvilSection

Ensemble specification.

Variables:

section_name (ClassVar[str]) – The name of the section.
n_models (int) – The number of models in the ensemble.
calibration_method (Optional[str]) – The calibration method to use.
use_bagging (bool) – Whether to use bagging for the ensemble models.
param_paths (Optional[list[str]]) – The list of parameter file paths for the ensemble models.
serial_paths (Optional[list[str]]) – The list of serialization file paths for the ensemble models.

calibration_method: str | None

check_paths()[source]: Ensure both param_paths and serial_paths are provided together.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

n_models: int

param_paths: list[str] | None

section_name: ClassVar[str] = 'ensemble'

serial_paths: list[str] | None

template_anvil_dir(anvil_dir: Path)[source]: Template param_paths and serial_paths with ANVIL_DIR.

use_bagging: bool

class openadmet.models.anvil.specification.EvalSpec(*, type: str | None = None, params: dict = {})[source]

Bases: AnvilSection

Evaluation specification.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

section_name: ClassVar[str] = 'eval'

class openadmet.models.anvil.specification.FeatureSpec(*, type: str | None = None, params: dict = {})[source]

Bases: AnvilSection

Featurization specification.

Variables:

section_name (ClassVar[str]) – The name of the section.
type (Optional[str]) – The type of featurizer to use.
params (dict) – The parameters for the featurizer.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

section_name: ClassVar[str] = 'feat'

class openadmet.models.anvil.specification.Metadata(*, version: Literal['v1'], driver: str = 'sklearn', name: str, build_number: int, description: str, tag: str, authors: str, email: EmailStr, biotargets: list[str], tags: list[str])[source]

Bases: SpecBase

Metadata specification.

Variables:

version (Literal["v1"]) – The version of the metadata schema.
driver (str) – The driver for the workflow.
name (str) – The name of the workflow.
build_number (int) – The build number of the workflow (must be non-negative).
description (str) – Description of the workflow.
tag (str) – Primary tag for the workflow.
authors (str) – Name of the authors.
email (EmailStr) – Email address of the contact person.
biotargets (list[str]) – List of biotargets associated with the workflow.
tags (list[str]) – Additional tags for the workflow.
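
The constraints listed above can be illustrated with a plain-Python check. This is a stand-in for the pydantic validation, not a copy of it: the check_metadata helper is hypothetical, the email test is a deliberately loose pattern, and every field value in the sample is a placeholder.

```python
import re

REQUIRED = (
    "version", "name", "build_number", "description", "tag",
    "authors", "email", "biotargets", "tags",
)


def check_metadata(meta: dict) -> dict:
    """Apply the documented constraints to a metadata mapping."""
    missing = [key for key in REQUIRED if key not in meta]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if meta["version"] != "v1":
        raise ValueError("version must be 'v1'")
    if not isinstance(meta["build_number"], int) or meta["build_number"] < 0:
        raise ValueError("build_number must be a non-negative integer")
    # Loose shape check only; EmailStr performs fuller validation.
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", meta["email"]):
        raise ValueError("email is not a valid address")
    # driver is the only field with a default ('sklearn') in the signature.
    return {"driver": "sklearn", **meta}


meta = check_metadata(
    {
        "version": "v1",
        "name": "example-model",
        "build_number": 0,
        "description": "Placeholder description.",
        "tag": "example",
        "authors": "A. Author",
        "email": "author@example.org",
        "biotargets": ["EXAMPLE_TARGET"],
        "tags": ["demo"],
    }
)
```

Because driver is omitted from the input, the returned mapping carries the default value 'sklearn'.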

authors: str

biotargets: list[str]

build_number: int

description: str

driver: str

email: EmailStr

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str

tag: str

tags: list[str]

version: Literal['v1']

class openadmet.models.anvil.specification.ModelSpec(*, type: str | None = None, params: dict = {}, param_path: str | None = None, serial_path: str | None = None, freeze_weights: dict | None = None)[source]

Bases: AnvilSection

Model specification.

Variables:

section_name (ClassVar[str]) – The name of the section.
param_path (Optional[str]) – The path to the model parameters file.
serial_path (Optional[str]) – The path to the model serialization file.
freeze_weights (Optional[dict]) – A dictionary specifying which layers to freeze during training.

check_freeze_weights()[source]

Ensure freeze_weights is supplied only for applicable model types.

Returns: self – The validated ModelSpec instance.
Return type: ModelSpec

check_paths()[source]

Ensure both param_path and serial_path are provided together.

Returns: self – The validated ModelSpec instance.
Return type: ModelSpec
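
The "provided together" rule amounts to a both-or-neither check, sketched below as a standalone function. The function and the file names are hypothetical and only illustrate the logic; the real method validates the ModelSpec instance and returns it.

```python
from typing import Optional


def check_paths(param_path: Optional[str], serial_path: Optional[str]) -> bool:
    """Return True if a saved model is referenced, False if neither path is set.

    Raises ValueError when only one of the two paths is given.
    """
    if (param_path is None) != (serial_path is None):
        raise ValueError("param_path and serial_path must be provided together")
    return param_path is not None


has_saved_model = check_paths("model.json", "model.pkl")
fresh_model = check_paths(None, None)

try:
    check_paths("model.json", None)
    partial_rejected = False
except ValueError:
    partial_rejected = True
```

Comparing the two "is None" results with != catches both one-sided cases in a single test.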

freeze_weights: dict | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

param_path: str | None

section_name: ClassVar[str] = 'model'

serial_path: str | None

template_anvil_dir(anvil_dir: Path)[source]: Template param_path and serial_path with ANVIL_DIR.

class openadmet.models.anvil.specification.ProcedureSpec(*, split: SplitSpec, feat: FeatureSpec, model: ModelSpec, ensemble: EnsembleSpec | None = None, train: TrainerSpec, transform: TransformSpec | None = None)[source]

Bases: SpecBase

Procedure specification.

ensemble: EnsembleSpec | None

feat: FeatureSpec

model: ModelSpec

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

section_name: ClassVar[str] = 'procedure'

split: SplitSpec

template_anvil_dir(anvil_dir: Path)[source]: Template ANVIL_DIR in model and ensemble path fields.

train: TrainerSpec

transform: TransformSpec | None
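
To make the required and optional sections concrete, the sketch below checks a procedure mapping against the signature: split, feat, model, and train are required, while ensemble and transform default to None. The check_procedure helper, the type names, and the parameter values are all hypothetical placeholders.

```python
REQUIRED_SECTIONS = ("split", "feat", "model", "train")
OPTIONAL_SECTIONS = ("ensemble", "transform")


def check_procedure(procedure: dict) -> dict:
    """Reject a mapping with missing required sections; default optional ones to None."""
    missing = [name for name in REQUIRED_SECTIONS if name not in procedure]
    if missing:
        raise ValueError(f"missing required sections: {missing}")
    filled = {name: None for name in OPTIONAL_SECTIONS}
    filled.update(procedure)
    return filled


# Each section follows the AnvilSection shape: a type plus a params dict.
procedure = {
    "split": {"type": "ExampleSplitter", "params": {"test_size": 0.2}},
    "feat": {"type": "ExampleFeaturizer", "params": {}},
    "model": {"type": "ExampleModel", "params": {}},
    "train": {"type": "ExampleTrainer", "params": {}},
}
checked = check_procedure(procedure)
```

After the check, checked contains all six keys, with ensemble and transform set to None.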

class openadmet.models.anvil.specification.ReportSpec(*, eval: list[openadmet.models.anvil.specification.EvalSpec])[source]

Bases: SpecBase

Report specification.

eval: list[openadmet.models.anvil.specification.EvalSpec]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

section_name: ClassVar[str] = 'report'

class openadmet.models.anvil.specification.SpecBase[source]

Bases: BaseModel

Base class for specifications.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

to_yaml(path, **storage_options)[source]

Write specification to YAML file.

Parameters:

path (str or PathLike) – The file path to write the YAML content to.
storage_options (dict, optional) – Additional options to pass to the file system (e.g., for S3, GCS).

class openadmet.models.anvil.specification.SplitSpec(*, type: str | None = None, params: dict = {})[source]

Bases: AnvilSection

Data split specification.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

section_name: ClassVar[str] = 'split'

class openadmet.models.anvil.specification.TrainerSpec(*, type: str | None = None, params: dict = {})[source]

Bases: AnvilSection

Trainer specification.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

section_name: ClassVar[str] = 'train'

class openadmet.models.anvil.specification.TransformSpec(*, type: str | None = None, params: dict = {})[source]

Bases: AnvilSection

Transform specification.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

section_name: ClassVar[str] = 'transform'