Anvil Workflow Specifications

Specification models for Anvil workflows.

class openadmet.models.anvil.specification.AnvilSection(*, type: str | None = None, params: dict = {})[source]

Bases: SpecBase

Anvil specification section base class.

Variables:
  • type (Optional[str]) – The type of the section.

  • params (dict) – The parameters for the section.

  • section_name (ClassVar[str]) – The name of the section.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

params: dict
section_name: ClassVar[str] = 'INVALID'
to_class()[source]

Convert the specification to the corresponding class instance.

Returns:

instance – An instance of the class corresponding to the section type.

Return type:

object
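
The conversion from a section's type string to a concrete object can be pictured as a registry lookup. The sketch below is a stdlib-only illustration and not the library's implementation: REGISTRY, StandardScalerStub, and SectionSketch are hypothetical names, and it assumes params is passed to the constructor as keyword arguments.

```python
from dataclasses import dataclass, field
from typing import Optional


class StandardScalerStub:
    """Hypothetical component that a section type could resolve to."""

    def __init__(self, with_mean: bool = True):
        self.with_mean = with_mean


# Hypothetical registry mapping type strings to classes.
REGISTRY = {"StandardScalerStub": StandardScalerStub}


@dataclass
class SectionSketch:
    """Stand-in for a section that carries `type` and `params`."""

    type: Optional[str] = None
    params: dict = field(default_factory=dict)

    def to_class(self):
        # Look up the class by its type string, then build it from params.
        if self.type not in REGISTRY:
            raise ValueError(f"Unknown section type: {self.type!r}")
        return REGISTRY[self.type](**self.params)


instance = SectionSketch(
    type="StandardScalerStub", params={"with_mean": False}
).to_class()
```

Keeping type and params as plain data until to_class() is called is what lets a section be written to and read from YAML without importing the component it describes.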

type: str | None
class openadmet.models.anvil.specification.AnvilSpecification(*, metadata: Metadata, data: DataSpec, procedure: ProcedureSpec, report: ReportSpec)[source]

Bases: BaseModel

Full specification for Anvil workflow.

data: DataSpec
metadata: Metadata
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

procedure: ProcedureSpec
report: ReportSpec
run(output_dir: PathLike = 'anvil_training', debug: bool = False, tag: str = None)[source]

Run the Anvil workflow from this specification.

to_multi_yaml(metadata_yaml='metadata.yaml', procedure_yaml='procedure.yaml', data_yaml='data.yaml', report_yaml='eval.yaml', **storage_options)[source]

Write specification to multiple YAML files.

Parameters:
  • metadata_yaml (str or PathLike, optional) – The file path for the metadata YAML file. Default is ‘metadata.yaml’.

  • procedure_yaml (str or PathLike, optional) – The file path for the procedure YAML file. Default is ‘procedure.yaml’.

  • data_yaml (str or PathLike, optional) – The file path for the data YAML file. Default is ‘data.yaml’.

  • report_yaml (str or PathLike, optional) – The file path for the report YAML file. Default is ‘eval.yaml’.

  • storage_options (dict, optional) – Additional options to pass to the file system (e.g., for S3, GCS).

to_recipe(path, **storage_options)[source]

Write specification to YAML recipe file.
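
The four fields of AnvilSpecification suggest a recipe with four top-level sections. The dictionary below sketches that shape in Python. Only the key names come from the class signatures on this page; every value (names, file paths, type strings such as ExampleSplitter) is a made-up placeholder, and it is an assumption that the serialized YAML mirrors this nesting exactly.

```python
# Hypothetical recipe contents; only the key names come from this page.
recipe = {
    "metadata": {
        "version": "v1",
        "driver": "sklearn",
        "name": "example-model",
        "build_number": 0,
        "description": "Placeholder description.",
        "tag": "example",
        "authors": "Jane Doe",
        "email": "jane.doe@example.com",
        "biotargets": ["EXAMPLE_TARGET"],
        "tags": ["demo"],
    },
    "data": {
        "type": "csv",
        "resource": "data.csv",
        "target_cols": ["target"],
        "input_col": "smiles",
    },
    "procedure": {
        "split": {"type": "ExampleSplitter", "params": {}},
        "feat": {"type": "ExampleFeaturizer", "params": {}},
        "model": {"type": "ExampleModel", "params": {}},
        "train": {"type": "ExampleTrainer", "params": {}},
    },
    "report": {"eval": [{"type": "ExampleEvaluator", "params": {}}]},
}

# Default file names from the to_multi_yaml signature, one per section.
default_files = {
    "metadata": "metadata.yaml",
    "procedure": "procedure.yaml",
    "data": "data.yaml",
    "report": "eval.yaml",
}
```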

to_workflow()[source]

Convert the specification to a workflow object.

class openadmet.models.anvil.specification.DataSpec(*, type: str, resource: str | None = None, cat_entry: str | None = None, target_cols: str | list[str], input_col: str, anvil_dir: str | None = None, dropna: bool | None = False, train_resource: str | None = None, test_resource: str | None = None, val_resource: str | None = None)[source]

Bases: BaseModel

Data specification for the workflow.

Variables:
  • type (str) – The type of data source (e.g., ‘csv’, ‘yaml’).

  • resource (Optional[str]) – The path or URL to the data resource.

  • cat_entry (Optional[str]) – The catalog entry name if the resource is a YAML catalog.

  • target_cols (Union[str, list[str]]) – The target column(s) in the dataset.

  • input_col (str) – The input column in the dataset.

  • anvil_dir (Optional[str]) – The base directory for relative paths.

  • dropna (Optional[bool]) – Whether to drop rows with NaN values.

  • train_resource (Optional[str]) – The path or URL to the training data resource (if using separate train/test).

  • test_resource (Optional[str]) – The path or URL to the testing data resource (if using separate train/test).

  • val_resource (Optional[str]) – The path or URL to the validation data resource (if using separate train/test).

  • _catalog (Optional[intake.catalog.Catalog]) – The intake catalog object if the resource is a YAML file.

  • _using_train_test (bool) – Whether using separate train and test resources.

anvil_dir: str | None
cat_entry: str | None
property catalog

Get the intake catalog if the resource is a YAML file.

check_resource_test_train()[source]

Ensure that either resource or train/test/val resources are provided, not both.
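
The rule can be sketched as a plain function. This is a stdlib stand-in for illustration, not the library's validator: check_resource_or_splits is a hypothetical name, and because the page does not say which of the train/test/val resources are mandatory, the sketch treats any one of them as switching to pre-split mode.

```python
from typing import Optional


def check_resource_or_splits(
    resource: Optional[str] = None,
    train_resource: Optional[str] = None,
    test_resource: Optional[str] = None,
    val_resource: Optional[str] = None,
) -> bool:
    """Return True when pre-split resources are used, False for a single resource."""
    has_splits = any(
        r is not None for r in (train_resource, test_resource, val_resource)
    )
    if resource is not None and has_splits:
        raise ValueError(
            "Provide either `resource` or train/test/val resources, not both."
        )
    if resource is None and not has_splits:
        raise ValueError("Provide `resource` or train/test/val resources.")
    return has_splits
```

The returned flag plays the role that the using_train_test property plays on DataSpec.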

dropna: bool | None
input_col: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

read() → tuple[pandas.Series, pandas.Series][source]

Read the data from the resource.

Returns:

  • input (pd.Series) – The input data (e.g., SMILES strings)

  • targets (pd.Series) – The target data (e.g., properties to predict)

resource: str | None
target_cols: str | list[str]
template_anvil_dir(anvil_dir: Path)[source]

Template all resources with ANVIL_DIR if present.

template_resource()[source]

Template the resource with ANVIL_DIR if present.

Returns:

self – The DataSpec instance with the templated resource.

Return type:

DataSpec
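
Templating here amounts to substituting a directory for a placeholder inside a path string. The sketch below shows the idea with a plain string replacement. The placeholder spelling `{{ ANVIL_DIR }}` and the helper name template_path are assumptions made for illustration; the library may use a different token or a template engine.

```python
from pathlib import Path
from typing import Optional

# Assumed placeholder spelling; the library may use a different token.
PLACEHOLDER = "{{ ANVIL_DIR }}"


def template_path(value: Optional[str], anvil_dir: Path) -> Optional[str]:
    """Substitute the placeholder; leave None and placeholder-free paths untouched."""
    if value is None:
        return None
    return value.replace(PLACEHOLDER, anvil_dir.as_posix())


templated = template_path("{{ ANVIL_DIR }}/data/train.csv", Path("/work/recipes"))
```

Paths without the placeholder, such as remote URLs, pass through unchanged, which matches the "if present" wording above.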

test_resource: str | None
to_yaml(path, **storage_options)[source]

Write specification to YAML file.

Parameters:
  • path (str or PathLike) – The file path to write the YAML content to.

  • storage_options (dict, optional) – Additional options to pass to the file system (e.g., for S3, GCS).

train_resource: str | None
type: str
property using_train_test

Whether using separate train and test resources.

val_resource: str | None
class openadmet.models.anvil.specification.EnsembleSpec(*, type: str | None = None, params: dict = {}, n_models: int, calibration_method: str | None = None, use_bagging: bool = False, param_paths: list[str] | None = None, serial_paths: list[str] | None = None)[source]

Bases: AnvilSection

Ensemble specification.

Variables:
  • section_name (ClassVar[str]) – The name of the section.

  • n_models (int) – The number of models in the ensemble.

  • calibration_method (Optional[str]) – The calibration method to use.

  • param_paths (Optional[list[str]]) – The list of parameter file paths for the ensemble models.

  • serial_paths (Optional[list[str]]) – The list of serialization file paths for the ensemble models.

  • use_bagging (bool) – Whether to use bagging for the ensemble. Default is False.

calibration_method: str | None
check_paths()[source]

Ensure both param_paths and serial_paths are provided together.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

n_models: int
param_paths: list[str] | None
section_name: ClassVar[str] = 'ensemble'
serial_paths: list[str] | None
template_anvil_dir(anvil_dir: Path)[source]

Template param_paths and serial_paths with ANVIL_DIR.

use_bagging: bool
class openadmet.models.anvil.specification.EvalSpec(*, type: str | None = None, params: dict = {})[source]

Bases: AnvilSection

Evaluation specification.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

section_name: ClassVar[str] = 'eval'
class openadmet.models.anvil.specification.FeatureSpec(*, type: str | None = None, params: dict = {})[source]

Bases: AnvilSection

Featurization specification.

Variables:
  • section_name (ClassVar[str]) – The name of the section.

  • type (Optional[str]) – The type of featurizer to use.

  • params (dict) – The parameters for the featurizer.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

section_name: ClassVar[str] = 'feat'
class openadmet.models.anvil.specification.Metadata(*, version: Literal['v1'], driver: str = 'sklearn', name: str, build_number: int, description: str, tag: str, authors: str, email: EmailStr, biotargets: list[str], tags: list[str])[source]

Bases: SpecBase

Metadata specification.

Variables:
  • version (Literal["v1"]) – The version of the metadata schema.

  • driver (str) – The driver for the workflow.

  • name (str) – The name of the workflow.

  • build_number (int) – The build number of the workflow (must be non-negative).

  • description (str) – Description of the workflow.

  • tag (str) – Primary tag for the workflow.

  • authors (str) – Names of the authors.

  • email (EmailStr) – Email address of the contact person.

  • biotargets (list[str]) – List of biotargets associated with the workflow.

  • tags (list[str]) – Additional tags for the workflow.
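
Two of the constraints in the list above are easy to state as code: version accepts only 'v1', and build_number must be non-negative. The function below is a hypothetical stdlib sketch of those two checks only. It does not reproduce the library's validators and skips everything else, such as email validation.

```python
def check_metadata_basics(version: str, build_number: int) -> None:
    """Check the two constraints stated in the field list above."""
    if version != "v1":
        raise ValueError(f"Unsupported metadata version: {version!r}")
    if build_number < 0:
        raise ValueError("build_number must be non-negative")


check_metadata_basics("v1", 0)  # valid values pass silently
```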

authors: str
biotargets: list[str]
build_number: int
description: str
driver: str
email: EmailStr
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str
tag: str
tags: list[str]
version: Literal['v1']
class openadmet.models.anvil.specification.ModelSpec(*, type: str | None = None, params: dict = {}, param_path: str | None = None, serial_path: str | None = None, freeze_weights: dict | None = None)[source]

Bases: AnvilSection

Model specification.

Variables:
  • section_name (ClassVar[str]) – The name of the section.

  • param_path (Optional[str]) – The path to the model parameters file.

  • serial_path (Optional[str]) – The path to the model serialization file.

  • freeze_weights (Optional[dict]) – A dictionary specifying which layers to freeze during training.

check_freeze_weights()[source]

Ensure freeze_weights is supplied only for applicable model types.

Returns:

self – The validated ModelSpec instance.

Return type:

ModelSpec

check_paths()[source]

Ensure both param_path and serial_path are provided together.

Returns:

self – The validated ModelSpec instance.

Return type:

ModelSpec
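
The "provided together" rule means the two paths are either both set or both left as None. A minimal stdlib sketch of that check follows. check_paired_paths is a hypothetical helper, not the library's validator, and the file names in the usage line are placeholders.

```python
from typing import Optional


def check_paired_paths(param_path: Optional[str], serial_path: Optional[str]) -> None:
    """Raise if exactly one of the two paths is set."""
    if (param_path is None) != (serial_path is None):
        raise ValueError("param_path and serial_path must be provided together.")


check_paired_paths(None, None)  # neither set: fine
check_paired_paths("model.json", "model.pkl")  # both set: fine
```

The same both-or-neither pattern applies to param_paths and serial_paths on EnsembleSpec.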

freeze_weights: dict | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

param_path: str | None
section_name: ClassVar[str] = 'model'
serial_path: str | None
template_anvil_dir(anvil_dir: Path)[source]

Template param_path and serial_path with ANVIL_DIR.

class openadmet.models.anvil.specification.ProcedureSpec(*, split: SplitSpec, feat: FeatureSpec, model: ModelSpec, ensemble: EnsembleSpec | None = None, train: TrainerSpec, transform: TransformSpec | None = None)[source]

Bases: SpecBase

Procedure specification.

ensemble: EnsembleSpec | None
feat: FeatureSpec
model: ModelSpec
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

section_name: ClassVar[str] = 'procedure'
split: SplitSpec
template_anvil_dir(anvil_dir: Path)[source]

Template ANVIL_DIR in model and ensemble path fields.

train: TrainerSpec
transform: TransformSpec | None
class openadmet.models.anvil.specification.ReportSpec(*, eval: list[openadmet.models.anvil.specification.EvalSpec])[source]

Bases: SpecBase

Report specification.

eval: list[openadmet.models.anvil.specification.EvalSpec]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

section_name: ClassVar[str] = 'report'
class openadmet.models.anvil.specification.SpecBase[source]

Bases: BaseModel

Base class for specifications.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

to_yaml(path, **storage_options)[source]

Write specification to YAML file.

Parameters:
  • path (str or PathLike) – The file path to write the YAML content to.

  • storage_options (dict, optional) – Additional options to pass to the file system (e.g., for S3, GCS).

class openadmet.models.anvil.specification.SplitSpec(*, type: str | None = None, params: dict = {})[source]

Bases: AnvilSection

Data split specification.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

section_name: ClassVar[str] = 'split'
class openadmet.models.anvil.specification.TrainerSpec(*, type: str | None = None, params: dict = {})[source]

Bases: AnvilSection

Trainer specification.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

section_name: ClassVar[str] = 'train'
class openadmet.models.anvil.specification.TransformSpec(*, type: str | None = None, params: dict = {})[source]

Bases: AnvilSection

Transform specification.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

section_name: ClassVar[str] = 'transform'