Anvil Workflow Specifications
Specification models for Anvil workflows.
- class openadmet.models.anvil.specification.AnvilSection(*, type: str | None = None, params: dict = {})
Bases: SpecBase
Anvil specification section base class.
- Variables:
type (Optional[str]) – The type of the section.
params (dict) – The parameters for the section.
section_name (ClassVar[str]) – The name of the section.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- to_class()
Convert the specification to the corresponding class instance.
- Returns:
instance – An instance of the class corresponding to the section type.
- Return type:
object
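to_class maps a section onto a concrete object chosen by its type string. A minimal sketch of that kind of type-name dispatch, assuming a registry keyed by the type string; REGISTRY, register, DummyFeaturizer, and Section are hypothetical names for illustration, not openadmet API, and the real lookup may work differently.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

# Hypothetical registry mapping a section's `type` string to a class.
REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records the class under `name` (illustrative only)."""
    def decorator(cls: type) -> type:
        REGISTRY[name] = cls
        return cls
    return decorator


@register("DummyFeaturizer")
class DummyFeaturizer:
    """Hypothetical featurizer used only to demonstrate the dispatch."""
    def __init__(self, radius: int = 2) -> None:
        self.radius = radius


@dataclass
class Section:
    """Stand-in for an AnvilSection: a type name plus keyword parameters."""
    type: Optional[str] = None
    params: Dict[str, Any] = field(default_factory=dict)

    def to_class(self) -> Any:
        # Look up the class by its type name and pass params as keyword arguments.
        if self.type is None:
            raise ValueError("section has no type")
        try:
            cls = REGISTRY[self.type]
        except KeyError:
            raise ValueError(f"unknown section type: {self.type!r}") from None
        return cls(**self.params)


feat = Section(type="DummyFeaturizer", params={"radius": 3}).to_class()
print(type(feat).__name__, feat.radius)  # DummyFeaturizer 3
```

Keeping params as a plain dict means each section type can accept its own keyword arguments without the specification layer knowing about them.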
- class openadmet.models.anvil.specification.AnvilSpecification(*, metadata: Metadata, data: DataSpec, procedure: ProcedureSpec, report: ReportSpec)
Bases: BaseModel
Full specification for an Anvil workflow.
- data: DataSpec
- metadata: Metadata
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- procedure: ProcedureSpec
- report: ReportSpec
- run(output_dir: PathLike = 'anvil_training', debug: bool = False, tag: str = None)
Run the Anvil workflow from this specification.
- to_multi_yaml(metadata_yaml='metadata.yaml', procedure_yaml='procedure.yaml', data_yaml='data.yaml', report_yaml='eval.yaml', **storage_options)
Write specification to multiple YAML files.
- Parameters:
metadata_yaml (str or PathLike, optional) – The file path for the metadata YAML file. Default is ‘metadata.yaml’.
procedure_yaml (str or PathLike, optional) – The file path for the procedure YAML file. Default is ‘procedure.yaml’.
data_yaml (str or PathLike, optional) – The file path for the data YAML file. Default is ‘data.yaml’.
report_yaml (str or PathLike, optional) – The file path for the report YAML file. Default is ‘eval.yaml’.
storage_options (dict, optional) – Additional options to pass to the file system (e.g., for S3, GCS).
- to_recipe(path, **storage_options)
Write specification to YAML recipe file.
- to_workflow()
Convert the specification to a workflow object.
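With its default arguments, to_multi_yaml writes one file per top-level field, and the report section goes to eval.yaml rather than report.yaml. The hypothetical helper below only computes that file-to-section mapping on a plain dict (nothing is written), assuming each file holds exactly the correspondingly named section; the section contents are illustrative.

```python
from typing import Any, Dict


def plan_multi_yaml(
    spec: Dict[str, Any],
    metadata_yaml: str = "metadata.yaml",
    procedure_yaml: str = "procedure.yaml",
    data_yaml: str = "data.yaml",
    report_yaml: str = "eval.yaml",
) -> Dict[str, Any]:
    """Return {output path: section contents} (hypothetical helper, writes nothing)."""
    paths = {
        "metadata": metadata_yaml,
        "procedure": procedure_yaml,
        "data": data_yaml,
        "report": report_yaml,  # default file name is eval.yaml, not report.yaml
    }
    return {path: spec[section] for section, path in paths.items()}


# Illustrative section contents; real sections carry many more fields.
spec = {
    "metadata": {"name": "demo"},
    "data": {"type": "csv"},
    "procedure": {"split": {}},
    "report": {"eval": []},
}
print(sorted(plan_multi_yaml(spec)))
# ['data.yaml', 'eval.yaml', 'metadata.yaml', 'procedure.yaml']
```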
- class openadmet.models.anvil.specification.DataSpec(*, type: str, resource: str | None = None, cat_entry: str | None = None, target_cols: str | list[str], input_col: str, anvil_dir: str | None = None, dropna: bool | None = False, train_resource: str | None = None, test_resource: str | None = None, val_resource: str | None = None)
Bases: BaseModel
Data specification for the workflow.
- Variables:
type (str) – The type of data source (e.g., ‘csv’, ‘yaml’).
resource (Optional[str]) – The path or URL to the data resource.
cat_entry (Optional[str]) – The catalog entry name if the resource is a YAML catalog.
target_cols (Union[str, list[str]]) – The target column(s) in the dataset.
input_col (str) – The input column in the dataset.
anvil_dir (Optional[str]) – The base directory for relative paths.
dropna (Optional[bool]) – Whether to drop rows with NaN values.
train_resource (Optional[str]) – The path or URL to the training data resource (if using separate train/test).
test_resource (Optional[str]) – The path or URL to the testing data resource (if using separate train/test).
val_resource (Optional[str]) – The path or URL to the validation data resource (if using separate train/test).
_catalog (Optional[intake.catalog.Catalog]) – The intake catalog object if the resource is a YAML file.
_using_train_test (bool) – Whether using separate train and test resources.
- check_resource_test_train()
Ensure that either resource or train/test/val resources are provided, not both.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Args:
self: The BaseModel instance.
context: The context.
- read() → tuple[pandas.Series, pandas.Series]
Read the data from the resource.
- Returns:
input (pd.Series) – The input data (e.g., SMILES strings).
targets (pd.Series) – The target data (e.g., properties to predict).
- template_anvil_dir(anvil_dir: Path)
Template all resources with ANVIL_DIR if present.
- template_resource()
Template the resource with ANVIL_DIR if present.
- Returns:
self – The DataSpec instance with the templated resource.
- Return type:
DataSpec
- to_yaml(path, **storage_options)
Write specification to YAML file.
- Parameters:
path (str or PathLike) – The file path to write the YAML content to.
storage_options (dict, optional) – Additional options to pass to the file system (e.g., for S3, GCS).
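check_resource_test_train enforces that a DataSpec points either at one combined resource or at separate split resources, never both. A minimal sketch of that rule on a plain dataclass (DataSources is a hypothetical name); the extra requirement that a split setup names both train_resource and test_resource, with val_resource optional, is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataSources:
    """Hypothetical stand-in for the resource fields of DataSpec."""
    resource: Optional[str] = None
    train_resource: Optional[str] = None
    test_resource: Optional[str] = None
    val_resource: Optional[str] = None

    def __post_init__(self) -> None:
        split_given = any(
            r is not None
            for r in (self.train_resource, self.test_resource, self.val_resource)
        )
        # Documented rule: a single resource and split resources are mutually exclusive.
        if self.resource is not None and split_given:
            raise ValueError("give either resource or train/test/val resources, not both")
        if self.resource is None and not split_given:
            raise ValueError("no data resource given")
        # Assumption: a split setup needs at least train and test; val stays optional.
        if split_given and (self.train_resource is None or self.test_resource is None):
            raise ValueError("train_resource and test_resource must be given together")

    @property
    def using_train_test(self) -> bool:
        """True when separate split resources are used instead of one resource."""
        return self.resource is None


DataSources(resource="data.csv")  # single resource: accepted
DataSources(train_resource="train.csv", test_resource="test.csv")  # split: accepted
try:
    DataSources(resource="data.csv", train_resource="train.csv")
except ValueError as err:
    print(err)  # give either resource or train/test/val resources, not both
```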
- class openadmet.models.anvil.specification.EnsembleSpec(*, type: str | None = None, params: dict = {}, n_models: int, calibration_method: str | None = None, use_bagging: bool = False, param_paths: list[str] | None = None, serial_paths: list[str] | None = None)
Bases: AnvilSection
Ensemble specification.
- Variables:
section_name (ClassVar[str]) – The name of the section.
n_models (int) – The number of models in the ensemble.
calibration_method (Optional[str]) – The calibration method to use.
use_bagging (bool) – Whether to use bagging when training the ensemble models.
param_paths (Optional[list[str]]) – The list of parameter file paths for the ensemble models.
serial_paths (Optional[list[str]]) – The list of serialization file paths for the ensemble models.
- check_paths()
Ensure both param_paths and serial_paths are provided together.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- template_anvil_dir(anvil_dir: Path)
Template param_paths and serial_paths with ANVIL_DIR.
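check_paths on EnsembleSpec applies a both-or-neither rule to the two path lists. A sketch under stated assumptions: EnsemblePaths is a hypothetical stand-in, and the extra check that the lists have equal length (one parameter file per serialized model) is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EnsemblePaths:
    """Hypothetical stand-in for the path fields of EnsembleSpec."""
    param_paths: Optional[List[str]] = None
    serial_paths: Optional[List[str]] = None

    def __post_init__(self) -> None:
        # Documented rule: both lists are given together, or neither is.
        if (self.param_paths is None) != (self.serial_paths is None):
            raise ValueError("param_paths and serial_paths must be provided together")
        # Assumption: each parameter file pairs with exactly one serialized model.
        if self.param_paths is not None and len(self.param_paths) != len(self.serial_paths):
            raise ValueError("param_paths and serial_paths must have the same length")

    def pairs(self) -> List[Tuple[str, str]]:
        """Return (param_path, serial_path) pairs, or [] when no paths were given."""
        if self.param_paths is None:
            return []
        return list(zip(self.param_paths, self.serial_paths))


ens = EnsemblePaths(["m0.json", "m1.json"], ["m0.pkl", "m1.pkl"])
print(ens.pairs())  # [('m0.json', 'm0.pkl'), ('m1.json', 'm1.pkl')]
```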
- class openadmet.models.anvil.specification.EvalSpec(*, type: str | None = None, params: dict = {})
Bases: AnvilSection
Evaluation specification.
- class openadmet.models.anvil.specification.FeatureSpec(*, type: str | None = None, params: dict = {})
Bases: AnvilSection
Featurization specification.
- Variables:
section_name (ClassVar[str]) – The name of the section.
type (Optional[str]) – The type of featurizer to use.
params (dict) – The parameters for the featurizer.
- class openadmet.models.anvil.specification.Metadata(*, version: Literal['v1'], driver: str = 'sklearn', name: str, build_number: int, description: str, tag: str, authors: str, email: EmailStr, biotargets: list[str], tags: list[str])
Bases: SpecBase
Metadata specification.
- Variables:
version (Literal["v1"]) – The version of the metadata schema.
driver (str) – The driver for the workflow.
name (str) – The name of the workflow.
build_number (int) – The build number of the workflow (must be non-negative).
description (str) – Description of the workflow.
tag (str) – Primary tag for the workflow.
authors (str) – Names of the authors.
email (EmailStr) – Email address of the contact person.
biotargets (list[str]) – List of biotargets associated with the workflow.
tags (list[str]) – Additional tags for the workflow.
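Two Metadata constraints are easy to miss: version only accepts the literal "v1", and build_number must be non-negative. A plain-dataclass sketch of just those checks; MetadataSketch is hypothetical and covers only a subset of the fields, while the real model relies on pydantic validation (including EmailStr for email).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetadataSketch:
    """Hypothetical stand-in for a subset of the Metadata fields."""
    version: str
    name: str
    build_number: int
    driver: str = "sklearn"
    tags: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Mirrors Literal["v1"]: only the v1 schema is accepted.
        if self.version != "v1":
            raise ValueError(f"unsupported metadata version: {self.version!r}")
        # Mirrors the documented "must be non-negative" constraint.
        if self.build_number < 0:
            raise ValueError("build_number must be non-negative")


meta = MetadataSketch(version="v1", name="demo-model", build_number=0)
print(meta.driver)  # sklearn
```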
- class openadmet.models.anvil.specification.ModelSpec(*, type: str | None = None, params: dict = {}, param_path: str | None = None, serial_path: str | None = None, freeze_weights: dict | None = None)
Bases: AnvilSection
Model specification.
- Variables:
section_name (ClassVar[str]) – The name of the section.
param_path (Optional[str]) – The path to the model parameters file.
serial_path (Optional[str]) – The path to the model serialization file.
freeze_weights (Optional[dict]) – A dictionary specifying which layers to freeze during training.
- check_freeze_weights()
Ensure freeze_weights is supplied only for applicable model types.
- Returns:
self – The validated ModelSpec instance.
- Return type:
ModelSpec
- check_paths()
Ensure both param_path and serial_path are provided together.
- Returns:
self – The validated ModelSpec instance.
- Return type:
ModelSpec
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- template_anvil_dir(anvil_dir: Path)
Template param_path and serial_path with ANVIL_DIR.
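template_anvil_dir rewrites param_path and serial_path relative to the Anvil directory. A minimal sketch, assuming the paths contain a literal placeholder token that is substituted with anvil_dir; PLACEHOLDER, template_path, and ModelPaths are hypothetical, and the token openadmet actually recognizes may be spelled differently.

```python
from dataclasses import dataclass
from pathlib import PurePath, PurePosixPath
from typing import Optional

# Hypothetical placeholder token; openadmet's actual spelling may differ.
PLACEHOLDER = "{{ ANVIL_DIR }}"


def template_path(path: Optional[str], anvil_dir: PurePath) -> Optional[str]:
    """Replace the placeholder with anvil_dir; None and plain paths pass through."""
    if path is None:
        return None
    return path.replace(PLACEHOLDER, str(anvil_dir))


@dataclass
class ModelPaths:
    """Hypothetical stand-in for the path fields of ModelSpec."""
    param_path: Optional[str] = None
    serial_path: Optional[str] = None

    def template_anvil_dir(self, anvil_dir: PurePath) -> "ModelPaths":
        self.param_path = template_path(self.param_path, anvil_dir)
        self.serial_path = template_path(self.serial_path, anvil_dir)
        return self


m = ModelPaths(f"{PLACEHOLDER}/model.json", f"{PLACEHOLDER}/model.pkl")
m.template_anvil_dir(PurePosixPath("/runs/demo"))
print(m.param_path)  # /runs/demo/model.json
```

Leaving None untouched matches the optional nature of both fields: a ModelSpec without pretrained paths passes through unchanged.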
- class openadmet.models.anvil.specification.ProcedureSpec(*, split: SplitSpec, feat: FeatureSpec, model: ModelSpec, ensemble: EnsembleSpec | None = None, train: TrainerSpec, transform: TransformSpec | None = None)
Bases: SpecBase
Procedure specification.
- ensemble: EnsembleSpec | None
- feat: FeatureSpec
- model: ModelSpec
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- split: SplitSpec
- template_anvil_dir(anvil_dir: Path)
Template ANVIL_DIR in model and ensemble path fields.
- train: TrainerSpec
- transform: TransformSpec | None
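ProcedureSpec.template_anvil_dir only has to reach the two sections that carry paths: model always, and ensemble when it is present. A sketch of that delegation with hypothetical stand-in classes (Model, Ensemble, Procedure), assuming each path contains a literal placeholder token whose spelling here is also hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import PurePath, PurePosixPath
from typing import List, Optional

PLACEHOLDER = "{{ ANVIL_DIR }}"  # hypothetical token spelling


def _sub(path: Optional[str], anvil_dir: PurePath) -> Optional[str]:
    """Substitute anvil_dir for the placeholder, passing None through."""
    return None if path is None else path.replace(PLACEHOLDER, str(anvil_dir))


@dataclass
class Model:
    param_path: Optional[str] = None
    serial_path: Optional[str] = None

    def template_anvil_dir(self, anvil_dir: PurePath) -> None:
        self.param_path = _sub(self.param_path, anvil_dir)
        self.serial_path = _sub(self.serial_path, anvil_dir)


@dataclass
class Ensemble:
    param_paths: List[str] = field(default_factory=list)
    serial_paths: List[str] = field(default_factory=list)

    def template_anvil_dir(self, anvil_dir: PurePath) -> None:
        self.param_paths = [_sub(p, anvil_dir) for p in self.param_paths]
        self.serial_paths = [_sub(p, anvil_dir) for p in self.serial_paths]


@dataclass
class Procedure:
    """Only the path-bearing sections of ProcedureSpec are modelled here."""
    model: Model
    ensemble: Optional[Ensemble] = None

    def template_anvil_dir(self, anvil_dir: PurePath) -> "Procedure":
        self.model.template_anvil_dir(anvil_dir)
        if self.ensemble is not None:  # ensemble is optional in ProcedureSpec
            self.ensemble.template_anvil_dir(anvil_dir)
        return self


proc = Procedure(
    model=Model(param_path=f"{PLACEHOLDER}/model.json"),
    ensemble=Ensemble(serial_paths=[f"{PLACEHOLDER}/m0.pkl"]),
)
proc.template_anvil_dir(PurePosixPath("/runs/demo"))
print(proc.model.param_path, proc.ensemble.serial_paths)
# /runs/demo/model.json ['/runs/demo/m0.pkl']
```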
- class openadmet.models.anvil.specification.ReportSpec(*, eval: list[openadmet.models.anvil.specification.EvalSpec])
Bases: SpecBase
Report specification.
- eval: list[openadmet.models.anvil.specification.EvalSpec]
- class openadmet.models.anvil.specification.SpecBase
Bases: BaseModel
Base class for specifications.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- to_yaml(path, **storage_options)
Write specification to YAML file.
- Parameters:
path (str or PathLike) – The file path to write the YAML content to.
storage_options (dict, optional) – Additional options to pass to the file system (e.g., for S3, GCS).
- class openadmet.models.anvil.specification.SplitSpec(*, type: str | None = None, params: dict = {})
Bases: AnvilSection
Data split specification.
- class openadmet.models.anvil.specification.TrainerSpec(*, type: str | None = None, params: dict = {})
Bases: AnvilSection
Trainer specification.
- class openadmet.models.anvil.specification.TransformSpec(*, type: str | None = None, params: dict = {})
Bases: AnvilSection
Transform specification.