Evaluation Base Classes

Base class and utilities for evaluation modules.

class openadmet.models.eval.eval_base.EvalBase(*, n_resamples: int = 9999, **extra_data: Any)[source]

Bases: BaseModel

Abstract base class for evaluation modules.

Variables:

n_resamples (int) – Number of bootstrap resamples used to estimate confidence intervals. Defaults to 9999, matching the default of scipy.stats.bootstrap. Lower values (e.g. 100) are appropriate for unit tests where CI precision is not required.

class Config[source]

Bases: object

Pydantic configuration for the EvalBase class.

extra = 'allow'

abstract evaluate(y_true=None, y_pred=None, model=None, X_train=None, y_train=None, wandb_logger=None)[source]

Evaluate the model.

Parameters:

y_true (array-like, optional) – True values.
y_pred (array-like, optional) – Predicted values.
model (object, optional) – Model instance.
X_train (array-like, optional) – Training features.
y_train (array-like, optional) – Training targets.
wandb_logger (object, optional) – Weights & Biases logger.

Returns:

Evaluation results.

Return type:

Any

is_cross_val: ClassVar[bool] = False

model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

n_resamples: int

abstract report()[source]

Report the evaluation results.

Returns:

Report output.

Return type:

Any
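The evaluate/report contract can be illustrated with a minimal sketch. To keep the block runnable without openadmet or pydantic installed, it uses a stand-in base class built on abc; EvalBaseSketch and MeanAbsoluteErrorEval are hypothetical names for illustration only, and the way extra keyword arguments are stored merely imitates the extra = 'allow' setting.

```python
from abc import ABC, abstractmethod
from typing import Any


class EvalBaseSketch(ABC):
    """Stand-in for EvalBase (hypothetical; not the openadmet class)."""

    def __init__(self, n_resamples: int = 9999, **extra_data: Any) -> None:
        self.n_resamples = n_resamples
        # Imitate extra='allow': keep unknown keyword arguments as attributes.
        for key, value in extra_data.items():
            setattr(self, key, value)

    @abstractmethod
    def evaluate(self, y_true=None, y_pred=None, model=None,
                 X_train=None, y_train=None, wandb_logger=None) -> Any:
        ...

    @abstractmethod
    def report(self) -> Any:
        ...


class MeanAbsoluteErrorEval(EvalBaseSketch):
    """Hypothetical evaluator: evaluate() stores the MAE, report() returns it."""

    def evaluate(self, y_true=None, y_pred=None, model=None,
                 X_train=None, y_train=None, wandb_logger=None) -> dict:
        errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
        self.data = {"mae": sum(errors) / len(errors)}
        return self.data

    def report(self) -> dict:
        return self.data


evaluator = MeanAbsoluteErrorEval(n_resamples=100, label="demo")
result = evaluator.evaluate(y_true=[1.0, 2.0, 3.0], y_pred=[1.5, 2.0, 2.0])
```

Because both methods are abstract, the base class itself cannot be instantiated; only subclasses that implement evaluate and report can.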

stat_and_bootstrap(metric_tag: str, y_pred: ndarray, y_true: ndarray, statistic: Callable, confidence_level: float = 0.95, is_scipy_statistic: bool = False)[source]

Calculate a metric and its bootstrap confidence interval.

Parameters:

metric_tag (str) – Name of the metric.
y_pred (np.ndarray) – Predicted values.
y_true (np.ndarray) – True values.
statistic (Callable) – Function to compute the metric.
confidence_level (float, optional) – Confidence level for the interval (default is 0.95).
is_scipy_statistic (bool, optional) – Whether the statistic is a scipy.stats object (default is False).

Returns:

Tuple of (metric, lower confidence bound, upper confidence bound).

Return type:

tuple
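The shape of the returned tuple can be illustrated with a simplified percentile bootstrap written in plain Python. This is a sketch under stated assumptions, not the method's actual implementation: percentile_bootstrap, mae, the seed argument, and the statistic(y_true, y_pred) argument order are all hypothetical, and the real method may use a different interval type.

```python
import random
from typing import Callable, Sequence, Tuple


def percentile_bootstrap(
    y_pred: Sequence[float],
    y_true: Sequence[float],
    statistic: Callable[[Sequence[float], Sequence[float]], float],
    confidence_level: float = 0.95,
    n_resamples: int = 9999,
    seed: int = 0,  # hypothetical argument, added for reproducibility
) -> Tuple[float, float, float]:
    """Return (metric, lower bound, upper bound) from paired resampling."""
    rng = random.Random(seed)
    n = len(y_true)
    point = statistic(y_true, y_pred)

    resampled = []
    for _ in range(n_resamples):
        # Resample (y_true, y_pred) pairs together, with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        resampled.append(
            statistic([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )

    # Read the interval bounds off the sorted bootstrap distribution.
    resampled.sort()
    alpha = (1.0 - confidence_level) / 2.0
    lower = resampled[int(alpha * (n_resamples - 1))]
    upper = resampled[int((1.0 - alpha) * (n_resamples - 1))]
    return point, lower, upper


def mae(y_true, y_pred):
    """Mean absolute error, used here as the example statistic."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_pred = [1.1, 1.8, 3.3, 3.6, 5.5, 5.9]
metric, low, high = percentile_bootstrap(y_pred, y_true, mae, n_resamples=500)
```

A small n_resamples such as the 500 used here keeps the example fast, at the cost of noisier interval bounds.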

openadmet.models.eval.eval_base.get_eval_class(eval_type)[source]

Retrieve an evaluation class from the registry by type.

Parameters:

eval_type (str) – The evaluation type string.

Returns:

The evaluation class corresponding to the given type.

Return type:

type

Raises:

ValueError – If the evaluation type is not found in the registry.
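The lookup-or-raise behaviour described above follows a common registry pattern, sketched below. The names _EVAL_REGISTRY, register_eval, get_eval_class_sketch, and DemoEval are hypothetical; how openadmet actually populates its registry is not documented here.

```python
from typing import Callable, Dict

# Hypothetical registry mapping evaluation type strings to classes.
_EVAL_REGISTRY: Dict[str, type] = {}


def register_eval(eval_type: str) -> Callable[[type], type]:
    """Class decorator that records a class under the given type string."""
    def decorator(cls: type) -> type:
        _EVAL_REGISTRY[eval_type] = cls
        return cls
    return decorator


def get_eval_class_sketch(eval_type: str) -> type:
    """Return the registered class, or raise ValueError if it is unknown."""
    try:
        return _EVAL_REGISTRY[eval_type]
    except KeyError:
        raise ValueError(
            f"Eval type {eval_type!r} not found in registry"
        ) from None


@register_eval("DemoEval")
class DemoEval:
    pass


eval_cls = get_eval_class_sketch("DemoEval")
```

Raising ValueError rather than letting KeyError escape keeps the error tied to the caller's input instead of the registry's internal storage.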

openadmet.models.eval.eval_base.get_t_true_and_t_pred(task_id, y_true, y_pred, y_val=None, y_pred_fold=None)[source]

Get true and predicted values for each task, handling pairwise differences if necessary.

Parameters:

task_id (int) – ID of the task.
y_true (array-like) – True values for the full dataset.
y_pred (array-like) – Predicted values for the full dataset.
y_val (array-like, optional) – True values for the validation set.
y_pred_fold (array-like, optional) – Predicted values for the current fold.

Returns:

List of (t_true, t_pred) tuples for each task.

Return type:

list of tuples

openadmet.models.eval.eval_base.mask_nans(y_true: ndarray, y_pred: ndarray)[source]

Remove any pairs where either y_true or y_pred is NaN.

Parameters:

y_true (np.ndarray) – Array of true values.
y_pred (np.ndarray) – Array of predicted values.

Returns:

Filtered arrays (y_true, y_pred) with NaNs removed.

Return type:

tuple of np.ndarray
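The pairwise filtering described above can be sketched with a boolean mask in NumPy. The name mask_nans_sketch is hypothetical, and the actual implementation may differ in detail.

```python
import numpy as np


def mask_nans_sketch(y_true: np.ndarray, y_pred: np.ndarray):
    """Keep only the positions where both arrays hold a non-NaN value."""
    keep = ~(np.isnan(y_true) | np.isnan(y_pred))
    return y_true[keep], y_pred[keep]


y_true = np.array([1.0, np.nan, 3.0, 4.0])
y_pred = np.array([1.1, 2.0, np.nan, 3.9])
t, p = mask_nans_sketch(y_true, y_pred)
```

Applying one shared mask to both arrays keeps the surviving values aligned, which matters because metrics compare them element by element.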

openadmet.models.eval.eval_base.mask_nans_std(y_true: ndarray, y_pred: ndarray, y_std: ndarray)[source]

Remove any entries where either y_true or y_pred is NaN, returning the corresponding entries of y_std as well.

Parameters:

y_true (np.ndarray) – Array of true values.
y_pred (np.ndarray) – Array of predicted values.
y_std (np.ndarray) – Array of standard deviations.

Returns:

Filtered arrays (y_true, y_pred, y_std) with NaNs removed.

Return type:

tuple of np.ndarray