protomotions.agents.evaluators.base_evaluator module#

Base evaluator for agent evaluation and metrics computation.

This module provides the base evaluation infrastructure for computing performance metrics during training and evaluation. Evaluators run periodic assessments of agent performance and compute task-specific metrics.

Key Classes:

BaseEvaluator: Base class for all evaluators with hook-based customization

Key Features:

Periodic evaluation during training
Hook pattern for subclass customization (4 hooks: start, reset_kwargs, check, step)
MdpComponent-based evaluation with threshold failure detection
Aggregate metrics via plugin system (see aggregate_metrics.py)
Episode statistics aggregation
Distributed evaluation support

Note

Aggregate metric plugins (SmoothnessAggregateMetric, ActionSmoothnessAggregateMetric) are defined in aggregate_metrics.py and compute post-hoc statistics over accumulated MotionMetrics trajectories.

class protomotions.agents.evaluators.base_evaluator.BaseEvaluator(agent, fabric, config)[source]#

Bases: object

Base class for agent evaluation and metrics computation.

Runs periodic evaluations during training to assess agent performance. Collects episode statistics, computes task-specific metrics, and provides feedback for checkpoint selection (best model saving).

Parameters:

agent (Any) – The agent being evaluated.
fabric (MockFabric) – Lightning Fabric instance for distributed evaluation.
config (EvaluatorConfig) – Evaluator configuration specifying eval frequency and length.

Example

>>> evaluator = BaseEvaluator(agent, fabric, config)
>>> metrics, score, num_items = evaluator.evaluate()

__init__(agent, fabric, config)[source]#

Initialize the evaluator.

Parameters:

agent (Any) – The agent to evaluate
fabric (MockFabric) – Lightning Fabric instance for distributed training

property device: <Mock object at 0x7fd695074c50>[]#: Device for computations (from fabric).

property env: BaseEnv#: Environment instance (from agent).

property root_dir#: Root directory for saving outputs (from agent).

property num_envs: int#: Number of environments (from agent).

property max_eval_steps: int#: Maximum steps per evaluation episode.

initialize_eval()[source]#

Initialize evaluation tracking.

run_evaluation()[source]#

Run the evaluation process.

evaluate_episode(env_ids, max_steps)[source]#

Run a single episode batch.

Subclasses customize behavior via 4 hooks: - _on_episode_start: pre-reset setup - _get_reset_kwargs: customize env.reset() call - _check_eval_components: per-step evaluation component checking - _on_episode_step: per-step data collection

Parameters:

env_ids (MockTensor) – Environment IDs to evaluate [num_envs]
max_steps (int) – Maximum steps for this episode

process_eval_results()[source]#

Process collected metrics and prepare for logging.

Returns:

Dict of evaluation metrics for logging
Optional score value for determining best model
Number of items evaluated (for weighted cross-rank aggregation)

Return type:

Tuple containing

get_state_dict()[source]#

Get evaluator state for checkpointing. Override in subclasses.

load_state_dict(state_dict)[source]#

Load evaluator state from checkpoint. Override in subclasses.

cleanup_after_evaluation()[source]#

Clean up after evaluation.

simple_test_policy(collect_metrics=False)[source]#

Simple evaluation loop for interactive testing.

Runs policy indefinitely, collecting running average of metrics. Press Ctrl+C to stop and print summary.

Parameters:: collect_metrics (bool) – If True, collect and print average metrics on exit.