protomotions.agents.evaluators.base_evaluator module#
Base evaluator for agent evaluation and metrics computation.
This module provides the base evaluation infrastructure for computing performance metrics during training and evaluation. Evaluators run periodic assessments of agent performance and compute task-specific metrics.
- Key Classes:
BaseEvaluator: Base class for all evaluators with hook-based customization
- Key Features:
Periodic evaluation during training
Hook pattern for subclass customization (4 hooks: start, reset_kwargs, check, step)
MdpComponent-based evaluation with threshold failure detection
Aggregate metrics via plugin system (see aggregate_metrics.py)
Episode statistics aggregation
Distributed evaluation support
Note
Aggregate metric plugins (SmoothnessAggregateMetric, ActionSmoothnessAggregateMetric) are defined in aggregate_metrics.py and compute post-hoc statistics over accumulated MotionMetrics trajectories.
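To make the plugin idea concrete, here is a minimal sketch of an aggregate metric that accumulates trajectories during evaluation and computes a post-hoc statistic afterwards. The class name and the `accumulate`/`compute` method names are illustrative assumptions; the actual plugin interface lives in aggregate_metrics.py and may differ.

```python
class MeanValueAggregateMetric:
    """Hypothetical aggregate-metric plugin: accumulates per-episode
    trajectories, then computes a single post-hoc statistic over all of
    them (here, the mean of every logged value)."""

    def __init__(self):
        self._trajectories = []

    def accumulate(self, trajectory):
        # Called once per evaluated episode with that episode's values.
        self._trajectories.append(list(trajectory))

    def compute(self):
        # Post-hoc statistic over everything accumulated during evaluation.
        values = [v for traj in self._trajectories for v in traj]
        return sum(values) / len(values) if values else 0.0
```

The real smoothness metrics follow the same accumulate-then-reduce shape, but over MotionMetrics trajectories rather than plain lists.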
- class protomotions.agents.evaluators.base_evaluator.BaseEvaluator(agent, fabric, config)[source]#
Bases:
object
Base class for agent evaluation and metrics computation.
Runs periodic evaluations during training to assess agent performance. Collects episode statistics, computes task-specific metrics, and provides feedback for checkpoint selection (best model saving).
- Parameters:
agent (Any) – The agent being evaluated.
fabric (Fabric) – Lightning Fabric instance for distributed evaluation.
config (EvaluatorConfig) – Evaluator configuration specifying eval frequency and length.
Example
>>> evaluator = BaseEvaluator(agent, fabric, config)
>>> metrics, score = evaluator.evaluate()
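The example above shows a one-off call; during training, `evaluate()` is typically invoked periodically, with the returned score used for best-checkpoint selection. The loop below is a hedged sketch of that pattern; only `evaluate()` and its `(metrics, score)` return value come from the docs, while `eval_frequency` and the loop structure are assumptions drawn from the config description.

```python
def training_loop(evaluator, num_epochs, eval_frequency, best_score=float("-inf")):
    """Illustrative training loop with periodic evaluation (not the real
    protomotions trainer). Returns the evaluation history and best score."""
    history = []
    for epoch in range(num_epochs):
        # ... a training step would run here ...
        if (epoch + 1) % eval_frequency == 0:
            metrics, score = evaluator.evaluate()
            history.append((epoch, score))
            if score > best_score:
                best_score = score  # feedback for best-model checkpoint saving
    return history, best_score
```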
- __init__(agent, fabric, config)[source]#
Initialize the evaluator.
- Parameters:
agent (Any) – The agent to evaluate.
fabric (Fabric) – Lightning Fabric instance for distributed training.
config (EvaluatorConfig) – Evaluator configuration specifying eval frequency and length.
- property device#
Device for computations (from fabric).
- property env: BaseEnv#
Environment instance (from agent).
- property root_dir#
Root directory for saving outputs (from agent).
- evaluate_episode(env_ids, max_steps)[source]#
Run a single episode batch.
Subclasses customize behavior via 4 hooks:
- _on_episode_start: pre-reset setup
- _get_reset_kwargs: customize the env.reset() call
- _check_eval_components: per-step evaluation component checking
- _on_episode_step: per-step data collection
- Parameters:
env_ids (Tensor) – Environment IDs to evaluate, shape [num_envs]
max_steps (int) – Maximum steps for this episode
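The four hooks above can be sketched as follows. `BaseEvaluator` here is a simplified stand-in for the real class (no fabric, env, or metrics plumbing); the hook names match the documentation, but their exact signatures in the real code are assumptions.

```python
class BaseEvaluator:
    """Simplified stand-in showing the hook pattern of evaluate_episode."""

    def evaluate_episode(self, env_ids, max_steps):
        self._on_episode_start(env_ids)            # hook 1: pre-reset setup
        reset_kwargs = self._get_reset_kwargs(env_ids)  # hook 2: customize reset
        # env.reset(**reset_kwargs) would run here in the real evaluator
        for step in range(max_steps):
            self._on_episode_step(env_ids, step)   # hook 4: per-step data collection
            if self._check_eval_components(env_ids, step):  # hook 3: failure check
                break

    # Default hooks are no-ops; subclasses override only what they need.
    def _on_episode_start(self, env_ids):
        pass

    def _get_reset_kwargs(self, env_ids):
        return {}

    def _check_eval_components(self, env_ids, step):
        return False

    def _on_episode_step(self, env_ids, step):
        pass


class LoggingEvaluator(BaseEvaluator):
    """Example subclass: records which hooks fired, and stops early."""

    def __init__(self):
        self.log = []

    def _on_episode_start(self, env_ids):
        self.log.append(("start", tuple(env_ids)))

    def _get_reset_kwargs(self, env_ids):
        return {"env_ids": env_ids}

    def _check_eval_components(self, env_ids, step):
        return step >= 2  # terminate the episode after three steps

    def _on_episode_step(self, env_ids, step):
        self.log.append(("step", step))
```

Keeping the defaults as no-ops means a subclass overrides only the hooks relevant to its task, while the episode-driving logic stays in the base class.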