protomotions.agents.base_agent.config module
Configuration classes for the base agent.
This module defines the configuration dataclasses used by the base agent and all derived agents. These configurations specify training parameters, optimization settings, and evaluation parameters.
- Key Classes:
BaseAgentConfig: Main agent configuration
BaseModelConfig: Model architecture configuration
OptimizerConfig: Optimizer parameters
MaxEpisodeLengthManagerConfig: Episode length curriculum
- class protomotions.agents.base_agent.config.MaxEpisodeLengthManagerConfig(start_length=5, end_length=300, transition_epochs=100000)
Bases: object
Configuration for managing max episode length during training.
- Attributes:
start_length: Initial max episode length.
end_length: Final max episode length.
transition_epochs: Number of epochs over which to linearly transition from start_length to end_length.
- current_max_episode_length(current_epoch)
Returns the current max episode length based on linear interpolation.
- Parameters:
current_epoch – Current training epoch
- Returns:
Interpolated max episode length
- Return type:
int
- __init__(start_length=5, end_length=300, transition_epochs=100000)
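For intuition, the curriculum is a straight line from start_length to end_length that saturates once transition_epochs is reached. A minimal sketch of the interpolation (illustrative only; the actual implementation may round or clamp differently):

```python
from dataclasses import dataclass

@dataclass
class MaxEpisodeLengthManagerConfig:
    start_length: int = 5
    end_length: int = 300
    transition_epochs: int = 100000

    def current_max_episode_length(self, current_epoch: int) -> int:
        # Fraction of the transition completed, clamped to [0, 1].
        progress = min(current_epoch / self.transition_epochs, 1.0)
        # Linear interpolation from start_length to end_length.
        return int(self.start_length + progress * (self.end_length - self.start_length))

# e.g. halfway through the transition: 5 + 0.5 * (300 - 5) -> 152
print(MaxEpisodeLengthManagerConfig().current_max_episode_length(50000))
```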
- class protomotions.agents.base_agent.config.OptimizerConfig(_target_='torch.optim.Adam', lr=0.0001, weight_decay=0.0, eps=1e-08, betas=<factory>)
Bases: object
Configuration for optimizers.
- Attributes:
lr: Learning rate.
weight_decay: L2 weight decay.
eps: Epsilon for numerical stability.
betas: Adam betas.
- __init__(_target_='torch.optim.Adam', lr=0.0001, weight_decay=0.0, eps=1e-08, betas=<factory>)
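The `_target_` field names the optimizer class to construct. Assuming the project resolves it with Hydra-style instantiation (an assumption; the resolution mechanism is not shown in this module), usage might look like:

```python
import torch.nn as nn
from hydra.utils import instantiate

from protomotions.agents.base_agent.config import OptimizerConfig

model = nn.Linear(8, 2)
cfg = OptimizerConfig(lr=3e-4)

# instantiate() imports cfg._target_ ('torch.optim.Adam' by default) and
# forwards the remaining config fields (lr, weight_decay, eps, betas) as
# keyword arguments; params is supplied at call time.
optimizer = instantiate(cfg, params=model.parameters())
```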
- class protomotions.agents.base_agent.config.BaseModelConfig(_target_='protomotions.agents.base_agent.model.BaseModel', in_keys=<factory>, out_keys=<factory>)
Bases: object
Configuration for PPO Model (Actor-Critic).
- Attributes:
in_keys: Input keys.
out_keys: Output keys.
- __init__(_target_='protomotions.agents.base_agent.model.BaseModel', in_keys=<factory>, out_keys=<factory>)
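in_keys and out_keys name the tensors the model consumes and produces. A sketch with hypothetical key names (the real keys depend on the environment's observation dictionary):

```python
from protomotions.agents.base_agent.config import BaseModelConfig

# "self_obs", "action", and "value" are illustrative placeholders,
# not keys guaranteed by the library.
model_cfg = BaseModelConfig(
    in_keys=["self_obs"],
    out_keys=["action", "value"],
)
```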
- class protomotions.agents.base_agent.config.BaseAgentConfig(
    batch_size,
    training_max_steps,
    _target_='protomotions.agents.base_agent.agent.BaseAgent',
    model=<factory>,
    num_steps=32,
    gradient_clip_val=0.0,
    fail_on_bad_grads=False,
    check_grad_mag=True,
    gamma=0.99,
    bounds_loss_coef=0.0,
    task_reward_w=1.0,
    num_mini_epochs=1,
    training_early_termination=None,
    save_epoch_checkpoint_every=1000,
    save_last_checkpoint_every=10,
    max_episode_length_manager=None,
    evaluator=<factory>,
    normalize_rewards=True,
    normalized_reward_clamp_value=5.0,
)
Bases: object
Main configuration class for PPO Agent.
- Attributes:
batch_size: Training batch size.
training_max_steps: Maximum training steps.
model: Model config.
num_steps: Environment steps per update.
gradient_clip_val: Max gradient norm. 0 = disabled.
fail_on_bad_grads: Fail on NaN/Inf gradients.
check_grad_mag: Log gradient magnitude.
gamma: Discount factor.
bounds_loss_coef: Action bounds loss coefficient. Set to 0 for tanh outputs.
task_reward_w: Task reward weight.
num_mini_epochs: Mini-epochs per update.
training_early_termination: Stop early at this step. None = disabled.
save_epoch_checkpoint_every: Save epoch_xxx.ckpt every N epochs.
save_last_checkpoint_every: Save last.ckpt every K epochs.
max_episode_length_manager: Episode length curriculum.
evaluator: Evaluation config.
normalize_rewards: Normalize rewards.
normalized_reward_clamp_value: Clamp normalized rewards to [-val, val].
- model: BaseModelConfig
- max_episode_length_manager: MaxEpisodeLengthManagerConfig | None = None
- evaluator: EvaluatorConfig
- __init__(
    batch_size,
    training_max_steps,
    _target_='protomotions.agents.base_agent.agent.BaseAgent',
    model=<factory>,
    num_steps=32,
    gradient_clip_val=0.0,
    fail_on_bad_grads=False,
    check_grad_mag=True,
    gamma=0.99,
    bounds_loss_coef=0.0,
    task_reward_w=1.0,
    num_mini_epochs=1,
    training_early_termination=None,
    save_epoch_checkpoint_every=1000,
    save_last_checkpoint_every=10,
    max_episode_length_manager=None,
    evaluator=<factory>,
    normalize_rewards=True,
    normalized_reward_clamp_value=5.0,
)
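Putting it together, only batch_size and training_max_steps are required; everything else falls back to the defaults shown in the signature. A sketch of a typical construction (the values here are illustrative choices, not recommended settings):

```python
from protomotions.agents.base_agent.config import (
    BaseAgentConfig,
    MaxEpisodeLengthManagerConfig,
)

agent_cfg = BaseAgentConfig(
    batch_size=4096,
    training_max_steps=100_000_000,
    num_steps=32,
    gamma=0.99,
    normalize_rewards=True,
    # Optional episode length curriculum: grow from 5 to 300 steps
    # over the first 100000 epochs.
    max_episode_length_manager=MaxEpisodeLengthManagerConfig(
        start_length=5, end_length=300, transition_epochs=100000
    ),
)
```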