protomotions.agents.base_agent.config module#

Configuration classes for the base agent.

This module defines the configuration dataclasses used by the base agent and all derived agents. These configurations specify training parameters, optimization settings, and evaluation parameters.

Key Classes:
  • BaseAgentConfig: Main agent configuration

  • BaseModelConfig: Model architecture configuration

  • OptimizerConfig: Optimizer parameters

  • MaxEpisodeLengthManagerConfig: Episode length curriculum

class protomotions.agents.base_agent.config.MaxEpisodeLengthManagerConfig(
start_length=5,
end_length=300,
transition_epochs=100000,
)[source]#

Bases: object

Configuration for managing max episode length during training.

Attributes:

start_length: Initial max episode length.
end_length: Final max episode length.
transition_epochs: Number of epochs over which the max length transitions from start_length to end_length.

start_length: int = 5#
end_length: int = 300#
transition_epochs: int = 100000#
current_max_episode_length(
current_epoch,
)[source]#

Returns the current max episode length, linearly interpolated from start_length to end_length over transition_epochs.

Parameters:

current_epoch – Current training epoch

Returns:

Interpolated max episode length

Return type:

int

__init__(
start_length=5,
end_length=300,
transition_epochs=100000,
)#
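
As a hedged sketch of how this curriculum behaves, the snippet below reimplements the linear ramp described above; the exact behavior of current_max_episode_length() once transition_epochs is reached is an assumption here (only the linear interpolation is stated in the docs).

    from protomotions.agents.base_agent.config import MaxEpisodeLengthManagerConfig

    cfg = MaxEpisodeLengthManagerConfig(start_length=5, end_length=300, transition_epochs=100000)

    def interpolated_length(cfg: MaxEpisodeLengthManagerConfig, current_epoch: int) -> int:
        # Linear ramp from start_length to end_length over transition_epochs,
        # held at end_length afterwards (assumed clamping behavior).
        frac = min(max(current_epoch / cfg.transition_epochs, 0.0), 1.0)
        return int(cfg.start_length + frac * (cfg.end_length - cfg.start_length))

    print(interpolated_length(cfg, 0))       # 5
    print(interpolated_length(cfg, 50000))   # 152
    print(interpolated_length(cfg, 100000))  # 300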
class protomotions.agents.base_agent.config.OptimizerConfig(
_target_='torch.optim.Adam',
lr=0.0001,
weight_decay=0.0,
eps=1e-08,
betas=<factory>,
)[source]#

Bases: object

Configuration for optimizers.

Attributes:

lr: Learning rate.
weight_decay: L2 weight decay.
eps: Epsilon for numerical stability.
betas: Adam beta coefficients.

lr: float = 0.0001#
weight_decay: float = 0.0#
eps: float = 1e-08#
betas: tuple#
__init__(
_target_='torch.optim.Adam',
lr=0.0001,
weight_decay=0.0,
eps=1e-08,
betas=<factory>,
)#
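
The _target_ field points at torch.optim.Adam, which suggests the config is meant to be turned into an optimizer by a _target_-based factory (for example hydra.utils.instantiate); whether this project uses Hydra for that step is an assumption. The sketch below builds the optimizer manually from the config fields:

    import torch
    import torch.nn as nn
    from protomotions.agents.base_agent.config import OptimizerConfig

    opt_cfg = OptimizerConfig(lr=3e-4, weight_decay=1e-5)
    model = nn.Linear(8, 2)  # stand-in module; the real model comes from BaseModelConfig

    # Manual construction mirroring what a _target_-based factory would do.
    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=opt_cfg.lr,
        weight_decay=opt_cfg.weight_decay,
        eps=opt_cfg.eps,
        betas=tuple(opt_cfg.betas),
    )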
class protomotions.agents.base_agent.config.BaseModelConfig(
_target_='protomotions.agents.base_agent.model.BaseModel',
in_keys=<factory>,
out_keys=<factory>,
)[source]#

Bases: object

Configuration for the base agent model.

Attributes:

in_keys: Keys of the input entries the model reads.
out_keys: Keys of the output entries the model writes.

in_keys: List[str]#
out_keys: List[str]#
__init__(
_target_='protomotions.agents.base_agent.model.BaseModel',
in_keys=<factory>,
out_keys=<factory>,
)#
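
The in_keys/out_keys pair names which entries the model reads and writes, in the spirit of tensordict-style modules (that analogy is an assumption). A minimal sketch, where the key names "self_obs" and "actions" are illustrative rather than keys defined by this module:

    from protomotions.agents.base_agent.config import BaseModelConfig

    model_cfg = BaseModelConfig(
        in_keys=["self_obs"],    # observation entries the model consumes (assumed name)
        out_keys=["actions"],    # entries the model produces (assumed name)
    )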
class protomotions.agents.base_agent.config.BaseAgentConfig(
batch_size,
training_max_steps,
_target_='protomotions.agents.base_agent.agent.BaseAgent',
model=<factory>,
num_steps=32,
gradient_clip_val=0.0,
fail_on_bad_grads=False,
check_grad_mag=True,
gamma=0.99,
bounds_loss_coef=0.0,
task_reward_w=1.0,
num_mini_epochs=1,
training_early_termination=None,
save_epoch_checkpoint_every=1000,
save_last_checkpoint_every=10,
max_episode_length_manager=None,
evaluator=<factory>,
normalize_rewards=True,
normalized_reward_clamp_value=5.0,
)[source]#

Bases: object

Main configuration class for the base agent.

Attributes:

batch_size: Training batch size.
training_max_steps: Maximum number of training steps.
model: Model configuration.
num_steps: Environment steps collected per update.
gradient_clip_val: Maximum gradient norm; 0 disables clipping.
fail_on_bad_grads: Raise an error on NaN/Inf gradients.
check_grad_mag: Log gradient magnitudes.
gamma: Discount factor.
bounds_loss_coef: Coefficient for the action-bounds loss; use 0 when the policy uses tanh-squashed outputs.
task_reward_w: Task reward weight.
num_mini_epochs: Mini-epochs per update.
training_early_termination: Stop training early at this step; None disables early termination.
save_epoch_checkpoint_every: Save epoch_xxx.ckpt every N epochs.
save_last_checkpoint_every: Save last.ckpt every K epochs.
max_episode_length_manager: Episode length curriculum configuration.
evaluator: Evaluation configuration.
normalize_rewards: Whether to normalize rewards.
normalized_reward_clamp_value: Clamp normalized rewards to [-value, value].

batch_size: int#
training_max_steps: int#
model: BaseModelConfig#
num_steps: int = 32#
gradient_clip_val: float = 0.0#
fail_on_bad_grads: bool = False#
check_grad_mag: bool = True#
gamma: float = 0.99#
bounds_loss_coef: float = 0.0#
task_reward_w: float = 1.0#
num_mini_epochs: int = 1#
training_early_termination: int | None = None#
save_epoch_checkpoint_every: int | None = 1000#
save_last_checkpoint_every: int = 10#
max_episode_length_manager: MaxEpisodeLengthManagerConfig | None = None#
evaluator: EvaluatorConfig#
normalize_rewards: bool = True#
normalized_reward_clamp_value: float = 5.0#
__init__(
batch_size,
training_max_steps,
_target_='protomotions.agents.base_agent.agent.BaseAgent',
model=<factory>,
num_steps=32,
gradient_clip_val=0.0,
fail_on_bad_grads=False,
check_grad_mag=True,
gamma=0.99,
bounds_loss_coef=0.0,
task_reward_w=1.0,
num_mini_epochs=1,
training_early_termination=None,
save_epoch_checkpoint_every=1000,
save_last_checkpoint_every=10,
max_episode_length_manager=None,
evaluator=<factory>,
normalize_rewards=True,
normalized_reward_clamp_value=5.0,
)#
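
A minimal sketch of assembling an agent configuration with the two required fields plus a couple of overrides; the numeric values are illustrative, not recommended hyperparameters:

    from protomotions.agents.base_agent.config import (
        BaseAgentConfig,
        MaxEpisodeLengthManagerConfig,
    )

    agent_cfg = BaseAgentConfig(
        batch_size=4096,
        training_max_steps=100_000_000,
        gradient_clip_val=1.0,  # 0.0 (the default) disables clipping
        max_episode_length_manager=MaxEpisodeLengthManagerConfig(
            start_length=5, end_length=300, transition_epochs=100_000
        ),
    )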