protomotions.agents.amp.config module
Configuration classes for AMP (Adversarial Motion Priors) agent.
This module defines configurations for the AMP algorithm, which uses a discriminator to learn motion priors from reference motions.
- class protomotions.agents.amp.config.AMPParametersConfig(
- conditional_discriminator=False,
- discriminator_reward_w=1.0,
- discriminator_weight_decay=0.0001,
- discriminator_logit_weight_decay=0.01,
- discriminator_batch_size=4096,
- discriminator_grad_penalty=5.0,
- discriminator_optimization_ratio=1,
- discriminator_replay_keep_prob=0.01,
- discriminator_replay_size=200000,
- discriminator_reward_threshold=0.05,
- discriminator_max_cumulative_bad_transitions=10,
- )
Bases: object
Configuration for AMP-specific hyperparameters.
- Attributes:
- conditional_discriminator: Whether to use a conditional discriminator based on motion state.
- discriminator_reward_w: Weight for discriminator reward in total reward.
- discriminator_weight_decay: L2 weight decay for discriminator parameters.
- discriminator_logit_weight_decay: Weight decay specifically for the discriminator logit layer.
- discriminator_batch_size: Batch size for discriminator training.
- discriminator_grad_penalty: Gradient penalty coefficient for discriminator stability.
- discriminator_optimization_ratio: Ratio of discriminator updates to policy updates.
- discriminator_replay_keep_prob: Probability to keep samples in replay buffer.
- discriminator_replay_size: Maximum size of discriminator replay buffer.
- discriminator_reward_threshold: Threshold for discriminator reward termination.
- discriminator_max_cumulative_bad_transitions: Max bad transitions before termination.
- __init__(
- conditional_discriminator=False,
- discriminator_reward_w=1.0,
- discriminator_weight_decay=0.0001,
- discriminator_logit_weight_decay=0.01,
- discriminator_batch_size=4096,
- discriminator_grad_penalty=5.0,
- discriminator_optimization_ratio=1,
- discriminator_replay_keep_prob=0.01,
- discriminator_replay_size=200000,
- discriminator_reward_threshold=0.05,
- discriminator_max_cumulative_bad_transitions=10,
- )
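The reward weight and the termination fields above can be illustrated with a small sketch. It does not import protomotions: `AMPParametersSketch`, `combined_reward`, and `should_terminate` are hypothetical names, and both the weighted-sum reward and the counting rule for bad transitions are assumptions about how these fields might be used, not the library's confirmed behavior. `task_reward_w` is taken from `AMPAgentConfig`.

```python
from dataclasses import dataclass


@dataclass
class AMPParametersSketch:
    """Hypothetical stand-in mirroring a few AMPParametersConfig defaults."""

    discriminator_reward_w: float = 1.0
    discriminator_reward_threshold: float = 0.05
    discriminator_max_cumulative_bad_transitions: int = 10


def combined_reward(task_reward, disc_reward, task_reward_w, params):
    # Assumption: the total reward is a weighted sum of the two terms.
    return task_reward_w * task_reward + params.discriminator_reward_w * disc_reward


def should_terminate(disc_rewards, params):
    # Assumption: a transition is "bad" when its discriminator reward falls
    # below the threshold, and the episode ends once the running count of
    # bad transitions exceeds the configured maximum.
    bad = 0
    for reward in disc_rewards:
        if reward < params.discriminator_reward_threshold:
            bad += 1
            if bad > params.discriminator_max_cumulative_bad_transitions:
                return True
    return False


params = AMPParametersSketch()
total = combined_reward(task_reward=2.0, disc_reward=0.5, task_reward_w=1.0, params=params)
```

Whether the real counter resets after a good transition is not stated on this page, so the sketch keeps the simplest cumulative reading.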
- class protomotions.agents.amp.config.DiscriminatorConfig(
- models=<factory>,
- _target_='protomotions.agents.amp.model.Discriminator',
- in_keys=<factory>,
- out_keys=<factory>,
- )
Bases: ModuleContainerConfig
Configuration for AMP Discriminator network.
- Attributes:
- models: List of module configurations to execute sequentially.
- in_keys: Input tensor keys required by this container.
- out_keys: Output key for discriminator logits.
- __init__(
- models=<factory>,
- _target_='protomotions.agents.amp.model.Discriminator',
- in_keys=<factory>,
- out_keys=<factory>,
- )
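The description of `models`, `in_keys`, and `out_keys` suggests a container that checks its inputs, runs its modules in order, and returns the requested outputs. A minimal sketch of that pattern follows, using plain floats in place of tensors. `SequentialContainerSketch`, the toy modules, and the key names `amp_obs`, `features`, and `disc_logits` are all hypothetical and are not taken from the library.

```python
from typing import Callable, Dict, List

Module = Callable[[Dict[str, float]], Dict[str, float]]


class SequentialContainerSketch:
    """Hypothetical stand-in for a module container; not the library class."""

    def __init__(self, models: List[Module], in_keys: List[str], out_keys: List[str]):
        self.models = models
        self.in_keys = in_keys
        self.out_keys = out_keys

    def __call__(self, data: Dict[str, float]) -> Dict[str, float]:
        # Fail early if a required input key is absent.
        missing = [key for key in self.in_keys if key not in data]
        if missing:
            raise KeyError(f"missing input keys: {missing}")
        state = dict(data)  # copy so the caller's dict is not mutated
        for model in self.models:
            # Each module reads from the shared state and adds its outputs.
            state.update(model(state))
        return {key: state[key] for key in self.out_keys}


def encode(state):
    # Toy feature extractor.
    return {"features": 2.0 * state["amp_obs"]}


def head(state):
    # Toy logit head.
    return {"disc_logits": state["features"] - 1.0}


disc = SequentialContainerSketch([encode, head], in_keys=["amp_obs"], out_keys=["disc_logits"])
result = disc({"amp_obs": 3.0})
```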
- class protomotions.agents.amp.config.AMPModelConfig(
- _target_='protomotions.agents.amp.model.AMPModel',
- in_keys=<factory>,
- out_keys=<factory>,
- actor=<factory>,
- critic=<factory>,
- actor_optimizer=<factory>,
- critic_optimizer=<factory>,
- discriminator=<factory>,
- discriminator_optimizer=<factory>,
- disc_critic=<factory>,
- disc_critic_optimizer=<factory>,
- )
Bases: PPOModelConfig
Configuration for AMP Model (Actor-Critic with Discriminator).
- Attributes:
- in_keys: Input keys.
- out_keys: Output keys including actions and value estimate.
- actor: Actor (policy) network configuration.
- critic: Critic (value) network configuration.
- actor_optimizer: Optimizer settings for actor network.
- critic_optimizer: Optimizer settings for critic network.
- discriminator: Discriminator network for motion prior learning.
- discriminator_optimizer: Optimizer settings for discriminator.
- disc_critic: Critic network for discriminator reward.
- disc_critic_optimizer: Optimizer settings for discriminator critic.
- discriminator: DiscriminatorConfig
- discriminator_optimizer: OptimizerConfig
- disc_critic: ModuleContainerConfig
- disc_critic_optimizer: OptimizerConfig
- __init__(
- _target_='protomotions.agents.amp.model.AMPModel',
- in_keys=<factory>,
- out_keys=<factory>,
- actor=<factory>,
- critic=<factory>,
- actor_optimizer=<factory>,
- critic_optimizer=<factory>,
- discriminator=<factory>,
- discriminator_optimizer=<factory>,
- disc_critic=<factory>,
- disc_critic_optimizer=<factory>,
- )
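Each config in this module carries a `_target_` string such as 'protomotions.agents.amp.model.AMPModel'. A common convention for such fields is to treat the string as a dotted import path and resolve it to a class at instantiation time. The sketch below shows that convention with a hypothetical `resolve_target` helper and a standard-library target so that it runs without protomotions installed. Whether the library resolves `_target_` this way is an assumption, not something this page confirms.

```python
import importlib


def resolve_target(dotted_path: str):
    """Resolve 'package.module.Name' to the object it names (hypothetical helper)."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Standard-library stand-in for a real target such as the AMPModel path.
cls = resolve_target("collections.OrderedDict")
instance = cls(a=1)
```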
- class protomotions.agents.amp.config.AMPAgentConfig(
- batch_size,
- training_max_steps,
- _target_='protomotions.agents.amp.agent.AMP',
- model=<factory>,
- num_steps=32,
- gradient_clip_val=0.0,
- fail_on_bad_grads=False,
- check_grad_mag=True,
- gamma=0.99,
- bounds_loss_coef=0.0,
- task_reward_w=1.0,
- num_mini_epochs=1,
- training_early_termination=None,
- save_epoch_checkpoint_every=1000,
- save_last_checkpoint_every=10,
- max_episode_length_manager=None,
- evaluator=<factory>,
- normalize_rewards=True,
- normalized_reward_clamp_value=5.0,
- tau=0.95,
- e_clip=0.2,
- clip_critic_loss=True,
- actor_clip_frac_threshold=0.6,
- advantage_normalization=<factory>,
- amp_parameters=<factory>,
- reference_obs_components=<factory>,
- )
Bases: PPOAgentConfig
Main configuration class for AMP Agent.
- Attributes:
- batch_size: Training batch size.
- training_max_steps: Maximum training steps.
- model: AMP model configuration including discriminator.
- num_steps: Environment steps per update.
- gradient_clip_val: Max gradient norm. 0=disabled.
- fail_on_bad_grads: Fail on NaN/Inf gradients.
- check_grad_mag: Log gradient magnitude.
- gamma: Discount factor.
- bounds_loss_coef: Action bounds loss. 0 for tanh outputs.
- task_reward_w: Task reward weight.
- num_mini_epochs: Mini-epochs per update.
- training_early_termination: Stop early at this step. None=disabled.
- save_epoch_checkpoint_every: Save epoch_xxx.ckpt every N epochs.
- save_last_checkpoint_every: Save last.ckpt every K epochs.
- max_episode_length_manager: Episode length curriculum.
- evaluator: Evaluation config.
- normalize_rewards: Normalize rewards.
- normalized_reward_clamp_value: Clamp normalized rewards to [-val, val].
- tau: GAE lambda for advantage estimation.
- e_clip: PPO clipping parameter epsilon.
- clip_critic_loss: Clip critic loss similar to actor.
- actor_clip_frac_threshold: Skip actor update if clip_frac > threshold (e.g., 0.5).
- advantage_normalization: Advantage normalization settings.
- amp_parameters: AMP-specific training parameters.
- reference_obs_components: Observation components for computing reference motion features.
- model: AMPModelConfig
- amp_parameters: AMPParametersConfig
- __init__(
- batch_size,
- training_max_steps,
- _target_='protomotions.agents.amp.agent.AMP',
- model=<factory>,
- num_steps=32,
- gradient_clip_val=0.0,
- fail_on_bad_grads=False,
- check_grad_mag=True,
- gamma=0.99,
- bounds_loss_coef=0.0,
- task_reward_w=1.0,
- num_mini_epochs=1,
- training_early_termination=None,
- save_epoch_checkpoint_every=1000,
- save_last_checkpoint_every=10,
- max_episode_length_manager=None,
- evaluator=<factory>,
- normalize_rewards=True,
- normalized_reward_clamp_value=5.0,
- tau=0.95,
- e_clip=0.2,
- clip_critic_loss=True,
- actor_clip_frac_threshold=0.6,
- advantage_normalization=<factory>,
- amp_parameters=<factory>,
- reference_obs_components=<factory>,
- )
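The attribute list describes `gamma` as the discount factor and `tau` as the GAE lambda. To show how the two interact, here is a plain-Python sketch of generalized advantage estimation over a single rollout, using the defaults listed above (gamma=0.99, tau=0.95). The function name `compute_gae` and its argument layout are hypothetical; the library's own implementation may differ, for example by working on batched tensors.

```python
def compute_gae(rewards, values, last_value, dones, gamma=0.99, tau=0.95):
    """Generalized advantage estimation for one rollout (illustrative sketch).

    rewards, values, dones: per-step sequences of equal length.
    last_value: value estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # Stop bootstrapping across episode boundaries.
        not_done = 0.0 if dones[t] else 1.0
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors, decayed by gamma * tau.
        gae = delta + gamma * tau * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


advantages = compute_gae(
    rewards=[1.0, 1.0],
    values=[0.0, 0.0],
    last_value=0.0,
    dones=[False, False],
)
```

Lower values of `tau` shorten the horizon over which TD errors are accumulated, trading variance for bias.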