protomotions.agents.ase.config module#

Configuration classes for the ASE (Adversarial Skill Embeddings) agent.

ASE extends AMP with a learned latent skill space: the policy is conditioned on latent skill codes, and a mutual-information objective between the latent codes and the resulting behaviors encourages a diverse, distinguishable set of skills.
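The latent sampling itself happens in the agent rather than in these config classes, but a minimal sketch helps connect the hyperparameters below to that mechanism. It assumes latents are drawn uniformly on the unit hypersphere (as in the original ASE formulation) and re-drawn after a random number of steps in [latent_steps_min, latent_steps_max]; treat both assumptions as illustrative, not as the library's exact behavior:

```python
import torch

# Illustrative sketch only; the actual sampling lives in the ASE agent.
latent_dim, latent_steps_min, latent_steps_max = 64, 1, 150

# Unit-norm latent skill code (uniform on the hypersphere when a Gaussian
# sample is normalized).
z = torch.nn.functional.normalize(torch.randn(latent_dim), dim=-1)

# Number of environment steps before a new latent is drawn
# (assumed uniform in [latent_steps_min, latent_steps_max]).
resample_after = int(torch.randint(latent_steps_min, latent_steps_max + 1, (1,)).item())
```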

class protomotions.agents.ase.config.ASEParametersConfig(
latent_dim=64,
latent_steps_min=1,
latent_steps_max=150,
mi_reward_w=0.5,
mi_hypersphere_reward_shift=True,
mi_enc_weight_decay=0.0,
mi_enc_grad_penalty=0.0,
diversity_bonus=0.01,
diversity_tar=1.0,
latent_uniformity_weight=0.1,
uniformity_kernel_scale=1.0,
)[source]#

Bases: object

Configuration for ASE-specific hyperparameters.

Attributes:

latent_dim: Dimension of the latent skill space.
latent_steps_min: Minimum steps before resampling the latent.
latent_steps_max: Maximum steps before resampling the latent.
mi_reward_w: Weight for the mutual information reward.
mi_hypersphere_reward_shift: Shift the MI reward to encourage hypersphere projections.
mi_enc_weight_decay: Weight decay for MI encoder parameters.
mi_enc_grad_penalty: Gradient penalty for the MI encoder.
diversity_bonus: Bonus reward for behavior diversity.
diversity_tar: Target diversity level.
latent_uniformity_weight: Weight for the latent uniformity loss.
uniformity_kernel_scale: Scale for the uniformity kernel.

latent_dim: int = 64#
latent_steps_min: int = 1#
latent_steps_max: int = 150#
mi_reward_w: float = 0.5#
mi_hypersphere_reward_shift: bool = True#
mi_enc_weight_decay: float = 0.0#
mi_enc_grad_penalty: float = 0.0#
diversity_bonus: float = 0.01#
diversity_tar: float = 1.0#
latent_uniformity_weight: float = 0.1#
uniformity_kernel_scale: float = 1.0#
__init__(
latent_dim=64,
latent_steps_min=1,
latent_steps_max=150,
mi_reward_w=0.5,
mi_hypersphere_reward_shift=True,
mi_enc_weight_decay=0.0,
mi_enc_grad_penalty=0.0,
diversity_bonus=0.01,
diversity_tar=1.0,
latent_uniformity_weight=0.1,
uniformity_kernel_scale=1.0,
)#
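As a rough illustration of how these fields might interact (the actual reward computation lives in the ASE agent, so treat this as a hedged sketch): assume the raw MI reward is the cosine similarity between the normalized MI-encoder output and the active latent code, shifted from [-1, 1] into [0, 1] when mi_hypersphere_reward_shift is enabled, and then scaled by mi_reward_w.

```python
import torch
import torch.nn.functional as F

from protomotions.agents.ase.config import ASEParametersConfig

params = ASEParametersConfig(latent_dim=64, mi_reward_w=0.5, mi_hypersphere_reward_shift=True)

# Stand-ins for a batch of MI-encoder outputs and the latents that drove the behavior.
enc_out = F.normalize(torch.randn(8, params.latent_dim), dim=-1)
latents = F.normalize(torch.randn(8, params.latent_dim), dim=-1)

mi_reward = (enc_out * latents).sum(dim=-1)      # cosine similarity, in [-1, 1]
if params.mi_hypersphere_reward_shift:
    mi_reward = (mi_reward + 1.0) / 2.0          # assumed shift into [0, 1]
mi_reward = params.mi_reward_w * mi_reward       # weighted MI reward term
```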
class protomotions.agents.ase.config.ASEDiscriminatorEncoderConfig(
models=<factory>,
_target_='protomotions.agents.ase.model.ASEDiscriminatorEncoder',
in_keys=<factory>,
out_keys=<factory>,
encoder_out_size=None,
)[source]#

Bases: DiscriminatorConfig

Configuration for the ASE discriminator-encoder network (extends DiscriminatorConfig), which outputs both discriminator logits and the MI encoder embedding.

Attributes:

models: List of module configurations to execute sequentially.
in_keys: Input observation keys.
out_keys: Output keys for discriminator logits and MI encoder output.
encoder_out_size: Output size of the encoder; should match latent_dim.

encoder_out_size: int = None#
in_keys: List[str]#
out_keys: List[str]#
__init__(
models=<factory>,
_target_='protomotions.agents.ase.model.ASEDiscriminatorEncoder',
in_keys=<factory>,
out_keys=<factory>,
encoder_out_size=None,
)#
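A minimal construction sketch, relying on the dataclass factory defaults for models, in_keys, and out_keys, and only pinning encoder_out_size to the latent dimension it must match:

```python
from protomotions.agents.ase.config import (
    ASEDiscriminatorEncoderConfig,
    ASEParametersConfig,
)

ase_params = ASEParametersConfig(latent_dim=64)

# The MI encoder's output is compared against the sampled latent codes,
# so its output size should equal latent_dim.
disc_enc_cfg = ASEDiscriminatorEncoderConfig(encoder_out_size=ase_params.latent_dim)

assert disc_enc_cfg.encoder_out_size == ase_params.latent_dim
print(disc_enc_cfg._target_)  # 'protomotions.agents.ase.model.ASEDiscriminatorEncoder'
```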
class protomotions.agents.ase.config.ASEModelConfig(
_target_='protomotions.agents.ase.model.ASEModel',
in_keys=<factory>,
out_keys=<factory>,
actor=<factory>,
critic=<factory>,
actor_optimizer=<factory>,
critic_optimizer=<factory>,
discriminator=<factory>,
discriminator_optimizer=<factory>,
disc_critic=<factory>,
disc_critic_optimizer=<factory>,
mi_critic=<factory>,
mi_critic_optimizer=<factory>,
)[source]#

Bases: AMPModelConfig

Configuration for the ASE model, which extends the AMP model with a dedicated MI critic.

Attributes:

in_keys: Input keys.
out_keys: Output keys, including actions and the value estimate.
actor: Actor (policy) network configuration.
critic: Critic (value) network configuration.
actor_optimizer: Optimizer settings for the actor network.
critic_optimizer: Optimizer settings for the critic network.
discriminator: Discriminator network for motion prior learning.
discriminator_optimizer: Optimizer settings for the discriminator.
disc_critic: Critic network for the discriminator reward.
disc_critic_optimizer: Optimizer settings for the discriminator critic.
mi_critic: Critic network for the mutual information reward.
mi_critic_optimizer: Optimizer settings for the MI critic.

mi_critic: ModuleContainerConfig#
mi_critic_optimizer: OptimizerConfig#
__init__(
_target_='protomotions.agents.ase.model.ASEModel',
in_keys=<factory>,
out_keys=<factory>,
actor=<factory>,
critic=<factory>,
actor_optimizer=<factory>,
critic_optimizer=<factory>,
discriminator=<factory>,
discriminator_optimizer=<factory>,
disc_critic=<factory>,
disc_critic_optimizer=<factory>,
mi_critic=<factory>,
mi_critic_optimizer=<factory>,
)#
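A hedged sketch of default construction; the factory defaults for the sub-configurations are assumed to be self-consistent here, and in practice they would usually be overridden by an experiment configuration (the _target_ fields follow the usual instantiate-by-import-path convention):

```python
from protomotions.agents.ase.config import ASEModelConfig

# Rely on the factory defaults for all sub-configurations.
model_cfg = ASEModelConfig()

# ASE adds a dedicated critic (and optimizer) for the mutual-information
# reward on top of the AMP fields (actor, critic, discriminator, disc_critic, ...).
print(type(model_cfg.mi_critic).__name__)
print(type(model_cfg.mi_critic_optimizer).__name__)
```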
class protomotions.agents.ase.config.ASEAgentConfig(
batch_size,
training_max_steps,
_target_='protomotions.agents.ase.agent.ASE',
model=<factory>,
num_steps=32,
gradient_clip_val=0.0,
fail_on_bad_grads=False,
check_grad_mag=True,
gamma=0.99,
bounds_loss_coef=0.0,
task_reward_w=1.0,
num_mini_epochs=1,
training_early_termination=None,
save_epoch_checkpoint_every=1000,
save_last_checkpoint_every=10,
max_episode_length_manager=None,
evaluator=<factory>,
normalize_rewards=True,
normalized_reward_clamp_value=5.0,
tau=0.95,
e_clip=0.2,
clip_critic_loss=True,
actor_clip_frac_threshold=0.6,
advantage_normalization=<factory>,
amp_parameters=<factory>,
reference_obs_components=<factory>,
ase_parameters=<factory>,
)[source]#

Bases: AMPAgentConfig

Main configuration class for the ASE agent.

Attributes:

batch_size: Training batch size.
training_max_steps: Maximum training steps.
model: ASE model configuration, including the discriminator-encoder.
num_steps: Environment steps per update.
gradient_clip_val: Max gradient norm; 0 disables clipping.
fail_on_bad_grads: Fail on NaN/Inf gradients.
check_grad_mag: Log gradient magnitude.
gamma: Discount factor.
bounds_loss_coef: Action bounds loss coefficient; 0 for tanh outputs.
task_reward_w: Task reward weight.
num_mini_epochs: Mini-epochs per update.
training_early_termination: Stop early at this step; None disables.
save_epoch_checkpoint_every: Save epoch_xxx.ckpt every N epochs.
save_last_checkpoint_every: Save last.ckpt every K epochs.
max_episode_length_manager: Episode length curriculum.
evaluator: Evaluation config.
normalize_rewards: Normalize rewards.
normalized_reward_clamp_value: Clamp normalized rewards to [-val, val].
tau: GAE lambda for advantage estimation.
e_clip: PPO clipping parameter epsilon.
clip_critic_loss: Clip the critic loss similarly to the actor.
actor_clip_frac_threshold: Skip the actor update if clip_frac exceeds this threshold (e.g., 0.5).
advantage_normalization: Advantage normalization settings.
amp_parameters: AMP-specific training parameters.
reference_obs_components: Observation components used to compute reference motion features.
ase_parameters: ASE-specific training parameters.

ase_parameters: ASEParametersConfig#
__init__(
batch_size,
training_max_steps,
_target_='protomotions.agents.ase.agent.ASE',
model=<factory>,
num_steps=32,
gradient_clip_val=0.0,
fail_on_bad_grads=False,
check_grad_mag=True,
gamma=0.99,
bounds_loss_coef=0.0,
task_reward_w=1.0,
num_mini_epochs=1,
training_early_termination=None,
save_epoch_checkpoint_every=1000,
save_last_checkpoint_every=10,
max_episode_length_manager=None,
evaluator=<factory>,
normalize_rewards=True,
normalized_reward_clamp_value=5.0,
tau=0.95,
e_clip=0.2,
clip_critic_loss=True,
actor_clip_frac_threshold=0.6,
advantage_normalization=<factory>,
amp_parameters=<factory>,
reference_obs_components=<factory>,
ase_parameters=<factory>,
)#
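Only batch_size and training_max_steps are required; everything else falls back to the documented defaults. A hedged construction sketch (the numeric values below are purely illustrative):

```python
from protomotions.agents.ase.config import ASEAgentConfig, ASEParametersConfig

agent_cfg = ASEAgentConfig(
    batch_size=4096,                # illustrative value
    training_max_steps=10_000_000,  # illustrative value
    ase_parameters=ASEParametersConfig(
        latent_dim=64,
        latent_steps_max=150,
        mi_reward_w=0.5,
    ),
)

# Inherited PPO/AMP defaults remain in effect unless overridden.
assert agent_cfg.gamma == 0.99 and agent_cfg.e_clip == 0.2
```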