Configuration Guide

Configuration Guide#

SONIC uses Hydra for hierarchical configuration. This guide explains the config structure and the most important parameters to tune.

Config Hierarchy#

When you run a training command like:

python gear_sonic/train_agent_trl.py +exp=manager/universal_token/all_modes/sonic_release

Hydra composes the final config from a chain of YAML files:

gear_sonic/config/
├── base.yaml                    # Global defaults (seed, num_envs, paths)
├── base/
│   ├── hydra.yaml               # Hydra output directory settings
│   └── structure.yaml           # Resolved experiment directory structure
├── algo/
│   └── ppo_im_phc.yaml          # PPO hyperparameters
├── manager_env/
│   ├── base_env.yaml            # Environment defaults (sim_dt, decimation, episode length)
│   ├── actions/tracking/base.yaml
│   ├── commands/tracking/base.yaml
│   │   └── terms/motion.yaml    # Motion library, body names, future frames
│   ├── rewards/tracking/
│   │   └── base_5point_local_feet_acc.yaml  # Reward composition
│   │       └── terms/*.yaml     # Individual reward terms with weights
│   ├── terminations/tracking/
│   │   └── base_adaptive_strict_ori_foot_xyz.yaml  # Termination composition
│   │       └── terms/*.yaml     # Individual termination conditions
│   ├── events/tracking/
│   │   └── level0_4.yaml        # Domain randomization events
│   └── observations/
│       ├── tokenizer/           # Encoder input observations
│       ├── policy/              # Policy (actor) observations
│       └── critic/              # Critic observations
├── actor_critic/
│   └── universal_token/         # Network architecture (encoders, decoders, quantizer)
├── aux_losses/
│   └── universal_token/         # Auxiliary loss terms
├── trainer/
│   └── trl_ppo_aux.yaml         # Trainer config (PPO with aux losses)
├── callbacks/                   # Training callbacks (save, eval, W&B, resample)
└── exp/manager/universal_token/all_modes/
    └── sonic_release.yaml       # Experiment config (overrides all of the above)

The experiment config (sonic_release.yaml) sits at the top and overrides specific values from the base configs. You can further override any value from the command line with ++key=value.

Overriding Config Values#

Hydra uses ++ prefix to force-override values (even nested ones):

# Override a top-level value
python gear_sonic/train_agent_trl.py +exp=... num_envs=16

# Override a nested value (use dots for nesting)
python gear_sonic/train_agent_trl.py +exp=... \
    ++manager_env.commands.motion.motion_lib_cfg.motion_file=/path/to/data

# Override a reward weight
python gear_sonic/train_agent_trl.py +exp=... \
    ++manager_env.rewards.tracking_anchor_pos.weight=1.0

Top Parameters to Tune#

Training scale#

Parameter	Default	Location	Description
`num_envs`	4096	`base.yaml`	Number of parallel environments. Reduce for debugging (`16`), increase for throughput.
`headless`	True	`base.yaml`	Set `False` to open the Isaac Lab viewer for visual debugging.
`seed`	0	`base.yaml`	Random seed for reproducibility.

PPO hyperparameters#

Parameter	Default	Location	Description
`algo.config.actor_learning_rate`	2e-5	`ppo_im_phc.yaml`	Actor learning rate. Lower for finetuning, higher for training from scratch.
`algo.config.critic_learning_rate`	1e-3	`ppo_im_phc.yaml`	Critic learning rate. Usually 10-100x the actor LR.
`algo.config.num_learning_epochs`	5	`ppo_im_phc.yaml`	PPO epochs per batch of experience.
`algo.config.num_mini_batches`	4	`ppo_im_phc.yaml`	Mini-batches per PPO epoch.
`algo.config.num_steps_per_env`	24	`sonic_release.yaml`	Rollout length (steps per env before PPO update).
`algo.config.gamma`	0.99	`ppo_im_phc.yaml`	Discount factor.
`algo.config.lam`	0.95	`ppo_im_phc.yaml`	GAE lambda.
`algo.config.clip_param`	0.2	`ppo_im_phc.yaml`	PPO clip parameter.
`algo.config.entropy_coef`	0.01	`ppo_im_phc.yaml`	Entropy bonus coefficient.
`algo.config.desired_kl`	0.01	`ppo_im_phc.yaml`	Target KL for adaptive learning rate schedule.
`algo.config.num_learning_iterations`	100000	`ppo_im_phc.yaml`	Total training iterations.

Simulation#

Parameter	Default	Location	Description
`manager_env.config.sim_dt`	0.005	`base_env.yaml`	Physics timestep (200 Hz). Smaller = more stable but slower.
`manager_env.config.decimation`	4	`base_env.yaml`	Policy runs every `decimation` sim steps (50 Hz policy at 200 Hz sim).
`manager_env.config.episode_length_s`	10.0	`base_env.yaml`	Episode length in seconds before timeout reset.
`manager_env.config.terrain_type`	trimesh	`sonic_release.yaml`	`plane` for flat ground, `trimesh` for rough terrain.
`manager_env.config.robot.type`	g1_model_12_dex	`sonic_release.yaml`	Robot type (must match `robot_mapping` in code).

Motion data#

Parameter	Default	Location	Description
`manager_env.commands.motion.motion_lib_cfg.motion_file`	—	`sonic_release.yaml`	Path to retargeted robot motion PKLs.
`manager_env.commands.motion.motion_lib_cfg.smpl_motion_file`	—	`sonic_release.yaml`	Path to SMPL motion PKLs (or `dummy`).
`manager_env.commands.motion.motion_lib_cfg.soma_motion_file`	—	`sonic_bones_seed.yaml`	Path to SOMA motion PKLs (4-encoder config only).
`manager_env.commands.motion.motion_lib_cfg.smpl_y_up`	true	`sonic_release.yaml`	Set `true` if SMPL data uses y-up coordinates.
`manager_env.commands.motion.motion_lib_cfg.target_fps`	50	`motion.yaml`	Target FPS for motion resampling.
`manager_env.commands.motion.motion_lib_cfg.asset.assetFileName`	g1_29dof_rev_1_0.xml	`motion.yaml`	MJCF file for motion library FK. Change for different robots.

Motion command#

Parameter	Default	Location	Description
`manager_env.commands.motion.num_future_frames`	10	`sonic_release.yaml`	Number of future reference frames provided to the policy.
`manager_env.commands.motion.dt_future_ref_frames`	0.1	`sonic_release.yaml`	Time spacing between future frames (seconds).
`manager_env.commands.motion.cat_upper_body_poses`	true	`sonic_release.yaml`	Augment lower-body motions with upper-body from different clips.
`manager_env.commands.motion.cat_upper_body_poses_prob`	0.5	`sonic_release.yaml`	Probability of upper-body augmentation per episode.
`manager_env.commands.motion.freeze_frame_aug`	true	`sonic_release.yaml`	Augment with frozen (static) reference frames.

Observation history#

Parameter	Default	Location	Description
`actor_prop_history_length`	10	`sonic_release.yaml`	Number of past proprioception frames stacked for actor.
`actor_actions_history_length`	10	`sonic_release.yaml`	Number of past actions stacked for actor.
`critic_prop_history_length`	10	`sonic_release.yaml`	Same, for critic.
`critic_actions_history_length`	10	`sonic_release.yaml`	Same, for critic.

Reward weights#

All reward terms have a weight parameter. Positive weights encourage the behavior, negative weights penalize it. The default weights for base_5point_local_feet_acc:

Reward term	Weight	Description
`tracking_anchor_pos`	0.5	Root position tracking
`tracking_anchor_ori`	0.5	Root orientation tracking
`tracking_relative_body_pos`	1.0	Body position tracking (anchor-relative)
`tracking_relative_body_ori`	1.0	Body orientation tracking (anchor-relative)
`tracking_body_linvel`	1.0	Body linear velocity tracking
`tracking_body_angvel`	1.0	Body angular velocity tracking
`tracking_vr_5point_local`	2.0	5-point (wrists + head + feet) local tracking
`action_rate_l2`	-0.1	Smooth actions (penalize jerk)
`joint_limit`	-10.0	Stay within joint limits
`undesired_contacts`	-0.1	Penalize non-foot ground contacts
`anti_shake_ang_vel`	-0.005	Penalize wrist/head jitter
`feet_acc`	-2.5e-6	Penalize foot acceleration (smooth stepping)

Each reward term also has a std parameter controlling the Gaussian kernel sharpness. Smaller std = stricter tracking (reward drops faster with error).

Override example:

++manager_env.rewards.tracking_anchor_pos.weight=2.0
++manager_env.rewards.tracking_anchor_pos.params.std=0.1

Termination thresholds#

Terminations end episodes early when tracking error exceeds a threshold. The adaptive variants use a curriculum that tightens thresholds over training:

Termination	Threshold	Description
`anchor_pos`	0.15 m	Root position deviation
`anchor_ori_full`	0.2 rad	Root orientation deviation
`ee_body_pos`	0.15 m	End-effector position deviation
`foot_pos_xyz`	0.2 m	Foot position deviation
`motion_time_out`	—	Episode ends when motion clip finishes

Looser thresholds (larger values) make training easier initially. The adaptive terminations automatically tighten as the policy improves.

Adaptive motion sampling#

The motion library supports adaptive sampling — motions the policy fails on are sampled more frequently:

Parameter	Default	Description
`adaptive_sampling.enable`	true	Enable adaptive sampling.
`adaptive_sampling.bin_size`	50	Window size for failure rate tracking.
`adaptive_sampling.adp_samp_failure_rate_max_over_mean`	200	Max/mean failure rate ratio cap. Prevents one hard motion from dominating.

Saving and logging#

Parameter	Default	Location	Description
`algo.config.save_interval`	500	`ppo_im_phc.yaml`	Save checkpoint every N iterations.
`algo.config.eval_frequency`	500	`ppo_im_phc.yaml`	Run evaluation every N iterations.
`use_wandb`	false	`base.yaml`	Enable Weights & Biases logging.
`base_dir`	logs_rl	`base.yaml`	Root directory for training outputs.

Experiment Configs#

Config	Encoders	Use case
`sonic_release`	G1, teleop, SMPL	Default — matches the released checkpoint
`sonic_bones_seed`	G1, teleop, SMPL, SOMA	Extended training with SOMA skeleton encoder
`sonic_h2`	G1, teleop, SMPL	H2 robot (31 DOF)

Common Recipes#

Debug a training run visually#

python gear_sonic/train_agent_trl.py +exp=... \
    num_envs=4 headless=False \
    algo.config.num_learning_iterations=10

Finetune with lower learning rate#

python gear_sonic/train_agent_trl.py +exp=... \
    +checkpoint=sonic_release/last.pt \
    ++algo.config.actor_learning_rate=5e-6 \
    ++algo.config.desired_kl=0.005

Train on flat ground only#

python gear_sonic/train_agent_trl.py +exp=... \
    ++manager_env.config.terrain_type=plane

Relax termination thresholds for hard motions#

python gear_sonic/train_agent_trl.py +exp=... \
    ++manager_env.terminations.anchor_pos.params.threshold=0.3 \
    ++manager_env.terminations.ee_body_pos.params.threshold=0.3

Increase tracking precision#

python gear_sonic/train_agent_trl.py +exp=... \
    ++manager_env.rewards.tracking_relative_body_pos.params.std=0.1 \
    ++manager_env.rewards.tracking_anchor_pos.params.std=0.1