Training residual RL specialists#

run.py trains a residual RL policy on top of the X-Mobility base policy for a given embodiment and scene. The residual head adapts X-Mobility’s language-conditioned navigation behaviour to embodiment-specific dynamics.

Default training run#

python run.py \
    -c configs/train_config.gin \
    -o <output_dir> \
    -b <path/to/x_mobility_ckpt> \
    --enable_cameras

(Inside an activated dev shell, python already points at Isaac Sim’s bundled Python; on a bare-metal install, prefix with ${ISAACLAB_PATH}/isaaclab.sh -p.)
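For example, on a bare-metal install the same training run is routed through Isaac Lab's Python wrapper; the checkpoint and output paths below are placeholders:

```shell
# Bare-metal: run run.py via Isaac Lab's bundled Python.
# <output_dir> and <path/to/x_mobility_ckpt> are placeholders.
${ISAACLAB_PATH}/isaaclab.sh -p run.py \
    -c configs/train_config.gin \
    -o <output_dir> \
    -b <path/to/x_mobility_ckpt> \
    --enable_cameras
```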

Evaluating a trained specialist#

python run.py \
    -c configs/eval_config.gin \
    -o <output_dir> \
    -b <path/to/x_mobility_ckpt> \
    -p <path/to/residual_policy_ckpt> \
    --enable_cameras \
    --video \
    --video_interval <video_interval>

GPU memory and --num_envs#

GPU memory scales linearly with the parallel-env count: 32 envs ≈ 30 GB. Lower --num_envs to fit your GPU. --num_envs 1 is the canonical smoke-test setting and reaches PPO iteration 0 in a few minutes.
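Concretely, a minimal smoke test combines the default training command with a single env (paths are placeholders):

```shell
# Single-env smoke test: fits on a small GPU and should reach
# PPO iteration 0 within a few minutes.
python run.py \
    -c configs/train_config.gin \
    -o <output_dir> \
    -b <path/to/x_mobility_ckpt> \
    --enable_cameras \
    --num_envs 1
```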

Picking embodiment / scene#

Override the gin defaults via CLI:

python run.py -c configs/train_config.gin \
    --embodiment {h1,carter,spot,g1,digit} \
    --environment <scene_name> \
    -o <output_dir> -b <ckpt>

The current set lives in EmbodimentEnvCfgMap and EnvSceneAssetCfgMap in run.py. To register a new one, see Adding a new embodiment / scene.
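To see which embodiments and scenes are currently registered, a quick grep works (this assumes both maps are defined at module level in run.py):

```shell
# List the registered embodiment / scene config map entries.
grep -n -A 10 -E "EmbodimentEnvCfgMap|EnvSceneAssetCfgMap" run.py
```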

Logging#

TensorBoard logs are written to <output_dir>/tensorboard/ by default. To log to Weights & Biases as well, add:

--logger wandb \
--wandb-project-name <name> \
--wandb-run-name <name> \
--wandb-entity-name <entity>
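Put together, a training run that logs to W&B might look like this (project, run, and entity names are placeholders):

```shell
# Training with W&B logging enabled alongside TensorBoard.
python run.py \
    -c configs/train_config.gin \
    -o <output_dir> \
    -b <path/to/x_mobility_ckpt> \
    --enable_cameras \
    --logger wandb \
    --wandb-project-name <name> \
    --wandb-run-name <name> \
    --wandb-entity-name <entity>
```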

Multi-GPU training#

Pass --distributed and launch under torch.distributed.run to fan out across GPUs. Each rank runs its own Isaac Sim instance + env; gradients, KL, and metrics sync via manual all-reduce in PPO.update. Rank 0 owns the logger, checkpoints, video, and episode-log writes.

${ISAACLAB_PATH}/isaaclab.sh -p -m torch.distributed.run \
    --nproc_per_node=8 \
    run.py --distributed \
    -c configs/train_config.gin \
    -o <output_dir> -b <x_mobility_ckpt> \
    --enable_cameras --num_envs 32

Total parallel envs = nproc_per_node × num_envs. The trainer’s distributed code paths are world_size-aware, so --nproc_per_node=1 (or just plain python run.py --distributed) is also a valid single-rank fallback. On OSMO, osmo/run_osmo.py train --num-gpus {2,8} routes to the matching distributed-workflow YAML.
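As a quick sanity check on sizing, the total env count for the 8-GPU command above works out as:

```shell
# Total parallel envs = ranks × envs-per-rank.
nproc_per_node=8
num_envs=32
echo $((nproc_per_node * num_envs))   # → 256
```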

Submitting to OSMO#

Train on the OSMO cluster instead of locally. The workflow downloads the X-Mobility base checkpoint and COMPASS USDs from HuggingFace, so you don’t need to supply either from your machine.

osmo/run_osmo.py is host-side (it shells out to docker and osmo CLIs). The activate shim auto-routes it to host Python via the # COMPASS_HOST_SIDE marker, so plain python osmo/run_osmo.py works from the activated shell — see the host-side callout in OSMO cloud submission for the full explanation and fallbacks.

export COMPASS_OSMO_REGISTRY=nvcr.io/<org>/<team>
export WANDB_API_KEY=...
export HF_TOKEN=...

python osmo/run_osmo.py train \
    --experiment-name <name> \
    --wandb-project <wandb-project> \
    [--num-gpus 8] \
    [--image <pre-built>]

Full reference: OSMO cloud submission.