COMPASS: Cross-embOdiment Mobility Policy
via ResiduAl RL and Skill Synthesis

Wei Liu1, Huihua Zhao1, Chenran Li1,2, Yuchen Deng1, Joydeep Biswas1,3, Soha Pouya1, Yan Chang1
1Nvidia, 2UC Berkeley, 3UT Austin

Abstract

As robots are increasingly deployed in diverse application domains, enabling robust mobility across different embodiments has become a critical challenge. Classical mobility stacks, though effective on specific platforms, require extensive per-robot tuning and do not scale easily to new embodiments. Learning-based approaches, such as imitation learning (IL), offer an alternative, but they are limited by the need for high-quality demonstrations for each embodiment.

To address these challenges, we introduce COMPASS, a unified framework that enables scalable cross-embodiment mobility using expert demonstrations from only a single embodiment. We first pre-train a mobility policy on a single robot using IL, combining a world model with a policy model. We then apply residual reinforcement learning (RL) to efficiently adapt this policy to diverse embodiments through corrective refinements. Finally, we distill specialist policies into a single generalist policy conditioned on an embodiment embedding vector. This design significantly reduces the burden of collecting data while enabling robust generalization across a wide range of robot designs. Our experiments demonstrate that COMPASS scales effectively across diverse robot platforms while maintaining adaptability to various environment configurations, yielding a generalist policy with a success rate approximately 5X higher than that of the pre-trained IL policy. We further demonstrate zero-shot sim-to-real transfer.

Method

Method overview

COMPASS introduces a novel three-stage learning workflow for developing cross-embodiment mobility policies:

  1. Imitation Learning: We first train a foundational model that combines a world model with a mobility policy using easily accessible teacher policies.
  2. Residual RL: Building on the base policy, we employ residual reinforcement learning to fine-tune embodiment-specific policies, handling various physical constraints and sensor modalities.
  3. Policy Distillation: Finally, we merge these embodiment-specialist policies into a single robust cross-embodiment policy through policy distillation.

This approach enables efficient transfer of mobility skills across diverse robot platforms while maintaining adaptability to various environment configurations.
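As a rough illustration of stages 2 and 3, the sketch below shows one common way to form a residual action (a frozen base action plus a learned correction) and one simple way to score a generalist student against a specialist teacher. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `scale` parameter, the additive composition, and the squared-error distillation loss are all hypothetical choices made for this example, and the toy policies are placeholders.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Policy = Callable[[Vector], List[float]]
ConditionedPolicy = Callable[[Vector, Vector], List[float]]


def residual_action(base_policy: Policy, residual_policy: Policy,
                    observation: Vector, scale: float = 1.0) -> List[float]:
    """Stage 2 sketch: base action plus a scaled, learned correction.

    Assumption: the correction is added element-wise to the base action.
    """
    base = base_policy(observation)
    correction = residual_policy(observation)
    return [b + scale * c for b, c in zip(base, correction)]


def distillation_loss(student: ConditionedPolicy, specialist: Policy,
                      embodiment_embedding: Vector,
                      observations: Sequence[Vector]) -> float:
    """Stage 3 sketch: mean squared error between student and specialist actions.

    Assumption: squared error stands in for whatever objective is actually used.
    """
    total, count = 0.0, 0
    for obs in observations:
        target = specialist(obs)
        predicted = student(obs, embodiment_embedding)
        total += sum((p - t) ** 2 for p, t in zip(predicted, target))
        count += len(target)
    return total / count


# Toy placeholders: actions are (linear velocity, angular velocity) commands.
base = lambda obs: [0.5, 0.0]        # stands in for the pre-trained IL policy
residual = lambda obs: [-0.1, 0.2]   # stands in for the learned correction
specialist = lambda obs: residual_action(base, residual, obs)
student = lambda obs, emb: [0.4 * emb[0], 0.2 * emb[0]]

loss = distillation_loss(student, specialist, [1.0], [[0.0], [1.0]])
print(f"distillation loss: {loss:.6f}")
```

In this toy setup the student is conditioned on a one-dimensional embedding; a real generalist would take one embedding per embodiment and be trained over data from every specialist.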

Results

Extensive experiments demonstrate that COMPASS achieves robust generalization across diverse robot platforms while preserving the adaptability needed to succeed in varied environments. Quantitatively, the RL specialists and the distilled generalist policy achieve, on average, a 5X higher success rate and 3X lower travel time than the pre-trained IL policy (X-Mobility).
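To make the two ratios concrete, the following sketch shows one plausible way to compute a success-rate gain and a travel-time reduction from per-episode logs. The `Episode` layout, the function names, and the choice to average travel time over successful episodes only are assumptions made for this example; the episode values are toy placeholders, not results from the paper.

```python
from typing import List, Tuple

# Hypothetical log entry: (reached_goal, travel_time_seconds)
Episode = Tuple[bool, float]


def success_rate(episodes: List[Episode]) -> float:
    """Fraction of episodes that reached the goal."""
    return sum(1 for ok, _ in episodes if ok) / len(episodes)


def mean_travel_time(episodes: List[Episode]) -> float:
    """Average travel time over successful episodes only (an assumption)."""
    times = [t for ok, t in episodes if ok]
    return sum(times) / len(times)


def improvement(baseline: List[Episode],
                candidate: List[Episode]) -> Tuple[float, float]:
    """Return (success-rate ratio, travel-time reduction factor)."""
    sr_gain = success_rate(candidate) / success_rate(baseline)
    time_gain = mean_travel_time(baseline) / mean_travel_time(candidate)
    return sr_gain, time_gain


# Toy placeholder logs, chosen only to exercise the functions.
baseline_log = [(True, 30.0), (False, 60.0), (False, 60.0), (False, 60.0)]
candidate_log = [(True, 10.0), (True, 10.0), (True, 10.0), (False, 60.0)]

sr_gain, time_gain = improvement(baseline_log, candidate_log)
print(f"success rate: {sr_gain:.1f}X higher, travel time: {time_gain:.1f}X lower")
```

Under this reading, "5X higher success rate" means the candidate's success rate divided by the baseline's equals 5, and "3X lower travel time" means the baseline's mean travel time divided by the candidate's equals 3.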

Benchmark Results

Zero-shot Sim2Real Transfer

COMPASS policies trained in simulation can be directly deployed on real robots, demonstrating strong sim2real transfer capabilities without additional fine-tuning.

Deployment of the COMPASS policy on real robots: Carter and G1.

Open Vocabulary Object Navigation

Integrating Locate3D with COMPASS enables open vocabulary object navigation.

COMPASS policy performing open vocabulary object navigation by integrating Locate3D.

GROOT Post-training with COMPASS Datasets

Post-training GROOT on the COMPASS distillation datasets enables navigation capabilities.

GROOT post-training with COMPASS datasets for navigation.

BibTeX


    @article{liu2025compass,
      title={COMPASS: Cross-embodiment Mobility Policy via Residual RL and Skill Synthesis},
      author={Liu, Wei and Zhao, Huihua and Li, Chenran and Deng, Yuchen and Biswas, Joydeep and Pouya, Soha and Chang, Yan},
      journal={arXiv preprint arXiv:2502.16372},
      year={2025}
    }