COMPASS: Cross-embOdiment Mobility Policy
via ResiduAl RL and Skill Synthesis

Wei Liu1, Huihua Zhao1, Chenran Li1,2, Joydeep Biswas1,3, Soha Pouya1, Yan Chang1
1NVIDIA, 2UC Berkeley, 3UT Austin

Abstract

As robots are deployed across increasingly diverse application domains, generalizable cross-embodiment mobility policies become essential. While classical mobility stacks have proven effective on specific robot platforms, they pose significant challenges when scaling to new embodiments. Learning-based methods, such as imitation learning (IL) and reinforcement learning (RL), offer alternative solutions but suffer from covariate shift, sparse sampling in large environments, and embodiment-specific constraints.

This work introduces COMPASS, a novel workflow for developing cross-embodiment mobility policies by integrating IL, residual RL, and policy distillation. We begin with IL on a mobile robot, leveraging easily accessible teacher policies to train a foundational model that combines a world model with a mobility policy. Building on this base, we employ residual RL to fine-tune embodiment-specific policies, exploiting pre-trained representations to improve sampling efficiency in handling various physical constraints and sensor modalities. Finally, policy distillation merges these embodiment-specialist policies into a single robust cross-embodiment policy.

We empirically demonstrate that COMPASS scales effectively across diverse robot platforms while maintaining adaptability to various environment configurations, achieving a generalist policy with a success rate approximately 5X higher than the pre-trained IL policy. The resulting framework offers an efficient, scalable solution for cross-embodiment mobility, enabling robots with different designs to navigate safely and efficiently in complex scenarios.

Method

Method overview

COMPASS introduces a novel three-stage learning workflow for developing cross-embodiment mobility policies:

  1. Imitation Learning: We first train a foundational model that combines a world model with a mobility policy using easily accessible teacher policies.
  2. Residual RL: Building on the base policy, we employ residual reinforcement learning to fine-tune embodiment-specific policies, handling various physical constraints and sensor modalities (a code sketch follows this list).
  3. Policy Distillation: Finally, we merge these embodiment-specialist policies into a single robust cross-embodiment policy through policy distillation (a second sketch appears after the summary below).
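
For concreteness, here is a minimal PyTorch-style sketch of the residual composition in stage 2, where a frozen IL base policy proposes an action and a small trainable head adds an embodiment-specific correction. All names here (ResidualMobilityPolicy, residual_head, the latent-state interface) are illustrative assumptions, not the released COMPASS implementation.

    import torch
    import torch.nn as nn

    class ResidualMobilityPolicy(nn.Module):
        """Frozen IL base policy plus a trainable embodiment-specific residual head."""

        def __init__(self, base_policy: nn.Module, latent_dim: int, action_dim: int):
            super().__init__()
            self.base_policy = base_policy
            for p in self.base_policy.parameters():   # keep pre-trained IL weights fixed
                p.requires_grad_(False)
            # Small residual head conditioned on the pre-trained latent state.
            self.residual_head = nn.Sequential(
                nn.Linear(latent_dim, 256), nn.ELU(),
                nn.Linear(256, action_dim),
            )

        def forward(self, latent: torch.Tensor) -> torch.Tensor:
            base_action = self.base_policy(latent)    # generalist action proposal
            delta = self.residual_head(latent)        # embodiment-specific correction
            return base_action + delta                # residual composition

Only the residual head is optimized with RL, which is what makes the fine-tuning sample-efficient: exploration happens around an already-reasonable base action rather than from scratch.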

This approach enables efficient transfer of mobility skills across diverse robot platforms while maintaining adaptability to various environment configurations.
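
Similarly, stage 3 can be pictured as supervised regression of a single embodiment-conditioned generalist onto the frozen specialists. The sketch below is again a hedged illustration under assumed names (distillation_step, per-embodiment batches of latent states and embodiment embeddings), not the paper's exact objective.

    import torch
    import torch.nn.functional as F

    def distillation_step(generalist, specialists, batches, optimizer):
        """One distillation step: match the generalist to each specialist's actions.

        specialists: dict mapping embodiment id -> frozen specialist policy
        batches:     dict mapping embodiment id -> (latent, embodiment_embedding)
        """
        optimizer.zero_grad()
        loss = torch.zeros(())
        for emb_id, (latent, emb_embed) in batches.items():
            with torch.no_grad():
                target = specialists[emb_id](latent)   # specialist (teacher) action
            pred = generalist(latent, emb_embed)       # generalist (student) action
            loss = loss + F.mse_loss(pred, target)     # behavior-cloning regression
        loss.backward()
        optimizer.step()
        return loss.item()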

Results

Extensive experiments demonstrate that COMPASS achieves robust generalization across diverse robot platforms while preserving the adaptability needed to succeed in varied environments. Quantitatively, the RL specialists and the distilled generalist policy achieve, on average, a roughly 5X higher success rate and a 3X lower travel time than the pre-trained IL policy (X-Mobility).

Benchmark Results

Benchmark results are reported for four embodiments: H1 (Humanoid), Spot Mini (Quadruped), Carter (Wheeled), and G1 (Humanoid).

Zero-shot Multi-robot Interaction

Although never explicitly trained on dynamic obstacles, COMPASS exhibits robust zero-shot behavior in multi-robot scenes, replanning efficiently around other agents and demonstrating a solid understanding of its environment.

Example scenarios: humanoid interaction in a shared workspace, and humanoid interaction in a warehouse.

BibTeX


    @article{liu2025compass,
      title={COMPASS: Cross-embodiment Mobility Policy via Residual RL and Skill Synthesis},
      author={Liu, Wei and Zhao, Huihua and Li, Chenran and Biswas, Joydeep and Pouya, Soha and Chang, Yan},
      journal={arXiv preprint arXiv:2502.16372},
      year={2025}
    }