Humanoid interaction in a shared workspace.
Humanoid interaction in a warehouse.
As robots are deployed across increasingly diverse application domains, generalizable cross-embodiment mobility policies become essential. While classical mobility stacks have proven effective on specific robot platforms, they pose significant challenges when scaling to new embodiments. Learning-based methods, such as imitation learning (IL) and reinforcement learning (RL), offer alternative solutions but suffer from covariate shift, sparse sampling in large environments, and embodiment-specific constraints.
This work introduces COMPASS, a novel workflow for developing cross-embodiment mobility policies by integrating IL, residual RL, and policy distillation. We begin with IL on a mobile robot, leveraging easily accessible teacher policies to train a foundational model that combines a world model with a mobility policy. Building on this base, we employ residual RL to fine-tune embodiment-specific policies, exploiting pre-trained representations to improve sampling efficiency in handling various physical constraints and sensor modalities. Finally, policy distillation merges these embodiment-specialist policies into a single robust cross-embodiment policy.
We empirically demonstrate that COMPASS scales effectively across diverse robot platforms while maintaining adaptability to various environment configurations, achieving a generalist policy with a success rate approximately 5X higher than the pre-trained IL policy. The resulting framework offers an efficient, scalable solution for cross-embodiment mobility, enabling robots with different designs to navigate safely and efficiently in complex scenarios.
COMPASS introduces a novel three-stage learning workflow for developing cross-embodiment mobility policies:

1. Imitation learning (IL): pre-train a foundational model, combining a world model with a mobility policy, from easily accessible teacher policies on a mobile robot.
2. Residual RL: fine-tune embodiment-specific policies on top of the pre-trained base, exploiting its representations to improve sampling efficiency under varied physical constraints and sensor modalities.
3. Policy distillation: merge the embodiment specialists into a single robust cross-embodiment generalist policy.

This approach enables efficient transfer of mobility skills across diverse robot platforms while maintaining adaptability to various environment configurations.
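The residual RL stage admits a compact illustration: a frozen IL base policy proposes an action, and a small learned head adds a bounded, embodiment-specific correction, so RL only has to learn the delta. The sketch below is a minimal stand-in; the class, the stand-in policies, and the `tanh` bounding with a fixed scale are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

class ResidualPolicy:
    """Combine a frozen base (IL) policy with a learned residual correction."""

    def __init__(self, base_policy, residual_head, scale=0.1):
        self.base_policy = base_policy      # frozen IL mobility policy
        self.residual_head = residual_head  # trained per embodiment via RL
        self.scale = scale                  # bounds the residual's magnitude

    def act(self, features):
        a_base = self.base_policy(features)
        a_res = self.residual_head(features)
        # Final action = base proposal + bounded residual correction
        return a_base + self.scale * np.tanh(a_res)

# Toy usage with stand-in callables (placeholders, not the paper's networks).
base = lambda f: np.array([1.0, 0.0])       # e.g. linear/angular velocity
residual = lambda f: np.array([0.0, 5.0])   # large raw residual gets squashed
policy = ResidualPolicy(base, residual, scale=0.1)
action = policy.act(np.zeros(8))            # first component left unchanged,
                                            # second capped below 0.1
```

Bounding the residual keeps the fine-tuned policy close to the IL prior early in training, which is what makes the pre-trained representations pay off in sample efficiency.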
Extensive experiments demonstrate that COMPASS achieves robust generalization across diverse robot platforms while preserving the adaptability needed to succeed in varied environments. Quantitatively, the RL specialists and the distilled generalist policy achieve, on average, an approximately 5X higher success rate and 3X lower travel time than the pre-trained IL policy (X-Mobility).
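The distillation stage can be illustrated with a toy sketch: a single student policy, conditioned on an embodiment tag, is regressed onto the actions of per-embodiment specialists. Everything below (the linear specialists, the one-hot tag gating, the training loop) is an illustrative assumption, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-embodiment specialists (stand-ins for the RL-fine-tuned
# policies): each maps 2-D features to a 1-D action.
specialists = {
    "wheeled":   lambda f: f @ np.array([[1.0], [0.5]]),
    "quadruped": lambda f: f @ np.array([[0.8], [0.7]]),
}
tags = {"wheeled": np.array([1.0, 0.0]), "quadruped": np.array([0.0, 1.0])}

# Student: one linear policy whose inputs are gated by the embodiment tag,
# trained by regressing onto the matching specialist's action.
W = np.zeros((4, 1))

def student_inputs(f, t):
    # Gate the features with the one-hot tag so a single weight matrix
    # can represent every specialist.
    return np.concatenate([f * t[:, :1], f * t[:, 1:]], axis=1)

lr = 0.1
for _ in range(500):
    for name, teacher in specialists.items():
        f = rng.standard_normal((32, 2))
        t = np.tile(tags[name], (32, 1))
        x = student_inputs(f, t)
        err = x @ W - teacher(f)       # distillation (L2 imitation) error
        W -= lr * x.T @ err / len(f)   # gradient step on 0.5 * ||err||^2

# The single student now reproduces both specialists:
# W.ravel() converges to (1.0, 0.5, 0.8, 0.7).
```

In this realizable toy setting plain supervised regression recovers both teachers exactly; the point is only that one conditioned student can absorb several specialists, which is the mechanism the third stage relies on.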
H1 (Humanoid)
Spot Mini (Quadruped)
Carter (Wheeled)
G1 (Humanoid)
While not explicitly trained on dynamic obstacles, COMPASS exhibits robust zero-shot behavior, replanning efficiently around moving agents and demonstrating a solid understanding of its environment.
@article{liu2025compass,
title={COMPASS: Cross-embodiment Mobility Policy via Residual RL and Skill Synthesis},
author={Liu, Wei and Zhao, Huihua and Li, Chenran and Biswas, Joydeep and Pouya, Soha and Chang, Yan},
journal={arXiv preprint arXiv:2502.16372},
year={2025}
}