General-purpose navigation in challenging environments remains a significant problem in robotics, with current state-of-the-art approaches facing myriad limitations. Classical approaches struggle with cluttered settings and require extensive tuning, while learning-based methods face difficulties generalizing to out-of-distribution environments. This paper introduces X-Mobility, an end-to-end generalizable navigation model that overcomes these challenges by leveraging three key ideas.
First, X-Mobility employs an auto-regressive world modeling architecture with a latent state space to capture world dynamics. Second, a diverse set of multi-head decoders enables the model to learn a rich state representation that correlates strongly with effective navigation skills. Third, by decoupling world modeling from action policy, our architecture can train effectively on a variety of data sources, both with and without expert policies—off-policy data allows the model to learn world dynamics with maximum state-action pair coverage, while on-policy data with supervisory control enables optimal action policy learning.
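To make these ideas concrete, below is a minimal PyTorch-style sketch of one possible realization: a recurrent latent world model (RSSM-style) with multi-head decoders, plus a separate policy head that reads only the latent state. All module names, dimensions, decoder heads, and the specific latent-update scheme are illustrative assumptions, not the exact X-Mobility implementation.

```python
import torch
import torch.nn as nn


class WorldModel(nn.Module):
    """Auto-regressive latent world model: a recurrent deterministic state summarizes
    history, a prior imagines the next stochastic latent, and a posterior refines it
    once the observation arrives (RSSM-style sketch)."""

    def __init__(self, obs_feat=256, act_dim=2, h_dim=512, z_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_feat, h_dim), nn.ELU())  # stand-in for an image encoder
        self.rnn = nn.GRUCell(z_dim + act_dim, h_dim)                       # deterministic history h_t
        self.prior = nn.Linear(h_dim, 2 * z_dim)                            # p(z_t | h_t)
        self.posterior = nn.Linear(2 * h_dim, 2 * z_dim)                    # q(z_t | h_t, o_t)
        # Multi-head decoders force the latent state to carry navigation-relevant detail.
        self.decoders = nn.ModuleDict({
            "rgb": nn.Linear(h_dim + z_dim, obs_feat),
            "semantic": nn.Linear(h_dim + z_dim, obs_feat),
            "depth": nn.Linear(h_dim + z_dim, obs_feat),
        })

    @staticmethod
    def _sample(stats):
        mean, log_std = stats.chunk(2, dim=-1)
        return mean + log_std.exp() * torch.randn_like(mean)

    def observe(self, h, z_prev, action, obs):
        """One auto-regressive step when an observation is available."""
        h = self.rnn(torch.cat([z_prev, action], dim=-1), h)
        prior = self.prior(h)
        post = self.posterior(torch.cat([h, self.encoder(obs)], dim=-1))
        z = self._sample(post)
        recon = {name: dec(torch.cat([h, z], dim=-1)) for name, dec in self.decoders.items()}
        return h, z, prior, post, recon


class PolicyHead(nn.Module):
    """Action policy decoupled from the world model: it reads only the latent state
    and a goal, so it can be trained separately on on-policy expert data."""

    def __init__(self, h_dim=512, z_dim=64, goal_dim=3, act_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(h_dim + z_dim + goal_dim, 256), nn.ELU(), nn.Linear(256, act_dim)
        )

    def forward(self, h, z, goal):
        # Outputs e.g. a linear/angular velocity command toward the goal.
        return self.net(torch.cat([h, z, goal], dim=-1))
```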
Through extensive experiments, we demonstrate that X-Mobility not only generalizes effectively but also surpasses existing state-of-the-art navigation approaches in terms of success rate, navigation time, and motion smoothness. X-Mobility also achieves zero-shot Sim2Real transfer while running more efficiently than classical methods.
Inspired by world modeling, X-Mobility employs a lightweight auto-regressive learning architecture to develop a rich latent representation space that probabilistically captures the world state and its dynamics. This representation is trained through a set of diverse multi-head decoders, which tie it closely to robust navigation skills. To address data sparsity, X-Mobility decouples world modeling from action-policy imitation, allowing it to train on a variety of data sources, both with and without supervisory control inputs: off-policy data lets the model learn world dynamics from abundant state-action pairs with broad distribution coverage, while on-policy data with supervisory control drives the goal-reaching action policy. With this architectural design, X-Mobility is trained through a multi-stage pipeline that leverages NVIDIA's Isaac Sim to generate large-scale, photorealistic synthetic datasets featuring diverse scenes and action policies.
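A rough sketch of this multi-stage pipeline, building on the WorldModel and PolicyHead modules from the earlier snippet, is shown below: stage 1 fits the dynamics and decoders on off-policy rollouts, and stage 2 imitates expert commands on on-policy data. The loss terms, the choice to freeze the world model in stage 2, and the batch fields (goal, expert_action) are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def gaussian_kl(post, prior):
    """KL(q || p) between diagonal Gaussians parameterized as [mean, log_std]."""
    qm, qls = post.chunk(2, dim=-1)
    pm, pls = prior.chunk(2, dim=-1)
    return (pls - qls + (qls.exp() ** 2 + (qm - pm) ** 2) / (2 * pls.exp() ** 2) - 0.5).sum(-1).mean()


def train_world_model(world_model, offpolicy_loader, optimizer):
    """Stage 1: learn dynamics and decoders from off-policy rollouts.
    Only executed actions and observations are needed, so any data source works."""
    for batch in offpolicy_loader:              # batch["obs"]: (B, T, obs_feat), batch["action"]: (B, T, act_dim)
        B = batch["obs"].size(0)
        h, z = torch.zeros(B, 512), torch.zeros(B, 64)
        loss = 0.0
        for t in range(batch["obs"].size(1)):
            h, z, prior, post, recon = world_model.observe(h, z, batch["action"][:, t], batch["obs"][:, t])
            # Only the "rgb" head is supervised here; the other heads would get analogous terms.
            loss = loss + gaussian_kl(post, prior) + F.mse_loss(recon["rgb"], batch["obs"][:, t])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


def train_policy(world_model, policy, onpolicy_loader, optimizer):
    """Stage 2: imitate expert (e.g. Nav2) commands on top of the learned latent state.
    Freezing the world model here is an assumption; it could also be fine-tuned."""
    for p in world_model.parameters():
        p.requires_grad_(False)
    for batch in onpolicy_loader:               # adds batch["goal"] and batch["expert_action"]
        B = batch["obs"].size(0)
        h, z = torch.zeros(B, 512), torch.zeros(B, 64)
        loss = 0.0
        for t in range(batch["obs"].size(1)):
            with torch.no_grad():
                h, z, *_ = world_model.observe(h, z, batch["action"][:, t], batch["obs"][:, t])
            pred = policy(h, z, batch["goal"][:, t])
            loss = loss + F.mse_loss(pred, batch["expert_action"][:, t])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Because stage 1 never requires expert actions, any rollout source can contribute dynamics data, while stage 2 only consumes the comparatively scarce expert demonstrations for goal-reaching behavior.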
Extensive experiments demonstrate that X-Mobility consistently outperforms current state-of-the-art methods and exhibits strong generalization in out-of-distribution environments.
X-Mobility generalizing to various unseen environments without additional training.
X-Mobility acquired its foundational navigation skills from Nav2 and subsequently surpassed it, delivering enhanced performance, particularly in complex environments where Nav2 struggles to navigate effectively.
X-Mobility navigating around low-lying obstacles that traditional methods often miss.
X-Mobility efficiently navigating through densely packed obstacles.
We successfully deployed X-Mobility on a Nova Carter robot without any fine-tuning, demonstrating its real-time navigation capabilities through zero-shot Sim2Real transfer.
Real-world deployment of X-Mobility on a Nova Carter robot navigating through a lab environment with obstacles.
@misc{liu2024xmobilityendtoendgeneralizablenavigation,
      title={X-MOBILITY: End-To-End Generalizable Navigation via World Modeling},
      author={Wei Liu and Huihua Zhao and Chenran Li and Joydeep Biswas and Billy Okal and Pulkit Goyal and Yan Chang and Soha Pouya},
      year={2024},
      eprint={2410.17491},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2410.17491},
}