X-Mobility: End-To-End Generalizable Navigation via World Modeling

Wei Liu1*, Huihua Zhao1*, Chenran Li1,2, Joydeep Biswas1,3,
Billy Okal1, Pulkit Goyal1, Soha Pouya1, Yan Chang1
1NVIDIA, 2UC Berkeley, 3UT Austin
*Equal contribution

Abstract

General-purpose navigation in challenging environments remains a significant problem in robotics, with current state-of-the-art approaches facing myriad limitations. Classical approaches struggle with cluttered settings and require extensive tuning, while learning-based methods face difficulties generalizing to out-of-distribution environments. This paper introduces X-Mobility, an end-to-end generalizable navigation model that overcomes existing challenges by leveraging three key ideas.

First, X-Mobility employs an auto-regressive world modeling architecture with a latent state space to capture world dynamics. Second, a diverse set of multi-head decoders enables the model to learn a rich state representation that correlates strongly with effective navigation skills. Third, by decoupling world modeling from action policy, our architecture can train effectively on a variety of data sources, both with and without expert policies. Off-policy data allows the model to learn world dynamics with maximum state-action pair coverage, while on-policy data with supervisory control enables optimal action policy learning.
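The first two ideas can be pictured with a toy rollout. The following is a minimal sketch, not the authors' implementation: the latent size, the `encode`, `transition`, and `policy` functions, and the two decoder head names are hypothetical placeholders chosen only for illustration. It shows the structure alone: a latent state is advanced auto-regressively, several decoder heads read that same latent, and the action head is kept separate from the world model functions.

```python
import math
import random

LATENT_DIM = 4  # hypothetical size, chosen only for illustration


def encode(observation):
    """Toy encoder: squash an observation vector into a latent state."""
    return [math.tanh(x) for x in observation[:LATENT_DIM]]


def transition(latent, action, rng):
    """Toy stochastic dynamics: the next latent depends on the current latent and the action."""
    linear, angular = action
    drift = [0.9 * z + 0.1 * linear - 0.05 * angular for z in latent]
    return [math.tanh(d + rng.gauss(0.0, 0.01)) for d in drift]


# Several decoder heads read the same latent state (names are placeholders).
DECODER_HEADS = {
    "reconstruction": lambda z: [2.0 * v for v in z],
    "free_space_score": lambda z: sum(z) / len(z),
}


def policy(latent, goal_bearing):
    """Toy action head, kept separate from the world model functions above."""
    linear = 0.5 * (1.0 + latent[0])             # stays within [0, 1]
    angular = max(-1.0, min(1.0, goal_bearing))  # clipped turn command
    return (linear, angular)


def rollout(observation, goal_bearing, steps, seed=0):
    """Auto-regressive rollout: each predicted latent is fed back as the next input."""
    rng = random.Random(seed)
    latent = encode(observation)
    trajectory = []
    for _ in range(steps):
        action = policy(latent, goal_bearing)
        decoded = {name: head(latent) for name, head in DECODER_HEADS.items()}
        trajectory.append({"latent": latent, "action": action, "decoded": decoded})
        latent = transition(latent, action, rng)
    return trajectory


trajectory = rollout([0.2, -0.1, 0.4, 0.0], goal_bearing=0.3, steps=5)
```

In this sketch the decoder heads never feed back into the rollout. They only shape what the latent must encode, which is the role the text attributes to the multi-head decoders.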

Through extensive experiments, we demonstrate that X-Mobility not only generalizes effectively but also surpasses existing state-of-the-art navigation approaches in terms of success rate, navigation time, and motion smoothness. X-Mobility also achieves zero-shot Sim2Real transferability, with lower computational runtime than classical methods.

Method

Method overview

Inspired by world modeling, X-Mobility employs a lightweight auto-regressive learning architecture to develop a rich latent representation space that probabilistically captures the world state and its dynamics. A set of diverse multi-task decoders trains this representation space so that it correlates strongly with robust navigation skills. To address data sparsity, X-Mobility decouples world modeling from action policy imitation, allowing it to train from a variety of data sources, both with and without supervisory control inputs. Off-policy data sources let the model learn world dynamics from a broad range of state-action pairs, maximizing distribution coverage, while on-policy sources with supervisory control facilitate action policy learning for goal reaching. With this architectural design, X-Mobility is trained through a multi-stage pipeline that leverages NVIDIA's Isaac Sim to generate large-scale, photorealistic synthetic datasets featuring diverse scenes and action policies.
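The decoupling described above can be illustrated with a toy one-dimensional problem. This is a sketch under stated assumptions, not the actual training pipeline: the two-stage split, the linear dynamics, the linear teacher, and every name (`Sample`, `train_world_model`, `train_policy`) are hypothetical. For brevity the toy policy reads the observation directly, whereas the text describes a policy built on the learned latent state.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    obs: float
    action: float                          # action actually executed
    next_obs: float
    expert_action: Optional[float] = None  # present only with supervisory control


def true_dynamics(obs, action):
    """Hypothetical ground-truth dynamics used to generate toy data."""
    return 0.8 * obs + 0.5 * action


def make_data(n, rng, with_expert):
    """Random actions stand in for off-policy data, a linear teacher for on-policy data."""
    data = []
    for _ in range(n):
        obs = rng.uniform(-1.0, 1.0)
        expert = -0.6 * obs if with_expert else None
        action = expert if with_expert else rng.uniform(-1.0, 1.0)
        data.append(Sample(obs, action, true_dynamics(obs, action), expert))
    return data


def train_world_model(params, data, lr=0.1, epochs=200):
    """Stage 1 (assumed): fit dynamics on every sample; no expert label is needed."""
    a, b = params
    for _ in range(epochs):
        for s in data:
            err = a * s.obs + b * s.action - s.next_obs
            a -= lr * err * s.obs
            b -= lr * err * s.action
    return a, b


def train_policy(gain, data, lr=0.1, epochs=200):
    """Stage 2 (assumed): imitate the teacher, skipping samples without expert labels."""
    for _ in range(epochs):
        for s in data:
            if s.expert_action is None:
                continue
            err = gain * s.obs - s.expert_action
            gain -= lr * err * s.obs
    return gain


rng = random.Random(0)
off_policy = make_data(50, rng, with_expert=False)
on_policy = make_data(50, rng, with_expert=True)

world_model = train_world_model((0.0, 0.0), off_policy + on_policy)
policy_gain = train_policy(0.0, off_policy + on_policy)
```

In this toy, the teacher's action is a fixed function of the observation, so teacher data alone cannot separate the two dynamics coefficients. The randomly driven samples supply the missing state-action coverage, which mirrors the motivation given for mixing data sources.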

Generalizable Navigation

Extensive experiments demonstrate that X-Mobility consistently outperforms current state-of-the-art methods and exhibits strong generalization in out-of-distribution environments.

X-Mobility generalizing to various unseen environments without additional training.

X-Mobility acquired its foundational navigation skills from Nav2 and subsequently surpassed it, delivering enhanced performance, particularly in complex environments that Nav2 struggles to navigate effectively.

X-Mobility navigating around low-lying obstacles that traditional methods often miss.

X-Mobility efficiently navigating through densely packed obstacles.

Zero-shot Sim2Real Transfer

We successfully deployed X-Mobility on a Nova Carter robot without any fine-tuning, demonstrating its real-time navigation capabilities through zero-shot Sim2Real transfer.

Real-world deployment of X-Mobility on a Nova Carter robot navigating through a lab environment with obstacles.

BibTeX

@misc{liu2024xmobilityendtoendgeneralizablenavigation,
      title={X-MOBILITY: End-To-End Generalizable Navigation via World Modeling}, 
      author={Wei Liu and Huihua Zhao and Chenran Li and Joydeep Biswas and Billy Okal and Pulkit Goyal and Yan Chang and Soha Pouya},
      year={2024},
      eprint={2410.17491},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2410.17491},
}