General-purpose navigation in challenging environments remains a significant problem in robotics, with current state-of-the-art approaches facing myriad limitations. Classical approaches struggle with cluttered settings and require extensive tuning, while learning-based methods face difficulties generalizing to out-of-distribution environments. This paper introduces X-Mobility, an end-to-end generalizable navigation model that overcomes these challenges by leveraging three key ideas.
First, X-Mobility employs an auto-regressive world modeling architecture with a latent state space to capture world dynamics. Second, a diverse set of multi-head decoders enables the model to learn a rich state representation that correlates strongly with effective navigation skills. Third, by decoupling world modeling from action policy, our architecture can train effectively on a variety of data sources, both with and without expert policies—off-policy data allows the model to learn world dynamics with maximum state-action pair coverage, while on-policy data with supervisory control enables optimal action policy learning.
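To make these ideas concrete, below is a minimal PyTorch-style sketch of one possible realization: a recurrent latent world model (RSSM-style) with multi-head decoders, plus a separate policy head that reads only the latent state. All module names, dimensions, decoder heads, and the specific latent-update scheme are illustrative assumptions, not the exact X-Mobility implementation.

```python
import torch
import torch.nn as nn


class WorldModel(nn.Module):
    """Auto-regressive latent world model: a recurrent deterministic state summarizes
    history, a prior imagines the next stochastic latent, and a posterior refines it
    once the observation arrives (RSSM-style sketch)."""

    def __init__(self, obs_feat=256, act_dim=2, h_dim=512, z_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_feat, h_dim), nn.ELU())  # stand-in for an image encoder
        self.rnn = nn.GRUCell(z_dim + act_dim, h_dim)                       # deterministic history h_t
        self.prior = nn.Linear(h_dim, 2 * z_dim)                            # p(z_t | h_t)
        self.posterior = nn.Linear(2 * h_dim, 2 * z_dim)                    # q(z_t | h_t, o_t)
        # Multi-head decoders force the latent state to carry navigation-relevant detail.
        self.decoders = nn.ModuleDict({
            "rgb": nn.Linear(h_dim + z_dim, obs_feat),
            "semantic": nn.Linear(h_dim + z_dim, obs_feat),
            "depth": nn.Linear(h_dim + z_dim, obs_feat),
        })

    @staticmethod
    def _sample(stats):
        mean, log_std = stats.chunk(2, dim=-1)
        return mean + log_std.exp() * torch.randn_like(mean)

    def observe(self, h, z_prev, action, obs):
        """One auto-regressive step when an observation is available."""
        h = self.rnn(torch.cat([z_prev, action], dim=-1), h)
        prior = self.prior(h)
        post = self.posterior(torch.cat([h, self.encoder(obs)], dim=-1))
        z = self._sample(post)
        recon = {name: dec(torch.cat([h, z], dim=-1)) for name, dec in self.decoders.items()}
        return h, z, prior, post, recon


class PolicyHead(nn.Module):
    """Action policy decoupled from the world model: it reads only the latent state
    and a goal, so it can be trained separately on on-policy expert data."""

    def __init__(self, h_dim=512, z_dim=64, goal_dim=3, act_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(h_dim + z_dim + goal_dim, 256), nn.ELU(), nn.Linear(256, act_dim)
        )

    def forward(self, h, z, goal):
        # Outputs e.g. a linear/angular velocity command toward the goal.
        return self.net(torch.cat([h, z, goal], dim=-1))
```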
Through extensive experiments, we demonstrate that X-Mobility not only generalizes effectively but also surpasses existing state-of-the-art navigation approaches in terms of success rate, navigation time, and motion smoothness. X-Mobility also achieves zero-shot Sim2Real transfer while running more efficiently than classical methods.
Inspired by world modeling, X-Mobility employs a lightweight auto-regressive learning architecture to develop a rich latent representation space that probabilistically captures the world state and its dynamics. This representation is trained through a set of diverse multi-head decoders, which tie it closely to robust navigation skills. To address data sparsity, X-Mobility decouples world modeling from action-policy imitation, allowing it to train on a variety of data sources, both with and without supervisory control inputs: off-policy data lets the model learn world dynamics from abundant state-action pairs with broad distribution coverage, while on-policy data with supervisory control drives the goal-reaching action policy. With this architectural design, X-Mobility is trained through a multi-stage pipeline that leverages NVIDIA's Isaac Sim to generate large-scale, photorealistic synthetic datasets featuring diverse scenes and action policies.
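A rough sketch of this multi-stage pipeline, building on the WorldModel and PolicyHead modules from the earlier snippet, is shown below: stage 1 fits the dynamics and decoders on off-policy rollouts, and stage 2 imitates expert commands on on-policy data. The loss terms, the choice to freeze the world model in stage 2, and the batch fields (goal, expert_action) are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def gaussian_kl(post, prior):
    """KL(q || p) between diagonal Gaussians parameterized as [mean, log_std]."""
    qm, qls = post.chunk(2, dim=-1)
    pm, pls = prior.chunk(2, dim=-1)
    return (pls - qls + (qls.exp() ** 2 + (qm - pm) ** 2) / (2 * pls.exp() ** 2) - 0.5).sum(-1).mean()


def train_world_model(world_model, offpolicy_loader, optimizer):
    """Stage 1: learn dynamics and decoders from off-policy rollouts.
    Only executed actions and observations are needed, so any data source works."""
    for batch in offpolicy_loader:              # batch["obs"]: (B, T, obs_feat), batch["action"]: (B, T, act_dim)
        B = batch["obs"].size(0)
        h, z = torch.zeros(B, 512), torch.zeros(B, 64)
        loss = 0.0
        for t in range(batch["obs"].size(1)):
            h, z, prior, post, recon = world_model.observe(h, z, batch["action"][:, t], batch["obs"][:, t])
            # Only the "rgb" head is supervised here; the other heads would get analogous terms.
            loss = loss + gaussian_kl(post, prior) + F.mse_loss(recon["rgb"], batch["obs"][:, t])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


def train_policy(world_model, policy, onpolicy_loader, optimizer):
    """Stage 2: imitate expert (e.g. Nav2) commands on top of the learned latent state.
    Freezing the world model here is an assumption; it could also be fine-tuned."""
    for p in world_model.parameters():
        p.requires_grad_(False)
    for batch in onpolicy_loader:               # adds batch["goal"] and batch["expert_action"]
        B = batch["obs"].size(0)
        h, z = torch.zeros(B, 512), torch.zeros(B, 64)
        loss = 0.0
        for t in range(batch["obs"].size(1)):
            with torch.no_grad():
                h, z, *_ = world_model.observe(h, z, batch["action"][:, t], batch["obs"][:, t])
            pred = policy(h, z, batch["goal"][:, t])
            loss = loss + F.mse_loss(pred, batch["expert_action"][:, t])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Because stage 1 never requires expert actions, any rollout source can contribute dynamics data, while stage 2 only consumes the comparatively scarce expert demonstrations for goal-reaching behavior.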
Extensive experiments demonstrate that X-Mobility consistently outperforms current state-of-the-art methods and exhibits strong generalization in out-of-distribution environments.
X-Mobility generalizing to various unseen environments without additional training.
X-Mobility acquired its foundational navigation skills from Nav2 and subsequently surpassed it, delivering enhanced performance, particularly in complex environments where Nav2 struggles to navigate effectively.
X-Mobility navigating around low-lying obstacles that traditional methods often miss.
X-Mobility efficiently navigating through densely packed obstacles.
We successfully deployed X-Mobility on a Nova Carter robot without any fine-tuning, demonstrating its real-time navigation capabilities through zero-shot Sim2Real transfer.
Real-world deployment of X-Mobility on a Nova Carter robot navigating through a lab environment with obstacles.
@misc{liu2024xmobilityendtoendgeneralizablenavigation,
      title={X-MOBILITY: End-To-End Generalizable Navigation via World Modeling},
      author={Wei Liu and Huihua Zhao and Chenran Li and Joydeep Biswas and Billy Okal and Pulkit Goyal and Yan Chang and Soha Pouya},
      year={2024},
      eprint={2410.17491},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2410.17491},
}