Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning

Date: 2025-06-06
AI Summary
Traditional robot navigation systems adapt poorly to new environments and are fragmented into many separate modules, which limits them in complex indoor environments. To address this, we propose a dual-model architecture. Astra-Global, a multimodal large language model (MLLM), processes vision and language inputs to localize both the robot and its goal on a hybrid topological-semantic map that serves as the global map. Astra-Local, a multitask neural network, performs local path planning and odometry estimation on top of a 4D spatiotemporal encoder trained with self-supervised learning. Its planning head generates local trajectories with flow matching and a masked ESDF loss that reduces collision risk, and its odometry head fuses multi-sensor inputs with a transformer encoder to predict the robot's relative pose. The full system is deployed on in-house mobile robots. In real-world tests across diverse indoor environments, Astra-Global outperforms traditional visual place recognition methods, and the complete system achieves a high end-to-end mission success rate.
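To make the idea of a hybrid topological-semantic map concrete, here is a minimal sketch under stated assumptions: the map is a set of keyframe nodes, each with a 2D position and a set of semantic labels, joined by traversability edges. The names `Node`, `TopoSemanticMap`, `candidates_for`, and `route` are hypothetical and not taken from the paper. In Astra the MLLM performs self and goal localization; here exact label matching stands in for that step purely for illustration, and breadth-first search stands in for routing over the topological layer, which the summary does not describe.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical map structure for illustration only; not Astra's actual implementation.


@dataclass
class Node:
    """A keyframe node: a 2D position plus the semantic labels observed there."""
    node_id: int
    x: float
    y: float
    labels: set[str] = field(default_factory=set)


@dataclass
class TopoSemanticMap:
    nodes: dict[int, Node] = field(default_factory=dict)
    edges: dict[int, set[int]] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node
        self.edges.setdefault(node.node_id, set())

    def connect(self, a: int, b: int) -> None:
        # Undirected edge: the robot can travel between these two keyframes.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def candidates_for(self, label: str) -> list[int]:
        # Stand-in for MLLM goal localization: nodes whose labels contain the query.
        return sorted(n.node_id for n in self.nodes.values() if label in n.labels)

    def route(self, start: int, goal: int) -> list[int]:
        # Breadth-first search over the topological layer (fewest hops).
        prev = {start: None}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if cur == goal:
                break
            for nxt in sorted(self.edges[cur]):
                if nxt not in prev:
                    prev[nxt] = cur
                    queue.append(nxt)
        if goal not in prev:
            return []  # goal unreachable from start
        path, cur = [], goal
        while cur is not None:
            path.append(cur)
            cur = prev[cur]
        return path[::-1]


m = TopoSemanticMap()
m.add_node(Node(0, 0.0, 0.0, {"lobby"}))
m.add_node(Node(1, 5.0, 0.0, {"corridor"}))
m.add_node(Node(2, 5.0, 4.0, {"kitchen", "coffee machine"}))
m.connect(0, 1)
m.connect(1, 2)
print(m.candidates_for("kitchen"))  # [2]
print(m.route(0, 2))                # [0, 1, 2]
```

Separating the graph (where the robot can go) from the labels (what is there) is what lets a language query such as "kitchen" resolve to a node that a planner can then reach.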

Abstract
Modern robot navigation systems struggle in diverse and complex indoor environments. Traditional approaches rely on multiple modules with small models or rule-based systems and thus lack adaptability to new environments. To address this, we developed Astra, a comprehensive dual-model architecture for mobile robot navigation consisting of Astra-Global and Astra-Local. Astra-Global, a multimodal LLM, processes vision and language inputs to perform self and goal localization using a hybrid topological-semantic graph as the global map, and it outperforms traditional visual place recognition methods. Astra-Local, a multitask network, handles local path planning and odometry estimation. Its 4D spatial-temporal encoder, trained through self-supervised learning, generates robust 4D features for downstream tasks. The planning head generates local trajectories using flow matching and a novel masked ESDF loss that minimizes collision risk, and the odometry head integrates multi-sensor inputs via a transformer encoder to predict the robot's relative pose. Deployed on real in-house mobile robots, Astra achieves a high end-to-end mission success rate across diverse indoor environments.
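The abstract names a masked ESDF loss without defining it, so the NumPy sketch below shows one plausible form, with every detail an assumption. A Euclidean signed distance field (ESDF) grid stores each cell's distance to the nearest obstacle, a boolean mask marks the cells whose distance is trusted (for example, observed cells), and waypoints closer than a safety margin are penalized quadratically, counting only masked-in cells. The function name, the `margin` and `resolution` defaults, and the nearest-cell lookup are hypothetical. A real training loss would be written in an autodiff framework with interpolated lookups so gradients reach the trajectory.

```python
import numpy as np

# Hypothetical masked ESDF penalty for illustration; not the paper's exact loss.


def masked_esdf_loss(traj_xy, esdf, mask, resolution=0.1, margin=0.5):
    """Hinge-style collision penalty on trajectory waypoints.

    traj_xy: (T, 2) waypoints in metres, in the grid frame (cell (0, 0) at the origin).
    esdf:    (H, W) signed distance to the nearest obstacle in metres.
    mask:    (H, W) bool, True where the ESDF value is trusted.
    """
    # Convert metric waypoints to integer cell indices (x -> column, y -> row).
    cols = np.clip(np.round(traj_xy[:, 0] / resolution).astype(int), 0, esdf.shape[1] - 1)
    rows = np.clip(np.round(traj_xy[:, 1] / resolution).astype(int), 0, esdf.shape[0] - 1)
    dist = esdf[rows, cols]
    valid = mask[rows, cols]
    if not valid.any():
        return 0.0  # no trusted cells along the trajectory, so no penalty
    # Penalize only waypoints closer than `margin` to an obstacle, and only in valid cells.
    penalty = np.maximum(margin - dist, 0.0) ** 2
    return float(penalty[valid].mean())


esdf = np.full((10, 10), 2.0)
esdf[5, 5] = 0.1                            # a cell right next to an obstacle
mask = np.ones((10, 10), dtype=bool)
traj = np.array([[0.0, 0.0], [0.5, 0.5]])  # the second waypoint lands on cell (5, 5)
loss = masked_esdf_loss(traj, esdf, mask)
print(round(loss, 4))  # 0.08
```

The mask is what keeps unobserved or unreliable regions of the distance field from producing spurious penalties; with `mask[5, 5] = False` the same trajectory would score 0.0.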
Problem

Research questions and friction points this paper is trying to address.

Improving robot navigation in complex indoor environments
Overcoming adaptability limitations of traditional rule-based systems
Enhancing localization and path planning with multimodal learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal LLM for global self- and goal localization
Self-supervised 4D spatiotemporal encoder for local planning and odometry
Flow matching with a masked ESDF loss for low-collision-risk planning
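The source only names flow matching, so the sketch below shows the generic technique rather than Astra's planning head. With a linear interpolation path, a velocity model is trained to regress `x1 - x0` at the interpolant `x_t`, and at inference Euler integration carries Gaussian noise to a trajectory. No network is trained here: as an assumption for a runnable demo, the exact velocity field for a dataset containing a single trajectory, `(x1 - x) / (1 - t)`, stands in for the learned model. In Astra the velocity model is presumably conditioned on the encoder's 4D features; that conditioning is not modeled, and the function names are hypothetical.

```python
import numpy as np

# Generic linear-path flow matching sketch; hypothetical names, not Astra's planning head.


def cfm_training_pair(x0, x1, t):
    """Return the interpolant x_t and the regression target for conditional flow matching."""
    x_t = (1.0 - t) * x0 + t * x1  # point on the straight path from noise x0 to data x1
    u_t = x1 - x0                  # a velocity model v(x_t, t) would be regressed onto this
    return x_t, u_t


def sample_euler(velocity, x0, steps=10):
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (trajectory) with Euler steps."""
    x, dt = x0.copy(), 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity(x, k * dt)
    return x


# Toy "dataset": one straight 2 m trajectory with 5 (x, y) waypoints.
target = np.stack([np.linspace(0.0, 2.0, 5), np.zeros(5)], axis=1)
noise = np.random.default_rng(0).standard_normal(target.shape)

# One supervised training example at t = 0.3.
x_t, u_t = cfm_training_pair(noise, target, t=0.3)


def oracle_velocity(x, t):
    # Exact velocity field when the data is a single trajectory; stands in for a trained model.
    return (target - x) / (1.0 - t)


sample = sample_euler(oracle_velocity, noise, steps=10)
print(np.allclose(sample, target))  # True
```

The straight-line path makes the regression target constant along each noise-to-data pair, which is one reason flow matching can generate trajectories in few integration steps.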
Authors

Sheng Chen, Peiyu He, Jiaxin Hu, Ziyang Liu, Yansheng Wang, Tao Xu, Chi Zhang, Chongchong Zhang, Chao An, Shiyu Cai, Duo Cao, Kangping Chen, Shuai Chu, Tianwei Chu, Mingdi Dan, Min Du, Weiwei Fang, Pengyou Fu, Junkai Hu, Xiaowei Jiang, Zhaodi Jiang, Fuxuan Li, Jun Li, Minghui Li, Mingyao Li, Yanchang Li, Zhibin Li, Guangming Liu, Kairui Liu, Lihao Liu, Weizhi Liu, Xiaoshun Liu, Yufei Liu, Yunfei Liu, Qiang Lu, Yuanfei Luo, Xiang Lv, Hongying Ma, Sai Ma, Lingxian Mi, Sha Sa, Hongxiang Shu, Lei Tian, Chengzhi Wang, Jiayu Wang, Kaijie Wang, Qingyi Wang, R. Wang, Tao Wang, Wei Wang, Xirui Wang, Chao Wei, Xuguang Wei, Zijun Xia, Zhaohao Xiao, Tingshuai Yan, Liyan Yang, Yifan Yang, Zhikai Yang, Zhong Yin, Li Yuan, Liuchun Yuan, Jinyang Zhang, Junhui Zhang, Linge Zhang, Zhenyi Zhang, Dongjie Zhu, Hang Li, Yangang Zhang