Embodied Spatial Intelligence: from Implicit Scene Modeling to Spatial Reasoning

📅 2025-08-30
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of bridging large language models (LLMs) with the physical world to enable robots to understand and autonomously execute natural-language instructions. To this end, we propose a novel paradigm integrating implicit neural scene modeling with 3D language grounding: (1) compact, geometrically consistent implicit scene representations are constructed via self-supervised camera calibration, high-fidelity depth field estimation, and large-scale scene reconstruction; (2) an LLM inference mechanism enhanced with spatial awareness is designed, coupled with a navigation-oriented 3D language–action alignment benchmark and a closed-loop state feedback framework. Experiments demonstrate substantial improvements in spatial comprehension accuracy and cross-scale navigation robustness under complex, long-horizon instructions. Our approach provides a scalable technical pathway toward embodied intelligence for generative AI.
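The implicit scene representations in step (1) are coordinate-based neural fields. As a rough illustration of that idea only, the sketch below shows a minimal coordinate-based MLP that maps 3D query points to a scalar field value (e.g. density or signed distance); the class name, architecture, and sizes are assumptions for illustration, not the thesis's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

class ImplicitField:
    """Hypothetical minimal coordinate-based MLP: maps 3D points to a
    scalar field (e.g. density or signed distance). An illustrative
    sketch, not the architecture used in the thesis."""

    def __init__(self, hidden=64):
        # Randomly initialized weights; a real model would be trained,
        # e.g. with self-supervised photometric/depth losses.
        self.w1 = rng.normal(0.0, 0.1, (3, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, xyz):
        h = np.tanh(xyz @ self.w1 + self.b1)        # hidden features
        return (h @ self.w2 + self.b2).squeeze(-1)  # one scalar per point

field = ImplicitField()
points = rng.uniform(-1.0, 1.0, (5, 3))  # 5 query points in scene coordinates
values = field(points)                   # continuous field queried at the points
print(values.shape)  # (5,)
```

The key property this illustrates is that the scene is stored in network weights and queried continuously at arbitrary 3D coordinates, rather than in a discrete grid.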

๐Ÿ“ Abstract
This thesis introduces "Embodied Spatial Intelligence" to address the challenge of creating robots that can perceive and act in the real world based on natural language instructions. To bridge the gap between Large Language Models (LLMs) and physical embodiment, we present contributions on two fronts: scene representation and spatial reasoning. For perception, we develop robust, scalable, and accurate scene representations using implicit neural models, with contributions in self-supervised camera calibration, high-fidelity depth field generation, and large-scale reconstruction. For spatial reasoning, we enhance the spatial capabilities of LLMs by introducing a novel navigation benchmark, a method for grounding language in 3D, and a state-feedback mechanism to improve long-horizon decision-making. This work lays a foundation for robots that can robustly perceive their surroundings and intelligently act upon complex, language-based commands.
Problem

Research questions and friction points this paper is trying to address.

Developing robots that perceive and act from natural language instructions
Bridging Large Language Models (LLMs) with physical embodiment
Enhancing spatial reasoning and scene representation for robotics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicit neural models for scene representation
Enhanced spatial reasoning in LLMs
State-feedback mechanism for decision-making
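The state-feedback mechanism listed above can be illustrated as a closed loop in which the agent's current state is summarized and fed back into each decision step. Everything in the sketch below (the `policy` stub, the grid environment, the state-summary format) is a hypothetical stand-in for illustration; in the actual system the decision step would query an LLM and the state would come from the robot's perception stack.

```python
# Hypothetical sketch of closed-loop state feedback for
# language-guided navigation. Not the paper's implementation.

def policy(instruction, state_summary):
    """Stub decision step: picks the next action from the instruction
    and the fed-back state summary (a real system would query an LLM)."""
    x, y = state_summary["position"]
    gx, gy = state_summary["goal"]
    if x < gx:
        return "move_east"
    if y < gy:
        return "move_north"
    return "stop"

def step(position, action):
    """Toy environment transition on a 2D grid."""
    moves = {"move_east": (1, 0), "move_north": (0, 1), "stop": (0, 0)}
    dx, dy = moves[action]
    return (position[0] + dx, position[1] + dy)

def run(instruction, start, goal, max_steps=20):
    position = start
    trajectory = [position]
    for _ in range(max_steps):
        # State feedback: the current state is summarized and passed
        # back into the decision step before every action.
        summary = {"position": position, "goal": goal}
        action = policy(instruction, summary)
        if action == "stop":
            break
        position = step(position, action)
        trajectory.append(position)
    return trajectory

path = run("go to the goal", start=(0, 0), goal=(2, 1))
print(path[-1])  # (2, 1)
```

The design point is the loop structure: without the per-step feedback, long-horizon execution drifts because the decision-maker never observes the consequences of its earlier actions.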