🤖 AI Summary
This work addresses the challenge of bridging large language models (LLMs) with the physical world so that robots can understand and autonomously execute natural-language instructions. To this end, we propose a novel paradigm integrating implicit neural scene modeling with 3D language grounding: (1) we construct compact, geometrically consistent implicit scene representations via self-supervised camera calibration, high-fidelity depth field estimation, and large-scale scene reconstruction; (2) we design a spatially aware LLM inference mechanism, coupled with a navigation-oriented 3D language–action alignment benchmark and a closed-loop state-feedback framework. Experiments demonstrate substantial improvements in spatial comprehension accuracy and cross-scale navigation robustness under complex, long-horizon instructions. Our approach provides a scalable technical pathway toward embodied intelligence for generative AI.
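As a concrete illustration of the closed-loop state-feedback idea, the minimal Python sketch below replans after every step from a fresh state summary instead of committing to a fixed open-loop plan. Everything here is hypothetical scaffolding: `plan_next_action` is a stub standing in for an LLM call, and `NavState` and the action tokens are invented for illustration, not the thesis implementation.

```python
from dataclasses import dataclass

@dataclass
class NavState:
    """Hypothetical summary of the robot's situation fed back to the planner."""
    position: tuple      # (x, y) in meters, assumed map frame
    last_action: str     # previous action token
    progress_note: str   # short textual status, e.g. "door ahead is closed"

def plan_next_action(instruction: str, state: NavState) -> str:
    """Stand-in for an LLM call: maps the instruction plus the current
    state summary to the next action token. A real system would prompt
    an LLM with both and parse its reply."""
    if "closed" in state.progress_note:
        return "open_door"
    return "move_forward"

def execute(action: str, state: NavState) -> NavState:
    """Toy environment step; a real robot would return sensed state."""
    x, y = state.position
    if action == "move_forward":
        return NavState((x + 1.0, y), action, "hallway clear")
    return NavState((x, y), action, "door opened")

# Closed loop: the planner re-reads fresh state at every step.
state = NavState((0.0, 0.0), "none", "door ahead is closed")
for _ in range(3):
    action = plan_next_action("go to the kitchen", state)
    state = execute(action, state)
    print(action, state.position, state.progress_note)
```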
📝 Abstract
This thesis introduces "Embodied Spatial Intelligence" to address the challenge of creating robots that can perceive and act in the real world based on natural language instructions. To bridge the gap between Large Language Models (LLMs) and physical embodiment, we present contributions on two fronts: scene representation and spatial reasoning. For perception, we develop robust, scalable, and accurate scene representations using implicit neural models, with contributions in self-supervised camera calibration, high-fidelity depth field generation, and large-scale reconstruction. For spatial reasoning, we enhance the spatial capabilities of LLMs by introducing a novel navigation benchmark, a method for grounding language in 3D, and a state-feedback mechanism to improve long-horizon decision-making. This work lays a foundation for robots that can robustly perceive their surroundings and intelligently act upon complex, language-based commands.
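To make "implicit neural scene representation" and "depth field" more concrete, the sketch below estimates depth along one camera ray with standard volume-rendering quadrature, where sample weights are w_i = T_i (1 − exp(−σ_i Δt)). The `toy_density` field and all constants are illustrative assumptions standing in for a learned model, not the thesis's method.

```python
import numpy as np

def toy_density(points: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a learned implicit field: returns a
    density per 3D point. Here, a soft 'wall' centered at z = 2.0 m."""
    return 5.0 * np.exp(-((points[:, 2] - 2.0) ** 2) / 0.05)

def render_depth(origin, direction, near=0.1, far=4.0, n_samples=128):
    """Expected depth along one ray via volume-rendering quadrature."""
    t = np.linspace(near, far, n_samples)
    dt = t[1] - t[0]
    pts = origin[None, :] + t[:, None] * direction[None, :]
    sigma = toy_density(pts)
    alpha = 1.0 - np.exp(-sigma * dt)            # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha)))[:-1]  # T_i
    weights = trans * alpha
    # Weight-normalized expected hit distance along the ray.
    return float(np.sum(weights * t) / max(np.sum(weights), 1e-8))

ray_o = np.array([0.0, 0.0, 0.0])
ray_d = np.array([0.0, 0.0, 1.0])   # unit-length ray looking down +z
print(f"estimated depth: {render_depth(ray_o, ray_d):.2f} m")  # near z = 2
```

The same weighted-quadrature machinery underlies photometric self-supervision in NeRF-style pipelines, which is why a single implicit field can serve calibration, depth estimation, and reconstruction at once.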