🤖 AI Summary
Real-world 3D scan data is hard to leverage because of challenges in data volume, heterogeneous annotation formats, and poor tool interoperability. To address these bottlenecks, this work proposes a unified annotation integration framework built on Universal Scene Description (USD), defining application-specific USD flavors tailored to embodied intelligence tasks so that multi-source annotations can be fused in a structured way. Large language models (LLMs) are further incorporated for scene semantic parsing and editable scene representations, establishing an end-to-end pipeline: scan → semantic understanding → simulation-based policy learning. Experiments demonstrate an 80% success rate on LLM-driven scene editing tasks and an 87% success rate on robot policy learning in simulation, improving generalization in photorealistic environments. This is presented as the first work to deeply integrate USD into 3D embodied intelligence data infrastructure, introducing a scalable, interoperable paradigm for scan-driven embodied reasoning and action.
📝 Abstract
Real-world 3D scene-level scans offer realism and can enable better real-world generalizability for downstream applications. However, challenges such as data volume, diverse annotation formats, and tool compatibility limit their use. This paper demonstrates a methodology to effectively leverage these scans and their annotations. We propose a unified annotation integration scheme based on USD, with application-specific USD flavors. We identify challenges in utilizing holistic real-world scan datasets and present mitigation strategies. The efficacy of our approach is demonstrated through two downstream applications: LLM-based scene editing, enabling effective LLM understanding and adaptation of the data (80% success rate), and robotic simulation, achieving an 87% success rate in policy learning.
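To make the USD-centric design more concrete, below is a minimal, hypothetical sketch (using the OpenUSD Python bindings, `pxr`) of how multi-source annotations and an application-specific flavor might be composed onto a scanned scene. The prim paths, attribute names, annotation sources, and the use of a variant set to model "flavors" are illustrative assumptions, not the paper's actual schema.

```python
# Minimal, hypothetical sketch of USD-based annotation fusion (not the paper's
# actual schema). Requires the OpenUSD Python bindings: pip install usd-core
from pxr import Usd, Sdf

stage = Usd.Stage.CreateInMemory()
world = stage.DefinePrim("/World", "Xform")
stage.SetDefaultPrim(world)

# One prim per scanned object; in practice the mesh would be referenced in
# from the raw scan layer rather than authored here.
chair = stage.DefinePrim("/World/chair_01", "Xform")

# Each annotation source keeps its own layer; composing them as sublayers
# lets tools write independently while USD resolves the merged result.
source_layers = [Sdf.Layer.CreateAnonymous(f"{name}.usda")
                 for name in ("semantic_seg", "affordances")]  # hypothetical sources
for layer in source_layers:
    stage.GetRootLayer().subLayerPaths.append(layer.identifier)

# One way to model an application-specific "flavor": a USD variant set whose
# selection exposes only the attributes a given downstream consumer needs.
flavors = chair.GetVariantSets().AddVariantSet("task_flavor")
for flavor in ("llm_editing", "robot_sim"):
    flavors.AddVariant(flavor)

flavors.SetVariantSelection("robot_sim")
with flavors.GetVariantEditContext():
    # Attributes authored inside the variant are visible only when it is selected.
    chair.CreateAttribute("semantics:label", Sdf.ValueTypeNames.String).Set("chair")
    chair.CreateAttribute("physics:mass", Sdf.ValueTypeNames.Float).Set(5.0)

print(stage.GetRootLayer().ExportToString())
```

Variant sets are only one possible mechanism here; the paper's application-specific flavors could equally be realized through separate layers, custom schemas, or purposes, depending on the downstream consumer.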