What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models

📅 2025-12-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper presents a systematic survey of 3D scene representation methods for robotic tasks, addressing five core capabilities: perception, mapping, localization, navigation, and manipulation. It comparatively analyzes geometric and neural paradigms—including point clouds, voxels, signed distance fields (SDFs), neural radiance fields (NeRFs), and 3D Gaussian splatting—highlighting trade-offs in accuracy, computational efficiency, generalization, and semantic interpretability. Methodologically, it proposes a unified architectural pathway centered on 3D foundation models, integrating multimodal priors—particularly language and semantic knowledge—to enable high-level, embodied scene understanding. The work contributes an open-source evaluation benchmark and a modular implementation framework, offering the first structured taxonomy and evolutionary analysis spanning the full technical spectrum. This serves as both a theoretical reference and a practical development guide for advancing robotic 3D scene understanding.

📝 Abstract
In this paper, we provide a comprehensive overview of existing scene representation methods for robotics, covering traditional representations such as point clouds, voxels, signed distance functions (SDF), and scene graphs, as well as more recent neural representations like Neural Radiance Fields (NeRF), 3D Gaussian Splatting (3DGS), and the emerging Foundation Models. While current SLAM and localization systems predominantly rely on sparse representations like point clouds and voxels, dense scene representations are expected to play a critical role in downstream tasks such as navigation and obstacle avoidance. Moreover, neural representations such as NeRF, 3DGS, and foundation models are well-suited for integrating high-level semantic features and language-based priors, enabling more comprehensive 3D scene understanding and embodied intelligence. We categorize the core modules of robotics into five parts (perception, mapping, localization, navigation, and manipulation). We start by presenting the standard formulation of each scene representation method and then compare the advantages and disadvantages of these representations across the different modules. This survey is centered around the question: what is the best 3D scene representation for robotics? We then discuss future development trends of 3D scene representations, with a particular focus on how a 3D Foundation Model could replace current methods as a unified solution for future robotic applications, and we also explore the challenges that remain in fully realizing this model. We aim to offer a valuable resource for both newcomers and experienced researchers exploring the future of 3D scene representations and their application in robotics. We have published an open-source project on GitHub and will continue to add new works and technologies to it.
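To make the sparse-versus-dense distinction in the abstract concrete, here is a minimal, illustrative sketch (not code from the paper or its GitHub project): a raw point cloud answers an obstacle-distance query by nearest-neighbor search, while a dense voxel grid storing a truncated distance field answers the same query with a constant-time lookup. All names, grid sizes, and the simplified unsigned-distance fill are assumptions made for this example.

```python
# Illustrative sketch only (not from the paper): a sparse point cloud versus a
# dense voxel grid holding a truncated distance field. All sizes and the
# simplified unsigned-distance fill are assumptions made for this example.
import numpy as np

rng = np.random.default_rng(0)

# --- Sparse representation: raw point cloud ---------------------------------
points = rng.uniform(0.0, 4.0, size=(5000, 3))     # hypothetical map points (m)

def nearest_point_distance(query, pts):
    """Obstacle distance via brute-force nearest-neighbor search."""
    return float(np.min(np.linalg.norm(pts - query, axis=1)))

# --- Dense representation: voxel grid with truncated distances --------------
res, voxel = 40, 0.1                                # 40^3 grid, 10 cm voxels
centers = np.arange(res) * voxel + voxel / 2
grid = np.stack(np.meshgrid(centers, centers, centers, indexing="ij"), axis=-1)
dist = np.full((res, res, res), np.inf)
for p in points[:500]:                              # subsample to keep the demo fast
    dist = np.minimum(dist, np.linalg.norm(grid - p, axis=-1))
dist = np.clip(dist, 0.0, 0.3)                      # truncation, as in TSDF fusion

def grid_lookup(query):
    """Constant-time obstacle-distance query by indexing into the dense grid."""
    idx = np.clip((np.asarray(query) / voxel).astype(int), 0, res - 1)
    return float(dist[tuple(idx)])

q = np.array([2.0, 2.0, 2.0])
print("point cloud (nearest neighbor):", nearest_point_distance(q, points))
print("dense grid (direct lookup):    ", grid_lookup(q))
```

The design point the survey weighs: the point-cloud query cost grows with map size, whereas the dense grid trades memory for fast, differentiable-friendly distance queries useful in navigation and obstacle avoidance.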
Problem

Research questions and friction points this paper is trying to address.

Evaluating 3D scene representations for robotics across perception and navigation tasks
Comparing geometric and neural representations like NeRF and foundation models
Exploring foundation models as a unified solution for future robotic applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive overview of geometric and neural scene representations
Focus on dense representations for navigation and obstacle avoidance
Advocates 3D Foundation Models as a unified future solution (see the sketch below)
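As a rough illustration of what the language-based priors advocated above could enable, the hedged sketch below queries a toy 3D map whose elements carry language-aligned semantic features. The embed_text encoder, feature dimension, and data are placeholders (a real system would use a CLIP-style vision-language model and features fused from images); this is not the paper's implementation.

```python
# Illustrative sketch only: open-vocabulary querying of a 3D map whose
# elements (points / voxels / Gaussians) store language-aligned features.
# `embed_text` is a hypothetical placeholder for a real text encoder.
import numpy as np

DIM = 512
rng = np.random.default_rng(0)

def embed_text(prompt: str) -> np.ndarray:
    """Deterministic dummy text embedding; stands in for a CLIP-style encoder."""
    seed = abs(hash(prompt)) % (2**32)
    vec = np.random.default_rng(seed).normal(size=DIM)
    return vec / np.linalg.norm(vec)

# Toy scene: 1000 map elements with positions and unit-norm semantic features.
scene_xyz = rng.uniform(0.0, 5.0, size=(1000, 3))        # positions in meters
scene_feat = rng.normal(size=(1000, DIM))
scene_feat /= np.linalg.norm(scene_feat, axis=1, keepdims=True)

def query_scene(prompt: str, top_k: int = 5):
    """Rank map elements by cosine similarity between their feature and the prompt."""
    sim = scene_feat @ embed_text(prompt)
    best = np.argsort(-sim)[:top_k]
    return scene_xyz[best], sim[best]

locations, scores = query_scene("a cup on the table")
print(locations.round(2), scores.round(3))
```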
Tianchen Deng
Shanghai Jiao Tong University
Robotics, Computer Vision
Yue Pan
University of Bonn, Germany
Shenghai Yuan
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Dong Li
Institute of Automation, Chinese Academy of Sciences
Chen Wang
University at Buffalo, Buffalo, NY 14260, USA
Mingrui Li
Dalian University of Technology
SLAM, 3D Vision, Robotics
Long Chen
Institute of Automation, Chinese Academy of Sciences
Lihua Xie
Professor of Electrical Engineering, Nanyang Technological University
Robust Control, Networked Control, Multi-agent Systems
Danwei Wang
Professor, Nanyang Technological University
Robotics, Control Engineering, Fault Diagnosis
Jingchuan Wang
Institute of Medical Robotics and Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai 200240, China
Javier Civera
I3A, Universidad de Zaragoza, Spain
Computer Vision, Robotics, SLAM, Visual SLAM
Hesheng Wang
Institute of Medical Robotics and Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai 200240, China
Weidong Chen
Institute of Medical Robotics and Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai 200240, China