Towards Learning a Generalizable 3D Scene Representation from 2D Observations

📅 2026-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of building a generalizable, global 3D occupancy representation of a scene from egocentric 2D observations, so that manipulation tasks need no per-scene fine-tuning. The authors propose a generalizable neural radiance field approach that fuses multi-view geometric information end to end in a global workspace coordinate frame and predicts complete 3D occupancy directly from egocentric images. The method handles occlusions and requires no scene-specific adaptation. Trained on 40 real-world scenes, it generalizes to unseen object arrangements and attains a reconstruction error of 26 mm measured against 3D sensor ground truth, including occluded regions, which indicates that it can infer occupancy beyond what traditional stereo vision methods recover.

📝 Abstract
We introduce a Generalizable Neural Radiance Field approach for predicting 3D workspace occupancy from egocentric robot observations. Unlike prior methods operating in camera-centric coordinates, our model constructs occupancy representations in a global workspace frame, making it directly applicable to robotic manipulation. The model integrates flexible source views and generalizes to unseen object arrangements without scene-specific fine-tuning. We demonstrate the approach on a humanoid robot and evaluate predicted geometry against 3D sensor ground truth. Trained on 40 real scenes, our model achieves 26 mm reconstruction error, including occluded regions, validating its ability to infer complete 3D occupancy beyond traditional stereo vision methods.
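
The distinction the abstract draws between camera-centric coordinates and a global workspace frame can be illustrated with a small geometric sketch. This is not the authors' model, which is a learned radiance field; it only shows, under stated assumptions, what it means to express points in a shared workspace frame and mark the voxels they occupy. The function names, the camera pose, and the grid parameters are all hypothetical.

```python
import math

def camera_to_workspace(points_cam, rotation, translation):
    """Map 3D points from a camera frame into the workspace frame.

    rotation (3x3, row-major) and translation (3-vector) describe the
    camera pose in the workspace frame; here they are assumed to be known.
    """
    out = []
    for p in points_cam:
        out.append(tuple(
            sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
            for r in range(3)
        ))
    return out

def voxelize(points_ws, origin, voxel_size, grid_shape):
    """Return the set of (i, j, k) voxel indices that contain a point."""
    occupied = set()
    for p in points_ws:
        idx = tuple(math.floor((p[a] - origin[a]) / voxel_size) for a in range(3))
        # Points outside the workspace grid are ignored.
        if all(0 <= idx[a] < grid_shape[a] for a in range(3)):
            occupied.add(idx)
    return occupied

# Hypothetical pose: camera rotated 90 degrees about z, 0.5 m above the origin.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.5]

# Hypothetical camera-frame points; the last one falls outside the grid.
points_cam = [(0.12, -0.03, 0.05), (0.04, 0.22, -0.33), (2.0, 0.0, 0.0)]
points_ws = camera_to_workspace(points_cam, R, t)

# A 1 m cube of 10 cm voxels whose corner sits at (-0.5, -0.5, 0.0).
grid = voxelize(points_ws, origin=(-0.5, -0.5, 0.0), voxel_size=0.1,
                grid_shape=(10, 10, 10))
```

Because every view is mapped into the same frame before voxelization, occupancy from several cameras could be merged by a simple set union, which is one reason a workspace-frame representation is convenient for manipulation planning.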
Problem

Research questions and friction points this paper is trying to address.

3D scene representation
2D observations
occupancy prediction
generalizable neural radiance field
robotic manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalizable Neural Radiance Field
3D Scene Representation
Egocentric Observation
Global Workspace Frame
Robotic Manipulation
Martin Gromniak
University of Hamburg - Department of Informatics, Hamburg - Germany; ZAL Center of Applied Aeronautical Research, Hamburg - Germany
Jan-Gerrit Habekost
University of Hamburg
Neurorobotics
Sebastian Kamp
University of Hamburg - Department of Informatics, Hamburg - Germany
Sven Magg
Senior researcher, Hamburger Informatik Technologie-Center (HITeC) e.V., Hamburg, Germany
Neurorobotics, Neural Networks, Evolutionary Computing, Swarm Intelligence
Stefan Wermter
University of Hamburg - Department of Informatics, Hamburg - Germany