Towards Learning a Generalizable 3D Scene Representation from 2D Observations

📅 2026-02-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of building a generalizable, global 3D scene occupancy representation from egocentric 2D observations, so that manipulation tasks require no per-scene fine-tuning. The authors propose a generalizable neural radiance field approach that fuses multi-view geometric information end to end in a global workspace coordinate frame and predicts complete 3D occupancy directly from egocentric images. The method handles occlusions and removes the need for scene-specific adaptation. The work is presented as the first demonstration of cross-scene generalization in global 3D occupancy modeling: after training on 40 real-world scenes, the model attains an average reconstruction error of 26 mm on full scenes, including occluded regions, and infers complete occupancy beyond what conventional stereo vision methods recover.
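The summary reports reconstruction error in millimetres but does not define the metric. As a rough illustration only, the sketch below assumes one common choice: the mean distance from each predicted surface point to its nearest ground-truth point. The function name `mean_nearest_distance_mm` and the toy point sets are hypothetical and are not taken from the paper.

```python
import math


def mean_nearest_distance_mm(predicted, reference):
    """Mean distance in mm from each predicted point to its nearest reference point.

    Assumption: both inputs are non-empty sequences of (x, y, z) tuples in metres,
    expressed in the same (workspace) coordinate frame.
    """
    if not predicted or not reference:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    for p in predicted:
        # Brute-force nearest neighbour; fine for a toy example.
        total += min(math.dist(p, r) for r in reference)
    return 1000.0 * total / len(predicted)


# Hypothetical toy data: two sensor points and two predictions offset by 20 mm and 30 mm.
reference = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.02), (0.1, 0.03, 0.0)]
error_mm = mean_nearest_distance_mm(predicted, reference)
```

With real point clouds, a spatial index such as a k-d tree would replace the brute-force search; the loop here is kept only for readability.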

📝 Abstract

We introduce a Generalizable Neural Radiance Field approach for predicting 3D workspace occupancy from egocentric robot observations. Unlike prior methods operating in camera-centric coordinates, our model constructs occupancy representations in a global workspace frame, making it directly applicable to robotic manipulation. The model integrates a flexible set of source views and generalizes to unseen object arrangements without scene-specific fine-tuning. We demonstrate the approach on a humanoid robot and evaluate predicted geometry against 3D sensor ground truth. Trained on 40 real scenes, our model achieves 26 mm reconstruction error, including occluded regions, validating its ability to infer complete 3D occupancy beyond traditional stereo vision methods.
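To make the distinction between camera-centric and workspace coordinates concrete, here is a minimal sketch of how a query point defined in a global workspace frame can be projected into one source view with a standard pinhole model. This is a generic geometric step, not the paper's implementation; the function name `workspace_to_pixel`, the extrinsics `R_cw` and `t_cw`, and the intrinsics are all assumed for illustration.

```python
def workspace_to_pixel(point_w, R_cw, t_cw, fx, fy, cx, cy):
    """Project a workspace-frame 3D point into pixel coordinates of one camera.

    Assumptions: R_cw (3x3 nested sequence) and t_cw (length 3) map workspace
    coordinates to camera coordinates, x_c = R_cw @ x_w + t_cw, and the camera
    follows an ideal pinhole model with focal lengths (fx, fy) and principal
    point (cx, cy). Returns None if the point is not in front of the camera.
    """
    x_c, y_c, z_c = (
        sum(R_cw[i][j] * point_w[j] for j in range(3)) + t_cw[i]
        for i in range(3)
    )
    if z_c <= 0.0:
        return None  # behind the image plane: this view cannot observe the point
    return (fx * x_c / z_c + cx, fy * y_c / z_c + cy)


# Hypothetical camera aligned with the workspace axes, 640x480 image.
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
origin = (0.0, 0.0, 0.0)
pixel = workspace_to_pixel((0.1, -0.05, 1.0), identity, origin, 500.0, 500.0, 320.0, 240.0)
behind = workspace_to_pixel((0.0, 0.0, -1.0), identity, origin, 500.0, 500.0, 320.0, 240.0)
```

Because the query point is fixed in the workspace frame, the same point can be projected into any number of source views by swapping in each view's extrinsics, which is one way a model could accept a variable set of views.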

Problem

Research questions and friction points this paper is trying to address.

3D scene representation

2D observations

occupancy prediction

generalizable neural radiance field

robotic manipulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalizable Neural Radiance Field

3D Scene Representation

Egocentric Observation