I-Scene: 3D Instance Models are Implicit Generalizable Spatial Learners

📅 2025-12-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing interactive 3D scene generation methods generalize poorly: constrained by limited annotated scene data, they fail to adapt to unseen layouts or novel object compositions. Method: We propose a reprogramming paradigm that requires no scene-level supervision and transfers the implicit spatial priors of a pre-trained 3D instance generator into a scene-level spatial learner. Contributions/Results: First, we reveal that pre-trained instance generators intrinsically encode exploitable spatial reasoning capabilities. Second, we introduce a view-centric, purely geometric scene representation that bypasses canonical-space modeling and explicit relational annotations. Third, we jointly optimize implicit neural fields to enable end-to-end learning of spatial relationships (e.g., proximity, support, symmetry). Experiments demonstrate robust spatial reasoning on unseen layouts and arbitrary object combinations, significantly improving generalization. The approach shows strong potential as a foundational 3D scene model that enables real-time, editable generation.

📝 Abstract

Generalization remains the central challenge for interactive 3D scene generation. Existing learning-based approaches ground spatial understanding in limited scene datasets, restricting generalization to new layouts. We instead reprogram a pre-trained 3D instance generator to act as a scene-level learner, replacing dataset-bounded supervision with model-centric spatial supervision. This reprogramming unlocks the generator's transferable spatial knowledge, enabling generalization to unseen layouts and novel object compositions. Remarkably, spatial reasoning still emerges even when the training scenes consist of randomly composed objects. This demonstrates that the generator's transferable scene prior provides a rich learning signal for inferring proximity, support, and symmetry from purely geometric cues. Replacing the widely used canonical space, we instantiate this insight with a view-centric formulation of the scene space, yielding a fully feed-forward, generalizable scene generator that learns spatial relations directly from the instance model. Quantitative and qualitative results show that a 3D instance generator is an implicit spatial learner and reasoner, pointing toward foundation models for interactive 3D scene understanding and generation. Project page: https://luling06.github.io/I-Scene-project/
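The abstract mentions training scenes made of randomly composed objects but does not say how they are built. As a rough illustration only, the sketch below places rectangular object footprints at random positions on a square floor and rejects overlapping placements. The function names, the footprint representation, and the rejection-sampling strategy are all assumptions made for this example, not the authors' pipeline.

```python
import random
from typing import List, Sequence, Tuple

# Axis-aligned footprint on the floor plane: (x_min, z_min, x_max, z_max).
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two footprints share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def compose_random_scene(
    sizes: Sequence[Tuple[float, float]],
    room: float = 4.0,
    seed: int = 0,
    max_tries: int = 1000,
) -> List[Box]:
    """Place one footprint per (width, depth) pair without overlaps.

    Hypothetical helper: positions are drawn uniformly inside a
    room x room floor and rejected if they collide with earlier objects.
    """
    rng = random.Random(seed)  # seeded so a scene can be reproduced
    placed: List[Box] = []
    for width, depth in sizes:
        for _ in range(max_tries):
            x = rng.uniform(0.0, room - width)
            z = rng.uniform(0.0, room - depth)
            candidate = (x, z, x + width, z + depth)
            if not any(overlaps(candidate, other) for other in placed):
                placed.append(candidate)
                break
        else:
            raise RuntimeError("could not place object without overlap")
    return placed


scene = compose_random_scene([(1.0, 1.0), (0.5, 0.5), (1.0, 0.5)], seed=1)
```

A scene composed this way carries no semantic layout, only geometry, which is the setting in which the abstract reports that spatial reasoning still emerges.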

Problem

Research questions and friction points this paper is trying to address.

Generalizing interactive 3D scene generation to unseen layouts

Replacing dataset-bounded supervision with model-centric spatial supervision

Enabling spatial reasoning from geometric cues without canonical space

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reprogram pre-trained 3D instance generator as scene learner

Replace dataset supervision with model-centric spatial supervision

Use view-centric formulation for feed-forward generalizable scene generation
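To make the view-centric idea concrete, here is a minimal sketch of expressing scene points in the viewer's frame rather than in a canonical object or world frame. The page does not give the paper's actual formulation, so everything here is an assumption for illustration: the function names are hypothetical, and the camera is simplified to a position plus a single yaw rotation about the vertical axis.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def world_to_view(p: Vec3, cam_pos: Vec3, yaw: float) -> Vec3:
    """Express world-space point p in the frame of a camera at cam_pos.

    Assumption: the camera is rotated only by `yaw` radians about the
    world y axis and looks down its own -z axis.
    """
    # Translate so the camera sits at the origin.
    dx, dy, dz = p[0] - cam_pos[0], p[1] - cam_pos[1], p[2] - cam_pos[2]
    # Undo the camera rotation by applying the inverse of R_y(yaw).
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * dx - s * dz, dy, s * dx + c * dz)


# Two illustrative object centers and one viewpoint (made-up values).
chair: Vec3 = (1.0, 0.0, -2.0)
table: Vec3 = (1.5, 0.0, -2.5)
cam_pos: Vec3 = (3.0, 1.6, 1.0)
yaw = 0.7

chair_view = world_to_view(chair, cam_pos, yaw)
table_view = world_to_view(table, cam_pos, yaw)

# A rigid change of frame keeps inter-object distances intact.
gap_world = math.dist(chair, table)
gap_view = math.dist(chair_view, table_view)
```

Because the change of frame is rigid, `gap_world` and `gap_view` agree up to floating-point error: relations such as proximity do not depend on the chosen frame, while absolute coordinates become relative to the observer.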
