Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

📅 2026-04-15

🤖 AI Summary

Existing 3D reconstruction methods struggle to combine fidelity with practicality: per-scene optimization is slow, and category-specific training limits generalization. This survey proposes an output-representation-agnostic taxonomy of feed-forward 3D reconstruction methods, organized around five core problems: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal modeling. It analyzes the design elements these methods share (image backbones, multi-view fusion mechanisms, and geometric priors), reviews the major datasets and evaluation benchmarks to support standardized evaluation, and categorizes real-world applications. By setting aside differences in geometric representation, it frames feed-forward 3D modeling around recurring problems rather than output formats, and outlines future directions in scalability, evaluation standards, and world modeling.

📝 Abstract

Reconstructing 3D representations from 2D inputs is a fundamental task in computer vision and graphics, serving as a cornerstone for understanding and interacting with the physical world. While traditional methods achieve high fidelity, they are limited by slow per-scene optimization or category-specific training, which hinders their practical deployment and scalability. Hence, generalizable feed-forward 3D reconstruction has witnessed rapid development in recent years. By learning a model that maps images directly to 3D representations in a single forward pass, these methods enable efficient reconstruction and robust cross-scene generalization. Our survey is motivated by a critical observation: despite the diverse geometric output representations, ranging from implicit fields to explicit primitives, existing feed-forward approaches share similar high-level architectural patterns, such as image feature extraction backbones, multi-view information fusion mechanisms, and geometry-aware design principles. Consequently, we abstract away from these representation differences and focus on model design, proposing a novel taxonomy of design strategies that are agnostic to the output format. Our proposed taxonomy organizes the research directions into five key problems that drive recent research development: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware models. To support this taxonomy with empirical grounding and standardized evaluation, we further review related benchmarks and datasets comprehensively, and discuss and categorize real-world applications built on feed-forward 3D models. Finally, we outline future directions to address open challenges such as scalability, evaluation standards, and world modeling.
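The shared pattern described above (a per-view image backbone, a multi-view fusion stage, and an output head) can be illustrated with a toy NumPy sketch. Everything here is an assumption for illustration, not a method from the survey: `ToyFeedForward3D` and `patchify` are hypothetical names, the weights are random and untrained, the "backbone" is a single linear patch embedding, fusion is one global self-attention layer over the tokens of all views, and the 14-channel head assumes a common 3D Gaussian parameterization (3 mean + 3 scale + 4 rotation quaternion + 1 opacity + 3 color).

```python
import numpy as np


def patchify(img, p):
    """Split an (H, W, C) image into non-overlapping p x p patches -> (N, p*p*C)."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0, "image size must be divisible by patch size"
    x = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p * C)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ToyFeedForward3D:
    """Hypothetical toy model: backbone -> multi-view fusion -> swappable output head."""

    def __init__(self, patch=4, channels=3, dim=32, out_dim=3, max_views=16, seed=0):
        rng = np.random.default_rng(seed)  # random, untrained weights
        self.patch = patch
        in_dim = patch * patch * channels
        # Stand-in for an image backbone: a linear patch embedding shared by all views.
        self.W_embed = rng.normal(0, in_dim ** -0.5, (in_dim, dim))
        # Learned-style view embeddings so attention can tell views apart (assumption).
        self.view_embed = rng.normal(0, 0.02, (max_views, dim))
        self.Wq, self.Wk, self.Wv = (rng.normal(0, dim ** -0.5, (dim, dim)) for _ in range(3))
        # Output head: only this layer depends on the 3D representation.
        self.W_head = rng.normal(0, dim ** -0.5, (dim, out_dim))

    def __call__(self, images):
        # 1) Per-view feature extraction with shared weights.
        tokens = [patchify(img, self.patch) @ self.W_embed + self.view_embed[i]
                  for i, img in enumerate(images)]
        n_per_view = tokens[0].shape[0]
        x = np.concatenate(tokens, axis=0)  # (V*N, dim)
        # 2) Multi-view fusion: single-head self-attention across all views' tokens.
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        x = x + attn @ v  # residual connection
        # 3) Representation-specific head, applied per patch token.
        out = x @ self.W_head
        return out.reshape(len(images), n_per_view, -1)


rng = np.random.default_rng(0)
views = [rng.random((32, 32, 3)) for _ in range(2)]  # two 32x32 RGB views

point_model = ToyFeedForward3D(out_dim=3)   # per-patch 3D point (point-map style)
gauss_model = ToyFeedForward3D(out_dim=14)  # per-patch Gaussian parameters
print(point_model(views).shape)  # (2, 64, 3)
print(gauss_model(views).shape)  # (2, 64, 14)
```

Only the final head differs between the point-map and Gaussian variants; the backbone and fusion stages are identical, which is the sense in which a taxonomy built on model design can stay agnostic to the output format.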

Problem

feed-forward 3D reconstruction

generalizable 3D modeling

3D scene representation

model design taxonomy

cross-scene generalization

Innovation

feed-forward 3D reconstruction

problem-driven taxonomy

geometry-aware modeling