🤖 AI Summary
This work addresses the challenge of simultaneously achieving high-fidelity 3D reconstruction and structurally plausible generation under sparse-view conditions. The authors propose a unified framework that, for the first time, cohesively integrates feed-forward reconstruction and diffusion-based generation across coordinate space, 3D representation, and training objectives. Key innovations include shared canonical-space alignment, disentangled cooperative learning, and an implicit geometric-anchor-guided latent-space enhancement mechanism. Together, these components enable efficient structural completion and consistent multi-view synthesis. The method significantly outperforms existing approaches in sparse-observation settings, establishing new state-of-the-art results in both reconstruction fidelity and robustness.
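The summary does not include implementation details, but as a rough illustration of what "shared canonical-space alignment" could mean in practice, here is a minimal PyTorch sketch that maps per-view 3D points into a single canonical frame. The function name, pose convention, and tensor shapes are assumptions for illustration only, not details taken from the paper.

```python
import torch

def to_canonical(points: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
    """Map per-view 3D points (N, 3) into a shared canonical frame.

    `pose` is assumed to be a 4x4 view-to-canonical rigid transform;
    both the name and the convention are hypothetical.
    """
    # Lift to homogeneous coordinates: (N, 3) -> (N, 4).
    homo = torch.cat([points, torch.ones_like(points[..., :1])], dim=-1)
    # Apply the transform and drop the homogeneous coordinate.
    return (homo @ pose.T)[..., :3]
```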
📝 Abstract
Sparse-view 3D modeling faces a fundamental tension between reconstruction fidelity and generative plausibility. While feed-forward reconstruction excels in efficiency and input alignment, it often lacks the global priors needed for structural completeness. Conversely, diffusion-based generation provides rich geometric detail but struggles with multi-view consistency. We present UniRecGen, a unified framework that integrates these two paradigms into a single cooperative system. To overcome inherent conflicts in coordinate spaces, 3D representations, and training objectives, we align both models within a shared canonical space and employ disentangled cooperative learning, which keeps training stable while enabling seamless collaboration at inference. Specifically, the reconstruction module is adapted to provide canonical geometric anchors, while the diffusion generator leverages latent-augmented conditioning to refine and complete the geometric structure. Experiments demonstrate that UniRecGen achieves superior fidelity and robustness, outperforming existing methods at producing complete and consistent 3D models from sparse observations.
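To make "latent-augmented conditioning" more concrete, below is a minimal, hypothetical PyTorch sketch in which geometric anchors produced by a feed-forward reconstructor are concatenated into the diffusion denoiser's input. All names (`AnchorConditionedDenoiser`, `reconstructor`), dimensions, and the simplified sampling update are illustrative assumptions; the paper's actual architecture, anchor representation, and sampler are not specified here.

```python
import torch
import torch.nn as nn

class AnchorConditionedDenoiser(nn.Module):
    """Toy denoiser whose latent input is augmented with geometric anchors.

    The anchor dimension and the concatenation scheme are illustrative
    guesses, not the paper's design.
    """
    def __init__(self, latent_dim: int = 64, anchor_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + anchor_dim + 1, 256),
            nn.SiLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z_t, anchors, t):
        # Latent-augmented conditioning: concatenate the noisy latent,
        # the canonical geometric anchors, and a crude timestep encoding.
        t_emb = t.float().unsqueeze(-1) / 1000.0
        return self.net(torch.cat([z_t, anchors, t_emb], dim=-1))

@torch.no_grad()
def sample(reconstructor, denoiser, views, steps: int = 50):
    # Hypothetical inference loop: the feed-forward reconstructor supplies
    # anchors once; the diffusion model refines a latent around them.
    anchors = reconstructor(views)          # canonical geometric anchors (B, 32)
    z = torch.randn(anchors.shape[0], 64)   # start from Gaussian noise
    for i in reversed(range(steps)):
        t = torch.full((z.shape[0],), i)
        eps = denoiser(z, anchors, t)       # predict noise given anchors
        z = z - eps / steps                 # simplified stand-in update,
                                            # not the paper's sampler
    return z                                # decoded downstream to a 3D model
```

The design point this sketch is meant to convey is that the anchors are computed once, outside the diffusion loop, so the generator refines structure around fixed geometric evidence rather than re-estimating it at every step.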