🤖 AI Summary
Existing 3D super-resolution methods rely on dense low-resolution inputs and per-scene optimization, which hinders efficient recovery of high-frequency geometric and appearance details, thereby limiting reconstruction quality, generalization, and real-time performance. This work reformulates 3D super-resolution as a feed-forward mapping from sparse multi-view low-resolution images to a high-resolution 3D Gaussian Splatting (3DGS) representation, introducing an end-to-end trainable network that directly predicts the high-resolution 3DGS. By leveraging Gaussian offset learning and feature refinement modules, the method eliminates the need for pre-trained 2D super-resolution models or per-scene optimization, and it can be integrated as a plug-in module with any feed-forward 3DGS backbone. Experiments show that the approach outperforms state-of-the-art methods across three benchmarks and achieves strong zero-shot generalization, even surpassing optimization-based approaches on unseen scenes.
📝 Abstract
3D super-resolution (3DSR) aims to reconstruct high-resolution (HR) 3D scenes from low-resolution (LR) multi-view images. Existing methods rely on dense LR inputs and per-scene optimization, which restricts the high-frequency priors for constructing HR 3D Gaussian Splatting (3DGS) to those inherited from pretrained 2D super-resolution (2DSR) models. This severely limits reconstruction fidelity, cross-scene generalization, and real-time usability. We propose to reformulate 3DSR as a direct feed-forward mapping from sparse LR views to HR 3DGS representations, enabling the model to autonomously learn 3D-specific high-frequency geometry and appearance from large-scale, multi-scene data. This fundamentally changes how 3DSR acquires high-frequency knowledge and enables robust generalization to unseen scenes. Specifically, we introduce SR3R, a feed-forward framework that directly predicts HR 3DGS representations from sparse LR views via a learned mapping network. To further enhance reconstruction fidelity, we incorporate Gaussian offset learning and feature refinement, which stabilize reconstruction and sharpen high-frequency details. SR3R is plug-and-play and can be paired with any feed-forward 3DGS reconstruction backbone: the backbone provides an LR 3DGS scaffold, and SR3R upscales it to an HR 3DGS. Extensive experiments across three 3D benchmarks demonstrate that SR3R surpasses state-of-the-art (SOTA) 3DSR methods and achieves strong zero-shot generalization, even outperforming SOTA per-scene optimization methods on unseen scenes.
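The plug-and-play pipeline described above (backbone produces an LR 3DGS scaffold, SR3R densifies it into an HR 3DGS via predicted offsets) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all function names are hypothetical, the Gaussian parameterization is simplified, and random values stand in for the learned network outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def lr_backbone(num_gaussians=64):
    """Stand-in for any feed-forward 3DGS backbone: returns an LR
    scaffold of Gaussians with positions, scales, and colors.
    (A real 3DGS also carries rotations and opacities.)"""
    return {
        "xyz": rng.normal(size=(num_gaussians, 3)),
        "scale": np.full((num_gaussians, 3), 0.1),
        "color": rng.uniform(size=(num_gaussians, 3)),
    }

def sr3r_upscale(scaffold, splits_per_gaussian=4, offset_scale=0.05):
    """Gaussian offset learning, sketched: each scaffold Gaussian is
    replicated K times, and small positional offsets plus finer scales
    let the denser HR set encode high-frequency geometry. In SR3R the
    offsets come from a feature-refinement network; random values
    stand in for that network's output here."""
    n = scaffold["xyz"].shape[0]
    k = splits_per_gaussian
    offsets = rng.normal(size=(n, k, 3)) * offset_scale
    hr_xyz = (scaffold["xyz"][:, None, :] + offsets).reshape(n * k, 3)
    hr_scale = np.repeat(scaffold["scale"] / k, k, axis=0)  # shrink per split
    hr_color = np.repeat(scaffold["color"], k, axis=0)
    return {"xyz": hr_xyz, "scale": hr_scale, "color": hr_color}

lr = lr_backbone()
hr = sr3r_upscale(lr)
print(len(lr["xyz"]), "->", len(hr["xyz"]))  # 64 -> 256
```

The design choice this illustrates is the decoupling the abstract emphasizes: the backbone is treated as a black box that supplies the LR scaffold, so the upscaling module can be paired with any feed-forward 3DGS reconstructor.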