🤖 AI Summary
3D Gaussian Splatting (3DGS) excels in novel view synthesis but suffers from multi-view super-resolution (SR) inconsistency when generating high-resolution renderings from low-resolution (LR) inputs, resulting in blurring and geometric distortion. Existing SR approaches apply uniform upscaling across all views, compromising geometric consistency and high-frequency detail recovery. To address this, we propose a Selective SR framework that identifies undersampled regions—such as oblique viewpoints and sparsely observed areas—based on camera poses and scene geometry, and conditionally enhances only their local high-frequency content. Our method tightly integrates 3D Gaussian point representations, multi-view geometric analysis, and a conditional SR network to achieve viewpoint-consistent detail synthesis. Extensive experiments on Tanks & Temples, Deep Blending, and Mip-NeRF 360 demonstrate significant improvements over baselines, particularly in perceptual quality and geometric fidelity of foreground details.
📝 Abstract
3D Gaussian Splatting (3DGS) enables high-quality novel view synthesis, motivating interest in generating higher-resolution renders than those available during training. A natural strategy is to apply super-resolution (SR) to low-resolution (LR) input views, but independently enhancing each image introduces multi-view inconsistencies, leading to blurry renders. Prior methods attempt to mitigate these inconsistencies through learned neural components, temporally consistent video priors, or joint optimization on LR and SR views, but all apply SR uniformly across every image. In contrast, our key insight is that close-up LR views may contain high-frequency information for regions also captured in more distant views, and that we can use the camera pose relative to scene geometry to inform where to add SR content. Building on this insight, we propose SplatSuRe, a method that selectively applies SR content only in undersampled regions lacking high-frequency supervision, yielding sharper and more consistent results. Across Tanks & Temples, Deep Blending, and Mip-NeRF 360, our approach surpasses baselines in both fidelity and perceptual quality. Notably, our gains are most significant in localized foreground regions where higher detail is desired.
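The abstract's core idea, selecting regions that lack high-frequency supervision from the training views, can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it assumes a simple pinhole approximation in which the world-space footprint of one pixel at a 3D point is roughly depth / focal length, and flags a point as undersampled when no training camera observed it as finely as the high-resolution target render will display it. Function names, the `sr_factor` parameter, and the single shared focal length are illustrative assumptions.

```python
import numpy as np

def pixel_footprint(points, cam_center, focal_px):
    """Approximate world-space size covered by one pixel at each 3D point:
    footprint ~ depth / focal (pinhole approximation, fronto-parallel surface)."""
    depth = np.linalg.norm(points - cam_center, axis=1)
    return depth / focal_px

def undersampled_mask(points, target_cam, train_cams, focal_px, sr_factor=2.0):
    """Flag points lacking high-frequency supervision.

    A point is undersampled if the finest footprint any training view
    achieves is still coarser than what the super-resolved target render
    needs (the target view's footprint shrunk by the SR factor).
    """
    needed_fp = pixel_footprint(points, target_cam, focal_px) / sr_factor
    train_fp = np.stack([pixel_footprint(points, c, focal_px) for c in train_cams])
    best_train_fp = train_fp.min(axis=0)  # sharpest supervision per point
    return best_train_fp > needed_fp      # True -> add SR detail here

# Toy example: a point near a close-up training camera keeps its native
# detail, while a point far from every training view is flagged for SR.
points = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
target = np.array([0.0, 0.0, 4.0])
trains = [np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 20.0])]
mask = undersampled_mask(points, target, trains, focal_px=500.0)
```

In this toy setup only the second point is flagged: the close-up training camera already supplies high-frequency content for the first point, so applying SR uniformly there would be unnecessary and risk inconsistency. A full treatment would also account for viewing obliquity and occlusion, which this distance-only sketch omits.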