🤖 AI Summary
To address the challenge that 3D Gaussian Splatting (3DGS) trained on low-resolution (LR) images struggles to support high-resolution (HR) rendering, this paper introduces MVGSR, a multi-view consistent 3DGS super-resolution (SR) framework. Unlike single-image SR methods, which lack cross-view consistency, and video SR approaches, which rely on strict temporal ordering, the method supports arbitrary, unstructured multi-view inputs. Its core innovations are: (1) an epipolar-constrained multi-view attention mechanism that explicitly enforces geometric consistency across views; and (2) a pose-driven auxiliary view selection strategy that adaptively fuses complementary viewpoint information. Evaluated on both object-centric and scene-level 3DGS SR benchmarks, the method achieves state-of-the-art performance, significantly improving high-frequency detail fidelity and inter-view geometric consistency.
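The pose-driven auxiliary view selection can be sketched roughly as follows. The summary only states that selection is based on camera poses, so the concrete scoring rule here (camera-center distance combined with viewing-direction similarity) and all function names are illustrative assumptions, not the paper's actual criterion:

```python
import numpy as np

def select_auxiliary_views(poses, target_idx, k=2):
    """Pick k auxiliary views whose cameras best match the target view.

    poses: (N, 4, 4) camera-to-world matrices.
    Hypothetical score: prefer small camera-center distance and a viewing
    direction similar to the target's (assumed criterion, for illustration).
    """
    centers = poses[:, :3, 3]        # camera centers in world coordinates
    dirs = poses[:, :3, 2]           # viewing directions (camera z-axis)
    dist = np.linalg.norm(centers - centers[target_idx], axis=1)
    cos = dirs @ dirs[target_idx] / (
        np.linalg.norm(dirs, axis=1) * np.linalg.norm(dirs[target_idx]) + 1e-8)
    score = dist - cos               # lower is better: near and similarly oriented
    order = np.argsort(score)
    return [i for i in order if i != target_idx][:k]
```

Because the score depends only on camera poses, this works on arbitrarily ordered capture sets, with no assumption of temporal adjacency between frames.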
📝 Abstract
Scenes reconstructed by 3D Gaussian Splatting (3DGS) trained on low-resolution (LR) images are unsuitable for high-resolution (HR) rendering. Consequently, a 3DGS super-resolution (SR) method is needed to bridge LR inputs and HR rendering. Early 3DGS SR methods rely on single-image SR networks, which lack cross-view consistency and fail to fuse complementary information across views. More recent video-based SR approaches attempt to address this limitation but require strictly sequential frames, limiting their applicability to unstructured multi-view datasets. In this work, we introduce Multi-View Consistent 3D Gaussian Splatting Super-Resolution (MVGSR), a framework that integrates multi-view information to render 3DGS with high-frequency details and enhanced consistency. We first propose an Auxiliary View Selection Method based on camera poses, making our approach applicable to arbitrarily organized multi-view datasets without the need for temporal continuity or data reordering. Furthermore, we introduce, for the first time, an epipolar-constrained multi-view attention mechanism into 3DGS SR, which serves as the core of our proposed multi-view SR network. This design enables the model to selectively aggregate consistent information from auxiliary views, enhancing the geometric consistency and detail fidelity of 3DGS representations. Extensive experiments demonstrate that our method achieves state-of-the-art performance on both object-centric and scene-level 3DGS SR benchmarks.
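As a rough illustration of the epipolar constraint behind the attention mechanism: given relative camera geometry, a query pixel in the target view corresponds to a line (its epipolar line) in each auxiliary view, so attention can be restricted to a narrow band around that line instead of the full image. The sketch below shows the standard geometry plus a masked softmax; the band width, function names, and the single-query attention are assumptions for illustration, not the paper's actual network:

```python
import numpy as np

def skew(t):
    """Cross-product (skew-symmetric) matrix of a 3-vector."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def fundamental(K1, K2, R, t):
    """Fundamental matrix mapping a pixel in view 1 to its epipolar line in view 2."""
    E = skew(t) @ R                                  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def epipolar_mask(F, query_px, aux_px, band=2.0):
    """True where an auxiliary pixel lies within `band` pixels of the
    epipolar line of `query_px` (band width is an assumed hyperparameter)."""
    q = np.array([query_px[0], query_px[1], 1.0])
    line = F @ q                                      # (a, b, c): ax + by + c = 0
    pts = np.concatenate([aux_px, np.ones((len(aux_px), 1))], axis=1)
    d = np.abs(pts @ line) / (np.hypot(line[0], line[1]) + 1e-8)
    return d <= band

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention for one query, restricted to mask=True keys."""
    logits = (k @ q) / np.sqrt(len(q))
    logits = np.where(mask, logits, -np.inf)          # epipolar constraint
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ v
```

For a pure horizontal camera translation with identity intrinsics, the epipolar line of a query pixel is the same image row in the auxiliary view, so only features along that row survive the mask and contribute to the aggregated value.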