🤖 AI Summary
Existing feed-forward multi-view Gaussian generation methods naively concatenate pixel-aligned Gaussians across views, leading to geometric inconsistencies, rendering artifacts, and representational redundancy. To address this, the authors propose the Gaussian Graph Network (GGN), which models multi-view Gaussians as a geometry-aware graph. GGN introduces a Gaussian-level message-passing mechanism to explicitly capture inter-view geometric relationships, incorporates a Gaussian pooling layer for compact representation learning, and is trained end-to-end through differentiable rendering. Evaluated on RealEstate10K and ACID, the method achieves better PSNR and SSIM with significantly fewer Gaussians, faster rendering speed, and strong cross-scene generalization compared to state-of-the-art feed-forward methods.
📝 Abstract
3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis performance. While conventional methods require per-scene optimization, several feed-forward methods have recently been proposed to generate pixel-aligned Gaussian representations with a learnable network that generalizes across scenes. However, these methods simply combine pixel-aligned Gaussians from multiple views as scene representations, leading to artifacts and extra memory cost without fully capturing the relations among Gaussians from different images. In this paper, we propose the Gaussian Graph Network (GGN) to generate efficient and generalizable Gaussian representations. Specifically, we construct Gaussian Graphs to model the relations among Gaussian groups from different views. To support message passing at the Gaussian level, we reformulate the basic graph operations over Gaussian representations, enabling each Gaussian to benefit from its connected Gaussian groups through Gaussian feature fusion. Furthermore, we design a Gaussian pooling layer that aggregates various Gaussian groups into efficient representations. We conduct experiments on the large-scale RealEstate10K and ACID datasets to demonstrate the efficiency and generalization of our method. Compared to state-of-the-art methods, our model uses fewer Gaussians and achieves better image quality with higher rendering speed.
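The abstract describes two core operations: message passing over a graph connecting Gaussian groups from different views, and a pooling layer that collapses redundant groups into a compact set. The sketch below illustrates only that structural idea; the actual GGN operates on full 3D Gaussian parameters (position, covariance, opacity, color) with learned fusion weights, whereas here each "Gaussian" is reduced to a plain feature vector, the graph is a hypothetical hand-built adjacency, and aggregation is a simple mean.

```python
def message_pass(features, edges):
    """One round of mean-aggregation message passing on a Gaussian graph.

    features: dict mapping Gaussian id -> feature vector (list of floats)
    edges:    dict mapping Gaussian id -> list of connected Gaussian ids
    Each Gaussian is updated to the average of itself and its neighbors,
    a crude stand-in for the paper's learned Gaussian feature fusion.
    """
    updated = {}
    for gid, feat in features.items():
        neighborhood = [feat] + [features[n] for n in edges.get(gid, [])]
        dim = len(feat)
        updated[gid] = [sum(v[d] for v in neighborhood) / len(neighborhood)
                        for d in range(dim)]
    return updated


def gaussian_pool(features, clusters):
    """Pool groups of Gaussians into single representatives.

    clusters: list of lists of Gaussian ids; each cluster collapses to the
    element-wise mean of its members, mimicking how a pooling layer trades
    redundant per-view Gaussians for a more compact representation.
    """
    pooled = []
    for cluster in clusters:
        members = [features[gid] for gid in cluster]
        dim = len(members[0])
        pooled.append([sum(m[d] for m in members) / len(members)
                       for d in range(dim)])
    return pooled


# Toy example: Gaussians 0 and 2 come from overlapping regions of two
# views and are linked in the graph; Gaussian 1 is unconnected.
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
graph = {0: [2], 2: [0]}
fused = message_pass(feats, graph)
compact = gaussian_pool(fused, [[0, 2], [1]])  # 3 Gaussians -> 2
```

After fusion, the two cross-view Gaussians agree, so pooling them into one representative loses little information — which is the intuition behind the paper's efficiency gains.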