Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images

📅 2025-03-20
🏛️ Neural Information Processing Systems
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Existing feed-forward methods for multi-view Gaussian generation simply concatenate the pixel-aligned Gaussians predicted for each view, which leads to rendering artifacts and a redundant, memory-hungry representation. To address this, we propose the Gaussian Graph Network (GGN), which models the Gaussian groups from different views as a graph. GGN reformulates basic graph operations so that message passing works at the Gaussian level, letting each Gaussian fuse features from its connected Gaussian groups, and it adds a Gaussian pooling layer that aggregates the groups into a more compact representation. Evaluated on RealEstate10K and ACID, the method uses fewer Gaussians than state-of-the-art approaches while achieving better image quality and higher rendering speed, and it generalizes across scenes.

📝 Abstract
3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis performance. While conventional methods require per-scene optimization, more recently several feed-forward methods have been proposed to generate pixel-aligned Gaussian representations with a learnable network, which are generalizable to different scenes. However, these methods simply combine pixel-aligned Gaussians from multiple views as scene representations, thereby leading to artifacts and extra memory cost without fully capturing the relations of Gaussians from different images. In this paper, we propose Gaussian Graph Network (GGN) to generate efficient and generalizable Gaussian representations. Specifically, we construct Gaussian Graphs to model the relations of Gaussian groups from different views. To support message passing at Gaussian level, we reformulate the basic graph operations over Gaussian representations, enabling each Gaussian to benefit from its connected Gaussian groups with Gaussian feature fusion. Furthermore, we design a Gaussian pooling layer to aggregate various Gaussian groups for efficient representations. We conduct experiments on the large-scale RealEstate10K and ACID datasets to demonstrate the efficiency and generalization of our method. Compared to the state-of-the-art methods, our model uses fewer Gaussians and achieves better image quality with higher rendering speed.
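
The memory cost mentioned in the abstract follows from simple arithmetic: if every pixel of every input view emits one Gaussian and the per-view sets are just combined, the scene representation grows linearly with the number of views. The sketch below only illustrates that count; the function name and the 256×256 resolution are assumptions chosen for the example, not values taken from the paper.

```python
def naive_gaussian_count(num_views: int, height: int, width: int) -> int:
    """Gaussians in the scene when each pixel of each view emits one Gaussian
    and the per-view sets are concatenated without any merging."""
    return num_views * height * width


# Hypothetical input sizes, chosen only for illustration.
for views in (2, 4, 8):
    print(views, "views ->", naive_gaussian_count(views, 256, 256), "Gaussians")
```

Regions seen by several views are represented once per view in this scheme, which is the redundancy the pooling layer described in the abstract is meant to reduce.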
Problem

Research questions and friction points this paper is trying to address.

Improves 3D Gaussian representation from multi-view images
Reduces artifacts and memory cost in scene synthesis
Enhances rendering speed and image quality efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Graph Network models multi-view relations
Reformulates graph operations for Gaussian feature fusion
Gaussian pooling layer aggregates groups efficiently
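
As a rough illustration of the three ideas above, the following is a minimal sketch, not the paper's implementation. It treats each view's Gaussians as one graph node, runs one round of Gaussian-level feature fusion along the edges, and then pools overlapping Gaussians. The `Gaussian` class, the nearest-neighbour averaging rule, the `radius` threshold, and the voxel-based pooling are all assumptions made for this example; the summary does not specify how GGN defines edges, fusion weights, or pooling.

```python
from dataclasses import dataclass
from math import dist, floor


@dataclass
class Gaussian:
    center: tuple[float, float, float]  # 3D mean
    feature: list[float]                # per-Gaussian feature vector


def fuse_features(groups, edges, radius=0.1):
    """One round of Gaussian-level message passing (illustrative rule only).

    groups: list of Gaussian lists, one per view (the graph nodes).
    edges:  set of (i, j) view-index pairs that are connected.
    Each Gaussian averages its feature with the nearest Gaussian of every
    connected group, provided that neighbour lies within `radius`.
    """
    updated = []
    for i, group in enumerate(groups):
        new_group = []
        for g in group:
            messages = [g.feature]
            for j, other in enumerate(groups):
                connected = (i, j) in edges or (j, i) in edges
                if i == j or not connected or not other:
                    continue
                nearest = min(other, key=lambda o: dist(o.center, g.center))
                if dist(nearest.center, g.center) <= radius:
                    messages.append(nearest.feature)
            fused = [sum(vals) / len(messages) for vals in zip(*messages)]
            new_group.append(Gaussian(g.center, fused))
        updated.append(new_group)
    return updated


def pool_gaussians(groups, voxel=0.25):
    """Merge Gaussians from all views whose centers share a voxel (assumed rule)."""
    cells = {}
    for group in groups:
        for g in group:
            key = tuple(floor(c / voxel) for c in g.center)
            cells.setdefault(key, []).append(g)
    pooled = []
    for members in cells.values():
        n = len(members)
        center = tuple(sum(m.center[k] for m in members) / n for k in range(3))
        feature = [sum(vals) / n for vals in zip(*(m.feature for m in members))]
        pooled.append(Gaussian(center, feature))
    return pooled


# Two toy views: the first Gaussian of view_a overlaps the only Gaussian of view_b.
view_a = [Gaussian((0.0, 0.0, 1.0), [1.0, 0.0]), Gaussian((1.0, 0.0, 1.0), [0.0, 1.0])]
view_b = [Gaussian((0.02, 0.0, 1.0), [0.0, 0.0])]

fused = fuse_features([view_a, view_b], edges={(0, 1)})
pooled = pool_gaussians(fused)
print(len(view_a) + len(view_b), "->", len(pooled))  # prints: 3 -> 2
```

In this toy case the two overlapping Gaussians exchange features and are then merged, so three input Gaussians become two. A learned model would replace the fixed averaging and voxel rules with trainable operations.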
Shengjun Zhang
Tsinghua University
Xin Fei
National University of Singapore
Robotic Manipulation, Computer Vision
Fangfu Liu
Tsinghua University
Computer Vision, 3D Vision, Machine Learning
Haixu Song
Tsinghua University
Yueqi Duan
Tsinghua University