AI Summary
Feed-forward 3D Gaussian Splatting (3DGS) models are limited by encoder capacity in multi-view novel view synthesis: as the number of input views grows, performance degrades or memory consumption becomes excessive. To address this, we propose ZPressor, a lightweight, architecture-agnostic compression module that applies the Information Bottleneck principle to feed-forward 3DGS. ZPressor partitions the input views into anchor and support sets and uses cross-attention to compress the information from the support views into the anchor views, forming a compact latent state $Z$ that retains essential scene information while discarding redundancy. With ZPressor, existing feed-forward 3DGS models scale to over 100 input views at 480p resolution on an 80GB GPU. On DL3DV-10K and RealEstate10K, integrating ZPressor consistently improves performance with a moderate number of input views and enhances robustness in dense-view settings. Because the multi-view input is compressed into the anchor views, the approach decouples representation capacity from the number of input views.
Abstract
Feed-forward 3D Gaussian Splatting (3DGS) models have recently emerged as a promising solution for novel view synthesis, enabling one-pass inference without the need for per-scene 3DGS optimization. However, their scalability is fundamentally constrained by the limited capacity of their encoders, leading to degraded performance or excessive memory consumption as the number of input views increases. In this work, we analyze feed-forward 3DGS frameworks through the lens of the Information Bottleneck principle and introduce ZPressor, a lightweight, architecture-agnostic module that efficiently compresses multi-view inputs into a compact latent state $Z$ that retains essential scene information while discarding redundancy. Concretely, ZPressor enables existing feed-forward 3DGS models to scale to over 100 input views at 480P resolution on an 80GB GPU by partitioning the views into anchor and support sets and using cross-attention to compress the information from the support views into the anchor views, forming the compressed latent state $Z$. We show that integrating ZPressor into several state-of-the-art feed-forward 3DGS models consistently improves performance with a moderate number of input views and enhances robustness in dense-view settings on two large-scale benchmarks, DL3DV-10K and RealEstate10K. The video results, code, and trained models are available on our project page: https://lhmd.top/zpressor.
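The anchor/support compression described above can be illustrated with a minimal NumPy sketch. This is not the released ZPressor implementation: the function names (`split_anchor_support`, `compress_views`), the evenly spaced anchor selection, the single-head attention, the residual update, and the random projection matrices standing in for learned weights are all assumptions made for illustration. It only shows the general idea that anchor-view tokens act as queries over support-view tokens, so the size of $Z$ depends on the number of anchors rather than the total number of input views.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def split_anchor_support(num_views, num_anchors):
    # Assumed strategy: evenly spaced anchors; all remaining views are support views.
    if not 1 <= num_anchors < num_views:
        raise ValueError("need 1 <= num_anchors < num_views")
    anchors = np.unique(np.linspace(0, num_views - 1, num_anchors).round().astype(int))
    support = np.setdiff1d(np.arange(num_views), anchors)
    return anchors, support

def compress_views(features, num_anchors, rng):
    """Compress per-view features of shape (V, T, D) into Z of shape (A, T, D)."""
    V, T, D = features.shape
    anchors, support = split_anchor_support(V, num_anchors)
    # Random projections stand in for learned query/key/value weights (assumption).
    Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
    anchor_tokens = features[anchors].reshape(-1, D)    # (A*T, D)
    support_tokens = features[support].reshape(-1, D)   # (S*T, D)
    q = anchor_tokens @ Wq
    k = support_tokens @ Wk
    v = support_tokens @ Wv
    # Cross-attention: each anchor token attends over all support tokens.
    attn = softmax(q @ k.T / np.sqrt(D), axis=-1)       # (A*T, S*T)
    # Residual update (assumption): anchors keep their own content plus support info.
    z = anchor_tokens + attn @ v
    return z.reshape(len(anchors), T, D), anchors, support

rng = np.random.default_rng(0)
feats = rng.standard_normal((12, 16, 8))   # 12 views, 16 tokens per view, 8 channels
z, anchors, support = compress_views(feats, num_anchors=4, rng=rng)
print(z.shape, anchors, support)
```

In this sketch, 12 input views are reduced to a latent state with the token count of 4 anchor views. Adding more support views enlarges only the key/value side of the attention, not $Z$ itself.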