ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS

📅 2025-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Feed-forward 3D Gaussian Splatting (3DGS) models are limited by the capacity of their encoders: as the number of input views grows, performance degrades or memory consumption becomes excessive. To address this, the authors analyze feed-forward 3DGS through the Information Bottleneck principle and propose ZPressor, a lightweight, architecture-agnostic compression module. ZPressor partitions the input views into anchor and support sets and uses cross-attention to compress information from the support views into the anchor views. The result is a compact latent state $Z$ that retains essential scene information while discarding redundancy. With ZPressor, existing feed-forward 3DGS models scale to over 100 input views at 480p resolution on an 80GB GPU. On DL3DV-10K and RealEstate10K, integrating ZPressor into several state-of-the-art models consistently improves performance with a moderate number of input views and improves robustness with dense views. Because the support views are compressed into the anchor views, the size of the latent state no longer grows with the total number of input views, which points toward scalable feed-forward 3DGS.

📝 Abstract

Feed-forward 3D Gaussian Splatting (3DGS) models have recently emerged as a promising solution for novel view synthesis, enabling one-pass inference without per-scene 3DGS optimization. However, their scalability is fundamentally constrained by the limited capacity of their encoders, which leads to degraded performance or excessive memory consumption as the number of input views increases. In this work, we analyze feed-forward 3DGS frameworks through the lens of the Information Bottleneck principle and introduce ZPressor, a lightweight, architecture-agnostic module that compresses multi-view inputs into a compact latent state $Z$ that retains essential scene information while discarding redundancy. Concretely, ZPressor partitions the views into anchor and support sets and uses cross-attention to compress the information from the support views into the anchor views, forming the compressed latent state $Z$. This enables existing feed-forward 3DGS models to scale to over 100 input views at 480p resolution on an 80GB GPU. We show that integrating ZPressor into several state-of-the-art feed-forward 3DGS models consistently improves performance with a moderate number of input views and enhances robustness in dense-view settings on two large-scale benchmarks, DL3DV-10K and RealEstate10K. The video results, code, and trained models are available on our project page: https://lhmd.top/zpressor.
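The abstract does not say how views are split into anchor and support sets. A minimal sketch, assuming anchors are chosen by greedy farthest point sampling over camera centers and each support view is attached to its nearest anchor (both are assumptions for illustration; `farthest_point_anchors` and `partition_views` are hypothetical names, not ZPressor's code):

```python
import numpy as np


def farthest_point_anchors(cam_centers: np.ndarray, num_anchors: int) -> list:
    """Greedy farthest point sampling over camera centers (assumed anchor strategy)."""
    n = cam_centers.shape[0]
    if not 1 <= num_anchors <= n:
        raise ValueError("num_anchors must be between 1 and the number of views")
    anchors = [0]  # hypothetical choice: seed with the first view
    # Distance from every view to its closest already-selected anchor.
    min_dist = np.linalg.norm(cam_centers - cam_centers[0], axis=1)
    for _ in range(num_anchors - 1):
        nxt = int(np.argmax(min_dist))  # view farthest from all current anchors
        anchors.append(nxt)
        new_dist = np.linalg.norm(cam_centers - cam_centers[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return anchors


def partition_views(cam_centers: np.ndarray, num_anchors: int):
    """Split views into anchors and support views; map each support view to its nearest anchor."""
    anchors = farthest_point_anchors(cam_centers, num_anchors)
    anchor_pos = cam_centers[anchors]
    assignment = {}
    for i in range(cam_centers.shape[0]):
        if i in anchors:
            continue
        dists = np.linalg.norm(anchor_pos - cam_centers[i], axis=1)
        assignment[i] = anchors[int(np.argmin(dists))]
    return anchors, assignment


# Usage: ten cameras evenly spaced along the x-axis.
centers = np.stack([np.arange(10.0), np.zeros(10), np.zeros(10)], axis=1)
anchors, assignment = partition_views(centers, num_anchors=3)
print(anchors)     # [0, 9, 4]
print(assignment)  # {1: 0, 2: 0, 3: 4, 5: 4, 6: 4, 7: 9, 8: 9}
```

With ten cameras on a line, the three anchors land at both ends and near the middle, and the other seven views become support views grouped around them. The intuition behind this assumed choice is that well-spread anchors cover the whole scene, so information from every support view has a nearby anchor to be compressed into.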

Problem

Research questions and friction points this paper is trying to address.

Addresses scalability limits in feed-forward 3DGS models

Reduces memory consumption by compressing multi-view inputs

Improves performance with moderate view counts and robustness in dense-view settings

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight, architecture-agnostic module for multi-view input compression

Cross-attention that compresses support views into anchor views

Scales to 100+ input views at 480p on an 80GB GPU
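The cross-attention step can be illustrated with a minimal single-head sketch in NumPy. It assumes the tokens of all anchor views are flattened into one query matrix and the tokens of all support views into one key/value matrix, and it uses random projection weights. The actual module's head count, normalization, and layer stacking are not described here, and `compress_support_into_anchors` is a hypothetical name. What the sketch shows is the bottleneck: the output $Z$ has the same shape as the anchor tokens no matter how many support tokens are supplied.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def compress_support_into_anchors(anchor_tokens, support_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: anchor tokens query the support tokens.

    anchor_tokens: (Na, d), support_tokens: (Ns, d). Returns Z with shape (Na, d).
    """
    q = anchor_tokens @ w_q                      # (Na, d) queries from anchors
    k = support_tokens @ w_k                     # (Ns, d) keys from support views
    v = support_tokens @ w_v                     # (Ns, d) values from support views
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (Na, Ns) scaled dot products
    attn = softmax(scores, axis=-1)              # each anchor token's weights sum to 1
    return anchor_tokens + attn @ v              # residual update keeps the anchor shape


rng = np.random.default_rng(0)
d = 16
anchor = rng.standard_normal((64, d))            # stand-in for anchor-view tokens
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
for num_support_tokens in (128, 4096):
    support = rng.standard_normal((num_support_tokens, d))
    z = compress_support_into_anchors(anchor, support, w_q, w_k, w_v)
    print(num_support_tokens, z.shape)  # Z stays (64, 16) regardless of support size
```

Adding support views lengthens the key/value sequence, so the attention matrix (anchor tokens × support tokens) grows linearly with the number of support tokens, but the state passed on to the rest of the model stays fixed at the anchor size.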
