DVSM: Decoder-only View Synthesis Model Done Right

📅 2026-05-28
🤖 AI Summary
This work addresses the redundancy and low parameter efficiency of the encoder-decoder architectures used in existing Large View Synthesis Models (LVSMs) by re-examining that design in favor of a decoder-only framework. The method represents scenes implicitly as a KV cache and shares weights between the reconstruction and rendering networks, which better aligns their features at the same viewpoint. It further integrates foundation model priors with stage-wise patch sizing to improve the efficiency-quality tradeoff. The resulting model uses fewer parameters at identical rendering complexity and reports new state-of-the-art novel-view synthesis results across multiple benchmarks. Under dense input views, it in some cases even outperforms per-scene-optimized 3D Gaussian Splatting (3DGS).
📝 Abstract
Recent Large View Synthesis Models (LVSMs) advocate an encoder-decoder architecture that separates reconstruction and rendering into distinct networks. We re-examine this design. Through controlled experiments, we show that a decoder-only architecture, which represents scenes implicitly as a KV-cache, outperforms encoder-decoder variants while using fewer parameters at identical rendering complexity. Further analysis shows that sharing weights between the color-input reconstruction network and the camera-only rendering network better aligns their features at the same viewpoint, facilitating image synthesis. Building on this finding, our model, dubbed DVSM, further incorporates foundation model priors and stage-wise patch sizing for an improved efficiency-quality tradeoff. Our results establish a new state of the art for novel-view synthesis across multiple benchmarks, in some cases even outperforming per-scene-optimized 3DGS under dense input views.
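As a rough illustration of the idea in the abstract, the sketch below shows how one attention layer with a single set of weights could serve both passes: input-view tokens (color plus camera) are processed once to produce a KV cache, and camera-only target tokens then attend to that cache to render a view. This is a toy NumPy sketch under stated assumptions, not the paper's implementation: the names `shared_layer`, `D`, and the random "embeddings" are hypothetical, and real models stack many layers with multi-head attention, MLPs, and normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # token width (hypothetical)

# A single set of projection weights, used by BOTH passes (weight sharing).
W_q, W_k, W_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))


def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def shared_layer(tokens, cache=None):
    """One residual attention layer.

    With cache=None the tokens attend to themselves (reconstruction pass).
    With a cache the tokens attend to cached keys/values plus their own
    (rendering pass). Returns the outputs and this call's own (k, v).
    """
    q, k, v = tokens @ W_q, tokens @ W_k, tokens @ W_v
    keys, values = k, v
    if cache is not None:
        keys = np.concatenate([cache[0], k], axis=0)
        values = np.concatenate([cache[1], v], axis=0)
    out = tokens + softmax(q @ keys.T / np.sqrt(D)) @ values
    return out, (k, v)


# Reconstruction: input tokens carry color AND camera information (random stand-ins).
color_embed = rng.standard_normal((8, D))
input_pose_embed = rng.standard_normal((8, D))
_, scene_cache = shared_layer(color_embed + input_pose_embed)  # built once per scene

# Rendering: target tokens carry camera information only and read the cache.
target_pose_a = rng.standard_normal((4, D))
target_pose_b = rng.standard_normal((4, D))
view_a, _ = shared_layer(target_pose_a, cache=scene_cache)
view_b, _ = shared_layer(target_pose_b, cache=scene_cache)

print(view_a.shape, view_b.shape)
```

In this sketch the scene lives entirely in `scene_cache`: rendering a new view only requires a forward pass over the target tokens, and the cache is reused unchanged across target views.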
Problem

Research questions and friction points this paper is trying to address.

View Synthesis
Encoder-Decoder Architecture
Decoder-only Model
Novel-view Synthesis
Scene Representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

decoder-only architecture
KV-cache scene representation
weight sharing
foundation model priors
stage-wise patch sizing