AIR: Amortized Image Reconstruction Framework for Self-Supervised Feed-Forward 2D Gaussian Splatting

📅 2026-05-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing 2D Gaussian splatting methods rely on per-image iterative optimization or handcrafted priors, which makes encoding computationally expensive. This work proposes a self-supervised feed-forward framework that amortizes the iterative Gaussian fitting process into a single forward pass, eliminating the need for test-time optimization. The approach introduces a stage-wise residual architecture with an explicit stage-control mechanism that activates new Gaussian primitives only in regions with insufficient reconstruction fidelity. A predict–optimize–distill training strategy stabilizes multi-stage learning, while an image-adaptive quantizer makes the stored Gaussians more compact. Evaluated on the Kodak and DIV2K datasets, the method achieves better reconstruction quality than representative Gaussian-based baselines, with encoding times reduced to 160–300 milliseconds.
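As a rough illustration of what a 2D Gaussian representation of an image looks like, the sketch below additively splats axis-aligned Gaussians onto a grayscale canvas. It is a minimal NumPy sketch under stated assumptions, not AIR's renderer: the function name `render_gaussians`, the axis-aligned parameterization, and the additive blending are all illustrative choices that do not come from the paper.

```python
import numpy as np


def render_gaussians(means, scales, colors, height, width):
    """Additively splat axis-aligned 2D Gaussians onto a grayscale canvas.

    Hypothetical helper for illustration only; the parameterization
    (mean, per-axis scale, scalar color) is an assumption.
    """
    # Pixel coordinate grids: ys varies along rows, xs along columns.
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    image = np.zeros((height, width))
    for (mx, my), (sx, sy), c in zip(means, scales, colors):
        # Unnormalized Gaussian weight, equal to 1 at the mean.
        weight = np.exp(-0.5 * (((xs - mx) / sx) ** 2 + ((ys - my) / sy) ** 2))
        image += c * weight
    return image


# One Gaussian centered at column 2, row 3 with unit scale.
img = render_gaussians([(2.0, 3.0)], [(1.0, 1.0)], [0.5], height=8, width=8)
```

Fitting such primitives by gradient descent for every image is the per-image optimization that a feed-forward predictor is meant to replace.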
📝 Abstract
2D Gaussian splatting provides an efficient explicit representation for image reconstruction, but existing methods still require costly per-image iterative optimization or rely on handcrafted priors for primitive allocation. We present AIR, a self-supervised feed-forward framework that amortizes iterative Gaussian fitting into a single network pass, eliminating per-image test-time optimization. AIR adopts a stage-wise residual architecture that progressively predicts additional Gaussian primitives from reconstruction residuals, together with an explicit Stage Control mechanism that activates new primitives only in under-reconstructed regions. A Predict–Optimize–Distill training strategy stabilizes multi-stage prediction by distilling short-horizon optimized Gaussian increments back into the predictor. The stabilized predictor is then jointly fine-tuned across stages and equipped with an image-adaptive quantizer for compact Gaussian storage. Experiments on Kodak and DIV2K show that AIR achieves better reconstruction quality than representative Gaussian-based baselines while reducing encoding time to 160–300 ms. Code: https://github.com/whoiszzj/AIR.git
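The idea of activating new primitives only in under-reconstructed regions can be sketched as a per-patch test on the reconstruction residual. The snippet below is a minimal sketch under stated assumptions, not the paper's Stage Control mechanism: the function name `stage_control_mask`, the patch size, the mean-squared-error criterion, and the fixed threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def stage_control_mask(target, recon, patch=4, threshold=0.01):
    """Mark patches whose mean squared residual exceeds a threshold.

    Hypothetical helper: the patch-wise MSE test and the threshold value
    are assumptions, not details taken from the paper.
    Returns a boolean array of shape (H // patch, W // patch).
    """
    err = (target - recon) ** 2
    h, w = err.shape
    # Crop to a multiple of the patch size, then split into patch blocks.
    cropped = err[: h - h % patch, : w - w % patch]
    blocks = cropped.reshape(h // patch, patch, w // patch, patch)
    return blocks.mean(axis=(1, 3)) > threshold


# Toy example: only the top-left 4x4 patch is poorly reconstructed.
target = np.zeros((8, 8))
target[0:4, 0:4] = 1.0
recon = np.zeros((8, 8))
mask = stage_control_mask(target, recon)
```

In a stage-wise scheme, a later stage would add Gaussians only where `mask` is true, so the primitive budget is spent on regions the earlier stages reconstructed poorly.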
Problem

Research questions and friction points this paper is trying to address.

2D Gaussian splatting
image reconstruction
per-image optimization
primitive allocation
self-supervised learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

amortized inference
self-supervised learning
Gaussian splatting
feed-forward reconstruction
stage-wise residual architecture