AIR: Amortized Image Reconstruction Framework for Self-Supervised Feed-Forward 2D Gaussian Splatting

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing 2D Gaussian splatting methods rely on per-image iterative optimization or handcrafted priors for primitive allocation, which makes encoding slow and computationally expensive. This work proposes AIR, a self-supervised feed-forward framework that amortizes the iterative Gaussian fitting process into a single forward pass, eliminating the need for test-time optimization. The approach uses a stage-wise residual architecture that predicts additional Gaussian primitives from reconstruction residuals, with an explicit Stage Control mechanism that activates new primitives only in under-reconstructed regions. A Predict–Optimize–Distill training strategy stabilizes multi-stage prediction, and an image-adaptive quantizer makes Gaussian storage more compact. On the Kodak and DIV2K datasets, the method achieves better reconstruction quality than representative Gaussian-based baselines, with encoding times reduced to 160–300 ms.

📝 Abstract

2D Gaussian splatting provides an efficient explicit representation for image reconstruction, but existing methods still require costly per-image iterative optimization or rely on handcrafted priors for primitive allocation. We present AIR, a self-supervised feed-forward framework that amortizes iterative Gaussian fitting into a single network pass, eliminating per-image test-time optimization. AIR adopts a stage-wise residual architecture that progressively predicts additional Gaussian primitives from reconstruction residuals, together with an explicit Stage Control mechanism that activates new primitives only in under-reconstructed regions. A Predict–Optimize–Distill training strategy stabilizes multi-stage prediction by distilling short-horizon optimized Gaussian increments back into the predictor. The stabilized predictor is then jointly finetuned across stages and equipped with an image-adaptive quantizer for compact Gaussian storage. Experiments on Kodak and DIV2K show that AIR achieves better reconstruction quality than representative Gaussian-based baselines while reducing encoding time to 160–300 ms. Code: https://github.com/whoiszzj/AIR.git
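
The stage-wise residual idea with Stage Control can be illustrated with a small NumPy toy. This is only a sketch under strong assumptions and is not the AIR implementation: the learned predictor is replaced by a hypothetical hand-written rule (`toy_predictor`), the Gaussians are isotropic and grayscale, and the block sizes and threshold are arbitrary illustrative values. All function names here are invented for the example. What it shows is the control flow the abstract describes: each stage looks at the current reconstruction residual, a gate marks the regions that are still poorly reconstructed, and new primitives are added only there.

```python
import numpy as np


def render_gaussians(params, height, width):
    """Additively splat isotropic 2D Gaussians onto a grayscale canvas.

    params has shape (N, 4) with columns (cx, cy, sigma, amplitude).
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    canvas = np.zeros((height, width))
    for cx, cy, sigma, amp in params:
        dist2 = (xs - cx) ** 2 + (ys - cy) ** 2
        canvas += amp * np.exp(-dist2 / (2.0 * sigma ** 2))
    return canvas


def stage_control_mask(residual, block, threshold):
    """Mark blocks whose mean absolute residual exceeds the threshold."""
    h, w = residual.shape
    gh, gw = h // block, w // block
    cropped = np.abs(residual[: gh * block, : gw * block])
    per_block = cropped.reshape(gh, block, gw, block).mean(axis=(1, 3))
    return per_block > threshold


def toy_predictor(residual, mask, block):
    """Hypothetical stand-in for the learned network.

    Emits one Gaussian per active block, centred on the block, with the
    block's mean residual as amplitude. Inactive blocks get no primitive.
    """
    out = []
    for by, bx in zip(*np.nonzero(mask)):
        patch = residual[by * block:(by + 1) * block,
                         bx * block:(bx + 1) * block]
        cy = by * block + (block - 1) / 2.0
        cx = bx * block + (block - 1) / 2.0
        out.append((cx, cy, block / 2.0, patch.mean()))
    if not out:
        return np.zeros((0, 4))
    return np.array(out, dtype=np.float64)


def staged_reconstruct(image, blocks=(16, 8, 4), threshold=0.02):
    """Run coarse-to-fine stages; return the reconstruction and the
    number of primitives added at each stage."""
    h, w = image.shape
    recon = np.zeros((h, w))
    counts = []
    for block in blocks:
        residual = image - recon              # what is still missing
        mask = stage_control_mask(residual, block, threshold)
        new = toy_predictor(residual, mask, block)
        counts.append(len(new))
        if len(new):
            recon = recon + render_gaussians(new, h, w)
    return recon, counts
```

For example, on a 32×32 image that is zero except for a bright top-left quadrant, the first stage activates only the single 16×16 block covering that quadrant, and an all-zero image activates nothing at any stage. In the actual method the per-stage prediction comes from a trained network rather than a fixed rule, so this toy says nothing about reconstruction quality.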

Problem

Research questions and friction points this paper is trying to address.

2D Gaussian splatting

image reconstruction

per-image optimization

primitive allocation

self-supervised learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

amortized inference

self-supervised learning

Gaussian splatting