🤖 AI Summary
Photometric inconsistencies across multi-view images—introduced by on-device camera pipeline operations (e.g., exposure adjustment, white balance)—degrade novel view synthesis quality. Existing approaches that jointly optimize the scene representation and per-image appearance embeddings suffer from high computational cost and poor generalization. This paper proposes a Transformer-based bilateral grid prediction method, the first to incorporate Transformers into spatially adaptive bilateral grid modeling. The approach enables zero-shot, cross-scene photometric consistency correction without retraining and integrates seamlessly into the 3D Gaussian Splatting framework, preserving high-fidelity reconstruction while significantly improving training efficiency. Quantitative and qualitative evaluations across multiple datasets demonstrate reconstruction fidelity on par with or superior to state-of-the-art scene-specific optimization methods, with notably faster convergence.
📝 Abstract
Modern camera pipelines apply extensive on-device processing, such as exposure adjustment, white balance, and color correction, which, while beneficial individually, often introduces photometric inconsistencies across views. These appearance variations violate multi-view consistency and degrade the quality of novel view synthesis. Joint optimization of scene representations and per-image appearance embeddings has been proposed to address this issue, but at the cost of increased computational complexity and slower training. In this work, we propose a Transformer-based method that predicts spatially adaptive bilateral grids to correct photometric variations in a multi-view consistent manner, enabling robust cross-scene generalization without scene-specific retraining. By incorporating the learned grids into the 3D Gaussian Splatting pipeline, we improve reconstruction quality while maintaining high training efficiency. Extensive experiments show that our approach matches or outperforms existing scene-specific optimization methods in reconstruction fidelity and convergence speed.
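The abstract does not spell out how a predicted bilateral grid is applied to an image. As an illustration only (the paper's exact parameterization may differ), a common formulation stores a per-cell 3×4 affine color transform in a low-resolution grid indexed by spatial position and a per-pixel guidance value, then "slices" the grid—trilinearly interpolating coefficients per pixel—to produce a spatially adaptive photometric correction. All array shapes and the luminance guide below are assumptions for the sketch:

```python
import numpy as np

def slice_bilateral_grid(grid, image, guide):
    """Apply a bilateral grid of affine color transforms to an image.

    grid:  (D, Gh, Gw, 3, 4) array of per-cell 3x4 affine color matrices,
           indexed by (guidance bin, grid row, grid col). Shapes are
           illustrative assumptions, not the paper's configuration.
    image: (H, W, 3) float array in [0, 1].
    guide: (H, W) float array in [0, 1], e.g. per-pixel luminance.
    """
    D, Gh, Gw = grid.shape[:3]
    H, W = guide.shape
    # Continuous grid coordinates for every pixel.
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    gz = guide * (D - 1)
    gy = ys / max(H - 1, 1) * (Gh - 1)
    gx = xs / max(W - 1, 1) * (Gw - 1)
    # Trilinear interpolation ("slicing") of the affine coefficients.
    z0, y0, x0 = (np.floor(v).astype(int) for v in (gz, gy, gx))
    z1 = np.minimum(z0 + 1, D - 1)
    y1 = np.minimum(y0 + 1, Gh - 1)
    x1 = np.minimum(x0 + 1, Gw - 1)
    wz, wy, wx = gz - z0, gy - y0, gx - x0
    A = np.zeros((H, W, 3, 4))
    for dz, dy, dx in np.ndindex(2, 2, 2):
        zi = z1 if dz else z0
        yi = y1 if dy else y0
        xi = x1 if dx else x0
        w = ((wz if dz else 1 - wz)
             * (wy if dy else 1 - wy)
             * (wx if dx else 1 - wx))
        A += w[..., None, None] * grid[zi, yi, xi]
    # Per-pixel affine transform: out = M @ [r, g, b, 1].
    rgb1 = np.concatenate([image, np.ones((H, W, 1))], axis=-1)
    return np.einsum("hwij,hwj->hwi", A, rgb1)
```

In this formulation, the Transformer's role would be to predict `grid` for each view so that the sliced corrections cancel per-image exposure and white-balance differences before the rendering loss is computed; because the grid is low-resolution and the transform per pixel is affine, the correction is smooth and cheap to apply inside the 3DGS training loop.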