🤖 AI Summary
To address inherent polar distortion and boundary seam artefacts in equirectangular projection (ERP) representations of HDR environment maps estimated from a single-view image, this paper proposes a panoramic lighting reconstruction method based on latent diffusion models. The approach makes two key contributions: (1) an ERP convolutional padding scheme in the latent autoencoder that respects the panorama's horizontal wrap-around, eliminating the border seam artefact; and (2) PanoDiT, a panoramically-adapted Diffusion Transformer that reduces ERP distortions and artefacts, though at some cost to image quality and plausibility. On standard benchmarks, the method estimates high-quality environment maps that perform competitively with state-of-the-art approaches in both image quality and lighting accuracy, including plausible lighting of mirror-reflective surfaces.
📝 Abstract
We advance the field of HDR environment map estimation from a single-view image by establishing a novel approach leveraging the Latent Diffusion Model (LDM) to produce high-quality environment maps that can plausibly light mirror-reflective surfaces. A common issue with the ERP representation, the format used by the vast majority of approaches, is distortion at the poles and a visible seam at the sides of the environment map. We remove the border seam artefact by proposing an ERP convolutional padding in the latent autoencoder. Additionally, we investigate whether adapting the diffusion network architecture to the ERP format can improve the quality and accuracy of the estimated environment map by proposing a panoramically-adapted Diffusion Transformer architecture. Our proposed PanoDiT network reduces ERP distortions and artefacts, but at the cost of image quality and plausibility. We evaluate on standard benchmarks to demonstrate that our models estimate high-quality environment maps that perform competitively with state-of-the-art approaches in both image quality and lighting accuracy.
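The seam artefact arises because an ERP panorama is periodic in azimuth: its left and right edges are adjacent on the sphere, but an ordinary zero-padded convolution treats them as unrelated borders. The core idea of ERP-aware padding can be sketched as wrapping the feature map horizontally before convolving, so the two edges see each other's values. The sketch below is illustrative only, assuming circular (wrap) padding along the width axis and simple edge replication at the poles; the function name `erp_pad` and the pole handling are assumptions, not the paper's actual implementation.

```python
import numpy as np

def erp_pad(feature_map: np.ndarray, pad: int) -> np.ndarray:
    """Pad an (H, W) or (H, W, C) ERP feature map for a seam-free conv.

    Horizontally: circular (wrap) padding, since azimuth is periodic,
    so the left/right borders are continuous across the seam.
    Vertically: edge replication, a simple illustrative choice near
    the poles (not necessarily what the paper uses).
    """
    trailing = ((0, 0),) * (feature_map.ndim - 2)  # leave channels alone
    wrapped = np.pad(feature_map, ((0, 0), (pad, pad)) + trailing, mode="wrap")
    return np.pad(wrapped, ((pad, pad), (0, 0)) + trailing, mode="edge")

# A tiny 3x4 "panorama": after padding, the new leftmost column is a
# copy of the original rightmost column, so a convolution sliding over
# the border sees the sphere's true neighbourhood instead of zeros.
x = np.arange(12, dtype=float).reshape(3, 4)
y = erp_pad(x, 1)
assert np.allclose(y[1:-1, 0], x[:, -1])   # left pad wraps from right edge
assert np.allclose(y[1:-1, -1], x[:, 0])   # right pad wraps from left edge
```

Applying a standard convolution with no additional padding on the output of `erp_pad` then yields a feature map whose left and right edges agree, which is what removes the visible seam in the decoded environment map.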