🤖 AI Summary
This paper addresses the strong coupling among appearance, geometry, and illumination that makes image-based material replacement difficult. We propose an illumination- and geometry-aware end-to-end material transfer method that requires no UV mapping, 3D reconstruction, or manual annotations; its inputs are a flat material image and the scene image. The approach synthesizes photorealistic material appearances consistent with the scene's lighting, shadows, and viewpoint. The key innovation is integrating illumination and geometric priors into a diffusion model: we fine-tune Stable Diffusion on synthetically generated material–scene pairs so that geometric and illumination constraints are encoded implicitly in the latent space. At inference, the method does not rely on text prompts or explicit 3D parameters. Quantitative evaluation shows >12% improvement in PSNR/SSIM over state-of-the-art methods on both synthetic and real-world images. Qualitatively, it supports zero-shot transfer of arbitrary materials while keeping the rest of the scene consistent.
📝 Abstract
We present MatSwap, a method to photorealistically transfer materials to designated surfaces in an image. Such a task is non-trivial due to the strong entanglement of material appearance, geometry, and lighting in a photograph. In the literature, material editing methods typically rely on either cumbersome text engineering or extensive manual annotations requiring artist knowledge and 3D scene properties that are impractical to obtain. In contrast, we propose to directly learn the relationship between the input material -- as observed on a flat surface -- and its appearance within the scene, without the need for explicit UV mapping. To achieve this, we rely on a custom light- and geometry-aware diffusion model. We fine-tune a large-scale pre-trained text-to-image model for material transfer using our synthetic dataset, preserving its strong priors to ensure effective generalization to real images. As a result, our method seamlessly integrates a desired material into the target location in the photograph while retaining the identity of the scene. We evaluate our method on synthetic and real images and show that it compares favorably to recent work both qualitatively and quantitatively. We will release our code and data upon publication.