🤖 AI Summary
Current generative diffusion models for image editing support prompt- and semantics-guided manipulation but lack pixel-level precision and are typically constrained to single-task scenarios. To address this, we propose a unified editing framework grounded in an intrinsic-image latent space: the first method to integrate exact diffusion inversion with disentangled intrinsic channels—such as diffuse reflectance, specular reflectance, and surface normals—within an RGB-X diffusion architecture. Our approach enables targeted latent-space editing without fine-tuning or auxiliary data. By operating on physically grounded intrinsic representations, it inherently preserves global illumination consistency and object identity. The framework supports diverse operations including color/texture editing, object insertion/deletion, relighting, and composite edits. Extensive evaluation demonstrates state-of-the-art performance on complex images, achieving superior fidelity, precise controllability, and seamless multi-task compatibility.
📝 Abstract
Generative diffusion models have advanced image editing with high-quality results and intuitive interfaces such as prompts and semantic drawing. However, these interfaces lack precise control, and the associated methods typically specialize in a single editing task. We introduce a versatile, generative workflow that operates in an intrinsic-image latent space, enabling semantic, local manipulation with pixel precision for a range of editing operations. Building atop the RGB-X diffusion framework, we address key challenges of identity preservation and intrinsic-channel entanglement. By incorporating exact diffusion inversion and disentangled channel manipulation, we enable precise, efficient editing with automatic resolution of global illumination effects, all without additional data collection or model fine-tuning. We demonstrate state-of-the-art performance across a variety of tasks on complex images, including color and texture adjustments, object insertion and removal, global relighting, and their combinations.