🤖 AI Summary
Existing methods model forward and inverse rendering as separate multi-step diffusion processes, leading to cycle inconsistency, slow inference, and temporal artifacts in video generation. To address these issues, this paper proposes Ouroboros, a dual one-step diffusion framework that jointly learns cycle-consistent forward rendering and intrinsic decomposition. By sharing a latent space and enforcing bidirectional gradient coupling, the two tasks mutually reinforce each other, while an explicit cycle-consistency constraint further ensures fidelity between the two directions. The framework achieves high-quality intrinsic decomposition in both indoor and outdoor scenes and generalizes zero-shot to video decomposition without fine-tuning. Experiments demonstrate state-of-the-art rendering quality, 3-5× faster inference than prevailing multi-step diffusion models, and significantly reduced inter-frame inconsistency in video sequences.
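The cycle-consistency coupling described above can be pictured as a single joint training step. The sketch below is a minimal PyTorch-style illustration, assuming two hypothetical one-step models, `forward_renderer` (intrinsics to image) and `inverse_renderer` (image to intrinsics); the loss terms, the weight `lam`, and the model interfaces are illustrative assumptions, not the paper's actual implementation.

```python
import torch.nn.functional as F

def training_step(forward_renderer, inverse_renderer, image, intrinsics, lam=1.0):
    # Supervised one-step prediction in each direction (hypothetical interfaces).
    pred_intrinsics = inverse_renderer(image)    # inverse rendering
    pred_image = forward_renderer(intrinsics)    # forward rendering
    loss_inv = F.mse_loss(pred_intrinsics, intrinsics)
    loss_fwd = F.mse_loss(pred_image, image)

    # Explicit cycle consistency: re-render the predicted intrinsics and
    # compare against the input image. Gradients flow through both models,
    # which is what couples the two tasks during training.
    cycle_image = forward_renderer(pred_intrinsics)
    loss_cycle = F.mse_loss(cycle_image, image)

    return loss_fwd + loss_inv + lam * loss_cycle
```

Note that because each direction is a single network evaluation rather than a multi-step denoising chain, backpropagating through the full cycle remains tractable, which a multi-step formulation would make far more expensive.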
📝 Abstract
While multi-step diffusion models have advanced both forward and inverse rendering, existing approaches often treat the two problems independently, leading to cycle inconsistency and slow inference. In this work, we present Ouroboros, a framework composed of two single-step diffusion models that handle forward and inverse rendering with mutual reinforcement. Our approach extends intrinsic decomposition to both indoor and outdoor scenes and introduces a cycle consistency mechanism that ensures coherence between forward and inverse rendering outputs. Experimental results demonstrate state-of-the-art performance across diverse scenes, with substantially faster inference than other diffusion-based methods. We further show that Ouroboros transfers to video decomposition in a training-free manner, reducing temporal inconsistency across video sequences while maintaining high-quality per-frame inverse rendering.
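As a concrete picture of the training-free video transfer, one plausible scheme is to run the one-step inverse renderer frame by frame while fixing the initial noise latent across frames, a common trick for reducing flicker in diffusion-based per-frame processing. The snippet below sketches that idea; the `noise` argument and the shared-latent strategy are assumptions for illustration, not necessarily the paper's exact procedure.

```python
import torch

@torch.no_grad()
def decompose_video(inverse_renderer, frames, latent_shape, device="cuda"):
    # Reusing one noise sample for every frame removes a source of
    # frame-to-frame randomness (an assumed stabilization strategy).
    shared_noise = torch.randn(latent_shape, device=device)
    return [inverse_renderer(frame.to(device), noise=shared_noise)
            for frame in frames]
```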