CycleDiff: Cycle Diffusion Models for Unpaired Image-to-Image Translation

📅 2025-08-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address unpaired cross-domain image translation, this paper proposes a diffusion-based cycle learning framework that jointly optimizes diffusion denoising and image translation to mitigate the local optima problem inherent in conventional approaches. Its key contributions are: (1) a time-dependent translation network that dynamically aligns diffusion timesteps with domain mappings; and (2) a diffusion-based clean-signal extraction mechanism that end-to-end disentangles structural and textural components. Integrated with cycle-consistency constraints and unpaired adversarial training, the model achieves state-of-the-art performance on bidirectional multimodal translation tasks—including RGB ↔ edge, semantic, and depth domains—yielding outputs with both high fidelity and strong structural consistency.
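The clean-signal extraction step described above can be illustrated with the standard DDPM identity: given a noisy sample x_t and a noise estimate, the clean component is recovered in closed form. A minimal NumPy sketch, assuming a toy linear noise schedule and an oracle noise estimate in place of the paper's learned denoiser:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear noise schedule: alpha_bar[t] is the cumulative product of (1 - beta).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

x0 = rng.standard_normal((8, 8))     # clean "image"
t = 500
eps = rng.standard_normal(x0.shape)  # forward-process noise

# Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def extract_clean(x_t, eps_hat, abar_t):
    """Closed-form clean-signal estimate from a noisy sample and a noise prediction."""
    return (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)

# With a perfect noise estimate the clean signal is recovered exactly;
# in the paper a learned denoiser epsilon_theta(x_t, t) plays this role,
# and the translation network then operates on the extracted components.
x0_hat = extract_clean(x_t, eps, alpha_bar[t])
assert np.allclose(x0_hat, x0)
```

This closed-form extraction is what lets the translation process act on a clean signal inside the diffusion loop, so both processes can be optimized jointly rather than in separate stages.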

📝 Abstract
We introduce a diffusion-based cross-domain image translator that requires no paired training data. Unlike GAN-based methods, our approach integrates diffusion models into the image translation process, allowing for broader coverage of the data distribution and improved cross-domain translation performance. However, incorporating the translation process within the diffusion process remains challenging because the two processes are not exactly aligned: the diffusion process operates on the noisy signal while the translation process operates on the clean signal. As a result, recent diffusion-based studies employ separate training or shallow integration to learn the two processes, which can trap the translation optimization in local minima and constrain the effectiveness of the diffusion models. To address this problem, we propose a novel joint learning framework that aligns the diffusion and translation processes, thereby improving global optimality. Specifically, we extract image components with diffusion models to represent the clean signal and apply the translation process to these components, enabling end-to-end joint learning. In addition, we introduce a time-dependent translation network to learn the complex translation mapping, resulting in effective translation learning and significant performance improvement. Benefiting from the joint learning design, our method enables global optimization of both processes, improving fidelity and structural consistency. We have conducted extensive experiments on RGB$\leftrightarrow$RGB and diverse cross-modality translation tasks including RGB$\leftrightarrow$Edge, RGB$\leftrightarrow$Semantics and RGB$\leftrightarrow$Depth, showing better generative performance than the state of the art.
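The cycle-consistency constraint mentioned in the abstract penalizes the round trip A → B → A. A toy NumPy sketch, with two mutually inverse linear maps standing in for the learned translators (the names `G_ab`/`G_ba` are illustrative, not the paper's networks):

```python
import numpy as np

rng = np.random.default_rng(1)
x_a = rng.standard_normal((4, 3))  # batch of domain-A samples

# Stand-in translators: a well-conditioned invertible linear map and its inverse.
W = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)
W_inv = np.linalg.inv(W)

G_ab = lambda x: x @ W      # A -> B translator (stand-in)
G_ba = lambda x: x @ W_inv  # B -> A translator (stand-in)

def cycle_loss(x, fwd, bwd):
    """L1 cycle-consistency: mean |bwd(fwd(x)) - x| over the batch."""
    return np.abs(bwd(fwd(x)) - x).mean()

# Perfectly inverse translators drive the cycle loss to (numerically) zero;
# in training, this loss is minimized alongside the diffusion and adversarial terms.
assert cycle_loss(x_a, G_ab, G_ba) < 1e-8
```

In the unpaired setting this round-trip penalty is what anchors structure, since no ground-truth target image exists for a direct reconstruction loss.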
Problem

Research questions and friction points this paper is trying to address.

Unpaired image translation without paired training data
Aligning diffusion and translation processes for global optimization
Improving cross-domain translation fidelity and structural consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint learning aligns diffusion and translation processes
Time-dependent network learns complex translation mapping
Extracts image components for end-to-end optimization
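The "time-dependent translation network" bullet can be made concrete with the usual sinusoidal timestep embedding conditioning the mapping on the diffusion step. A minimal sketch; the dimensions and the FiLM-style scale/shift modulation are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def timestep_embedding(t, dim):
    """Standard sinusoidal embedding of a diffusion timestep."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def translate(x, t, w, scale_proj, shift_proj):
    """Toy time-dependent mapping: a linear translator modulated (FiLM-style)
    by the timestep embedding, so the domain mapping can vary across steps."""
    emb = timestep_embedding(t, scale_proj.shape[0])
    scale = 1.0 + emb @ scale_proj
    shift = emb @ shift_proj
    return (x @ w) * scale + shift

rng = np.random.default_rng(2)
dim = 16
w = rng.standard_normal((dim, dim)) * 0.1
scale_proj = rng.standard_normal((dim, dim)) * 0.01
shift_proj = rng.standard_normal((dim, dim)) * 0.01
x = rng.standard_normal(dim)

# Different timesteps yield different mappings of the same input.
y_early = translate(x, 10, w, scale_proj, shift_proj)
y_late = translate(x, 900, w, scale_proj, shift_proj)
assert not np.allclose(y_early, y_late)
```

Conditioning the translator on t is what lets one network align its domain mapping with every stage of the denoising trajectory instead of learning a single fixed mapping.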
Shilong Zou
School of Computer, National University of Defense Technology, Changsha, 410073, China
Yuhang Huang
National University of Defense Technology
Deep Learning · Computer Vision
Renjiao Yi
National University of Defense Technology
Computer Graphics · 3D Vision
Chenyang Zhu
School of Computer, National University of Defense Technology, Changsha, 410073, China
Kai Xu
School of Computer, National University of Defense Technology, Changsha, 410073, China