🤖 AI Summary
This work addresses the lack of a unified, controllable framework for text-guided image-to-image (I2I) translation. We propose FCDiffusion, an end-to-end diffusion model built on frequency-domain modulation. It applies the discrete cosine transform (DCT) in the latent space to decouple features into low-, mid-, and high-frequency components, enabling fine-grained, text-conditioned control via a frequency filtering module. By simply switching frequency-control branches, FCDiffusion supports diverse tasks, including style-guided content creation, semantic editing, scene translation, and style translation, without task-specific architectures. Integrated with the Latent Diffusion Model and text cross-attention, it demonstrates strong generation quality, precise controllability, and cross-task generalization in extensive experiments. Code and pretrained models are publicly available.
📝 Abstract
Recently, text-to-image diffusion models have emerged as a powerful tool for image-to-image (I2I) translation, allowing flexible image translation via user-provided text prompts. This paper proposes the frequency-controlled diffusion model (FCDiffusion), an end-to-end diffusion-based framework that contributes a novel solution to text-guided I2I from a frequency-domain perspective. At the heart of our framework is a feature-space frequency-domain filtering module based on the Discrete Cosine Transform, which extracts image features carrying different DCT spectral bands to control the text-to-image generation process of the Latent Diffusion Model, realizing versatile I2I applications including style-guided content creation, image semantic manipulation, image scene translation, and image style translation. Different from related methods, FCDiffusion establishes a unified text-driven I2I framework that suits diverse I2I application scenarios simply by switching among different frequency control branches. The effectiveness and superiority of our method for text-guided I2I are demonstrated with extensive experiments, both qualitatively and quantitatively. Our project is publicly available at: https://xianggao1102.github.io/FCDiffusion/.
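To make the core idea concrete, below is a minimal sketch of DCT-based frequency-band filtering on a single 2-D feature map. This is not the paper's learnable filtering module: the band cutoffs, the mask shape, and the function names are illustrative assumptions, and the actual method operates on multi-channel latent features with learned filters.

```python
import numpy as np
from scipy.fft import dctn, idctn  # type-II DCT and its inverse


def dct_band_filter(feat: np.ndarray, band: str) -> np.ndarray:
    """Keep only one DCT spectral band of a 2-D feature map.

    The cutoff fractions below are illustrative, not the paper's
    learned filter parameters.
    """
    h, w = feat.shape
    coefs = dctn(feat, norm="ortho")  # 2-D DCT of the feature map
    # Normalized distance from the DC coefficient at the top-left corner:
    # 0 at the lowest frequency, ~1 at the highest.
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    r = (yy / h + xx / w) / 2.0
    if band == "low":
        mask = r < 0.15
    elif band == "mid":
        mask = (r >= 0.15) & (r < 0.5)
    else:  # "high"
        mask = r >= 0.5
    return idctn(coefs * mask, norm="ortho")  # back to feature space


# The three masks partition the spectrum, so the band-filtered
# features sum back to the original feature map.
x = np.random.default_rng(0).normal(size=(8, 8))
recon = sum(dct_band_filter(x, b) for b in ("low", "mid", "high"))
assert np.allclose(recon, x)
```

Because the DCT is linear and the masks are disjoint, the decomposition is lossless: each branch sees only its band, and in the framework a chosen band is what conditions the text-to-image generation process.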