FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing

📅 2025-03-24
🤖 AI Summary
Text-guided image editing with Text-to-Image (T2I) models often introduces unintended changes, such as loss of local detail and color shifts, which the authors attribute to optimizing all frequency bands indiscriminately. The paper proposes a frequency-aware denoising score (FDS) that restricts optimization to selected frequency bands within localized spatial regions. It uses wavelets to decompose images into multiple frequency bands at different spatial resolutions, compares alternative frequency-domain techniques, and extends the approach to 3D texture editing by applying frequency decomposition to the triplane representation. Quantitative evaluations and user studies indicate that the method produces high-quality, precise edits.

📝 Abstract
Text-guided image editing using Text-to-Image (T2I) models often fails to yield satisfactory results, frequently introducing unintended modifications, such as the loss of local detail and color changes. In this paper, we analyze these failure cases and attribute them to the indiscriminate optimization across all frequency bands, even though only specific frequencies may require adjustment. To address this, we introduce a simple yet effective approach that enables the selective optimization of specific frequency bands within localized spatial regions for precise edits. Our method leverages wavelets to decompose images into different spatial resolutions across multiple frequency bands, enabling precise modifications at various levels of detail. To extend the applicability of our approach, we provide a comparative analysis of different frequency-domain techniques. Additionally, we extend our method to 3D texture editing by performing frequency decomposition on the triplane representation, enabling frequency-aware adjustments for 3D textures. Quantitative evaluations and user studies demonstrate the effectiveness of our method in producing high-quality and precise edits.
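
The idea of updating only selected frequency bands inside a chosen region can be illustrated with a small sketch. This is not the paper's implementation: it assumes a single-level 2D Haar wavelet, a gradient stored as a plain list of rows with even height and width, and a binary region mask given at coefficient resolution. The names `haar_dwt2`, `haar_idwt2`, and `filter_gradient` are hypothetical helpers introduced here for illustration.

```python
def haar_dwt2(img):
    """Single-level 2D Haar transform of an even-sized grid (list of rows)."""
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for i in range(0, len(img), 2):
        rows = {name: [] for name in bands}
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows["LL"].append((a + b + c + d) / 2)  # block average (low-pass)
            rows["LH"].append((a - b + c - d) / 2)  # difference across columns
            rows["HL"].append((a + b - c - d) / 2)  # difference across rows
            rows["HH"].append((a - b - c + d) / 2)  # diagonal difference
        for name in bands:
            bands[name].append(rows[name])
    return bands


def haar_idwt2(bands):
    """Inverse of haar_dwt2: rebuild the full-resolution grid."""
    out = []
    for i in range(len(bands["LL"])):
        top, bottom = [], []
        for j in range(len(bands["LL"][0])):
            ll, lh = bands["LL"][i][j], bands["LH"][i][j]
            hl, hh = bands["HL"][i][j], bands["HH"][i][j]
            top += [(ll + lh + hl + hh) / 2, (ll - lh + hl - hh) / 2]
            bottom += [(ll + lh - hl - hh) / 2, (ll - lh - hl + hh) / 2]
        out += [top, bottom]
    return out


def filter_gradient(grad, keep_bands, region_mask):
    """Zero every wavelet coefficient of `grad` that is outside the selected
    bands or outside the region mask, then transform back.

    `region_mask` has half the height and width of `grad` (assumption)."""
    bands = haar_dwt2(grad)
    for name, coeffs in bands.items():
        for i, row in enumerate(coeffs):
            for j in range(len(row)):
                if name not in keep_bands or not region_mask[i][j]:
                    row[j] = 0.0
    return haar_idwt2(bands)


# Toy 4x4 "gradient"; keep only the low-frequency band in the top-left block.
grad = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
low_only = filter_gradient(grad, {"LL"}, [[1, 0], [0, 0]])
```

In an editing loop, a filtered gradient like `low_only` would replace the raw gradient in the update step, so that only the chosen band and region change. A multi-level decomposition, as described in the abstract, would apply the same transform recursively to the `LL` band.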
Problem

Research questions and friction points this paper is trying to address.

Text-guided image editing often causes unintended modifications
Indiscriminate optimization across all frequency bands reduces precision
Lack of frequency-aware methods for 3D texture editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-aware denoising for precise image edits
Wavelet decomposition for multi-band frequency control
Triplane-based 3D texture frequency editing
Authors

Yufan Ren
EPFL
3D Perception and Reconstruction, Diffusion Models, LVLM

Zicong Jiang
PhD student at Chalmers University of Technology
Communication Systems, Optical fiber communication and sensing, Machine learning, Generative AI

Tong Zhang
School of Computer and Communication Sciences, EPFL

Soren Forchhammer
Department of Electrical and Photonics Engineering, DTU

Sabine Susstrunk
School of Computer and Communication Sciences, EPFL