🤖 AI Summary
To address the challenge of efficiently fine-tuning diffusion models on novel tasks, this paper proposes a lightweight adaptation framework based on dynamic frequency-domain energy modeling. Methodologically, it first characterizes how frequency-domain energy evolves during the diffusion denoising process. It then introduces a soft frequency router coupled with a multi-expert adapter fusion mechanism, and enforces a frequency-domain energy consistency regularization so that parameter updates in latent space adapt to each frequency band. The framework supports dynamic frequency routing during both training and inference, which improves generalization across tasks that differ in structure and resolution. Experiments demonstrate stable convergence across diverse diffusion models (e.g., SDXL, SVD) and resolutions, with substantial improvements in generation quality, accelerated convergence (>30% faster), and compatibility with mainstream fine-tuning paradigms, including LoRA and IA³, which makes the method versatile and flexible to deploy.
📝 Abstract
Diffusion models have achieved remarkable success in generative modeling, yet how to effectively adapt large pretrained models to new tasks remains challenging. We revisit the reconstruction behavior of diffusion models during denoising to unveil the underlying frequency-energy mechanism governing this process. Building upon this observation, we propose FeRA, a frequency-driven fine-tuning framework that aligns parameter updates with the intrinsic frequency-energy progression of diffusion. FeRA establishes a comprehensive frequency-energy framework for diffusion adaptation, comprising three synergistic components: (i) a compact frequency-energy indicator that characterizes the latent band-wise energy distribution, (ii) a soft frequency router that adaptively fuses multiple frequency-specific adapter experts, and (iii) a frequency-energy consistency regularization that stabilizes diffusion optimization and ensures coherent adaptation across bands. Routing operates in both training and inference, with inference-time routing dynamically determined by the latent frequency energy. FeRA integrates seamlessly with adapter-based tuning schemes and generalizes well across diffusion backbones and resolutions. By aligning adaptation with the frequency-energy mechanism, FeRA provides a simple, stable, and compatible paradigm for effective and robust diffusion model adaptation.
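Components (i) and (ii) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how bands are defined or how the router is parameterized, so the radial band partition, the linear-plus-softmax router, the LoRA-style low-rank experts, and all function names (`band_energy_indicator`, `soft_route`, `fused_adapter_delta`) are assumptions made for illustration only.

```python
import numpy as np

def band_energy_indicator(latent, num_bands=3):
    """Fraction of spectral energy per radial frequency band (hypothetical indicator).

    latent: array of shape (C, H, W). Returns an array of shape (num_bands,)
    whose entries are non-negative and sum to 1.
    """
    _, h, w = latent.shape
    power = np.abs(np.fft.fft2(latent, axes=(-2, -1))) ** 2   # (C, H, W)
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)                        # (H, W)
    # Assumption: equal-width radial bands from DC up to the highest frequency.
    edges = np.linspace(0.0, radius.max(), num_bands + 1)
    band_idx = np.digitize(radius, edges[1:-1])                # values in [0, num_bands - 1]
    energy = np.array([power[:, band_idx == k].sum() for k in range(num_bands)])
    return energy / (energy.sum() + 1e-12)

def soft_route(indicator, router_weight, temperature=1.0):
    """Softmax gate over experts from a linear map of the indicator (assumed form)."""
    logits = router_weight @ indicator / temperature
    logits = logits - logits.max()                             # numerical stability
    gates = np.exp(logits)
    return gates / gates.sum()

def fused_adapter_delta(x, experts, gates):
    """Gate-weighted sum of low-rank expert updates B @ (A @ x).

    x: (d_in,); experts: list of (A, B) with A: (r, d_in), B: (d_out, r).
    """
    return sum(g * (B @ (A @ x)) for g, (A, B) in zip(gates, experts))

# Usage with random placeholder data (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 16, 16))
indicator = band_energy_indicator(latent, num_bands=3)
gates = soft_route(indicator, rng.standard_normal((3, 3)))
experts = [(rng.standard_normal((2, 8)), rng.standard_normal((8, 2))) for _ in range(3)]
delta = fused_adapter_delta(rng.standard_normal(8), experts, gates)
```

Because the gates are recomputed from the current latent, the same routing function can be called at every denoising step during inference, which is one way to read the abstract's statement that inference-time routing is determined by the latent frequency energy.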