Let Triggers Control: Frequency-Aware Dropout for Effective Token Control

📅 2026-03-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the issue of semantic entanglement in personalized text-to-image generation, where a single trigger token often co-occurs with high-frequency contextual words, thereby weakening prompt controllability. To mitigate this, the authors propose Frequency-Aware Dropout (FAD), a novel approach that introduces a co-occurrence frequency–based dynamic regularization mechanism into trigger-token control. Integrated within LoRA fine-tuning and guided by a curriculum learning schedule, FAD dynamically masks high-frequency contextual tokens to disentangle semantics. Notably, this method enhances the semantic independence and controllability of trigger tokens without modifying model architecture or introducing additional parameters. Extensive experiments demonstrate that FAD consistently improves prompt fidelity, style accuracy, and user-perceived quality across multiple models—including Stable Diffusion 1.5, SDXL, FLUX, and Qwen-Image—while incurring minimal computational overhead.
📝 Abstract
Text-to-image models such as Stable Diffusion have achieved unprecedented levels of high-fidelity visual synthesis. As these models advance, personalization of generative models -- commonly facilitated through Low-Rank Adaptation (LoRA) with a dedicated trigger token -- has become a significant area of research. Previous works have naively assumed that fine-tuning with a single trigger token suffices to represent new concepts. However, this often results in poor controllability, where the trigger token alone fails to reliably evoke the intended concept. We attribute this issue to the frequent co-occurrence of the trigger token with the surrounding context during fine-tuning, which entangles their representations and compromises the token's semantic distinctiveness. To disentangle these representations, we propose Frequency-Aware Dropout (FAD) -- a novel regularization technique that improves prompt controllability without adding new parameters. FAD consists of two key components: co-occurrence analysis and curriculum-inspired scheduling. Qualitative and quantitative analyses across token-based diffusion models (SD~1.5 and SDXL) and natural language--driven backbones (FLUX and Qwen-Image) demonstrate consistent gains in prompt fidelity, stylistic precision, and user-perceived quality. Our method provides a simple yet effective dropout strategy that enhances controllability and personalization in text-to-image generation. Notably, it achieves these improvements without introducing additional parameters or architectural modifications, making it readily applicable to existing models with minimal computational overhead.
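The abstract describes FAD's two ingredients: dropping contextual caption tokens with probability tied to how often they co-occur with the trigger token, ramped in by a curriculum schedule. A minimal sketch of that idea is below; the function name `fad_mask`, the linear ramp, and the `p_max` cap are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter

def fad_mask(prompt_tokens, trigger, cooc_counts, step, total_steps, p_max=0.5):
    """Sketch of Frequency-Aware Dropout (assumed form, not the paper's code).

    Contextual tokens are dropped with probability proportional to how
    often they co-occur with the trigger token across training captions,
    scaled by a linear curriculum ramp so early training sees full prompts.
    """
    ramp = min(1.0, step / max(1, total_steps))      # curriculum: ease dropout in
    max_count = max(cooc_counts.values()) if cooc_counts else 1
    kept = []
    for tok in prompt_tokens:
        if tok == trigger:
            kept.append(tok)                          # never drop the trigger itself
            continue
        p_drop = p_max * ramp * cooc_counts.get(tok, 0) / max_count
        if random.random() >= p_drop:
            kept.append(tok)
    return kept

# Example: early in training nothing is dropped; late in training the
# highest-frequency context word is masked with high probability.
counts = Counter({"dog": 10, "park": 3})
tokens = ["a", "dog", "in", "park", "<sks>"]
early = fad_mask(tokens, "<sks>", counts, step=0, total_steps=100)
late = fad_mask(tokens, "<sks>", counts, step=100, total_steps=100, p_max=1.0)
```

In a LoRA fine-tuning loop, the masked token list would replace the original caption before text encoding at each step, which matches the paper's claim of adding no parameters and no architectural changes.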
Problem

Research questions and friction points this paper is trying to address.

trigger token
controllability
text-to-image generation
personalization
semantic entanglement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-Aware Dropout
trigger token
controllability
text-to-image generation
representation disentanglement