Guided Transfer Learning for Discrete Diffusion Models

📅 2025-12-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Discrete diffusion models achieve strong performance, but adapting them to new domains typically requires costly fine-tuning. This paper proposes Guided Transfer Learning (GTL), a framework that enables cross-domain transfer of pretrained discrete diffusion models without any fine-tuning, the first such approach. It introduces a unified ratio-based guidance mechanism compatible with both score-matching and denoising objectives, operating seamlessly in discrete and continuous time. It further designs a planner-driven sparse position-candidate-token joint guidance sampling strategy, substantially reducing FLOPs and latency for large vocabularies (>50K) and long sequences (>1024). Extensive evaluation on synthetic Markov chain modeling and real-world language modeling tasks demonstrates that the method matches the generation quality of full-guidance baselines while achieving significant sampling-efficiency gains.

📝 Abstract
Discrete diffusion models achieve strong performance across language and other discrete domains, providing a powerful alternative to autoregressive models. However, their strong performance relies on large training datasets, which are costly or risky to obtain, especially when adapting to new domains. Transfer learning is the natural way to adapt pretrained discrete diffusion models, but current methods require fine-tuning large diffusion models, which is computationally expensive and often impractical. Building on ratio-based transfer learning for continuous diffusion, we provide Guided Transfer Learning for discrete diffusion models (GTL). This enables sampling from a target distribution without modifying the pretrained denoiser. The same guidance formulation applies to both discrete-time diffusion and continuous-time score-based discrete diffusion, yielding a unified treatment. Guided discrete diffusion often requires many forward passes of the guidance network, which becomes impractical for large vocabularies and long sequences. To address this, we further present an efficient guided sampler that concentrates evaluations on planner-selected positions and top candidate tokens, thus lowering sampling time and computation. This makes guided language modeling practical at scale for large vocabularies and long sequences. We evaluate GTL on sequential data, including synthetic Markov chains and language modeling, and provide empirical analyses of its behavior.
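The ratio-based guidance the abstract describes, sampling from a target distribution without modifying the pretrained denoiser, can be illustrated with a minimal sketch: the guidance network estimates a log density ratio log(p_target/p_source) per token, which simply shifts the frozen denoiser's logits before sampling. All function names and the toy numbers below are illustrative assumptions, not the paper's actual API.

```python
# Minimal sketch of ratio-based guidance for a frozen discrete denoiser.
# The pretrained model is never modified; guidance only shifts its logits
# by an estimated log density ratio log(p_target / p_source) per token.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def guided_token_distribution(denoiser_logits, log_ratio):
    """Per-position guided distribution: softmax(logits + log_ratio).

    denoiser_logits: per-position token logits from the frozen denoiser
    log_ratio:       per-position log-ratio scores from the guidance network
    """
    return [softmax([l + r for l, r in zip(pos_logits, pos_ratio)])
            for pos_logits, pos_ratio in zip(denoiser_logits, log_ratio)]

# Toy example: 2 positions, vocabulary of 3 tokens.
logits = [[0.0, 1.0, 2.0], [2.0, 0.0, 1.0]]
log_ratio = [[1.0, 0.0, -1.0],   # guidance pulls position 0 toward token 0
             [0.0, 0.0, 0.0]]    # zero ratio leaves position 1 untouched
probs = guided_token_distribution(logits, log_ratio)
```

With a zero log ratio the guided distribution reduces exactly to the pretrained denoiser's output, which is what makes the transfer fine-tuning-free.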
Problem

Research questions and friction points this paper is trying to address.

Adapting pretrained discrete diffusion models to new domains without fine-tuning
Reducing computational cost of guided sampling for large vocabularies and sequences
Enabling practical transfer learning for discrete diffusion in language modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Guided Transfer Learning without fine-tuning the denoiser
Unified guidance for discrete-time and continuous-time diffusion
Efficient sampler focusing on key positions and tokens
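The efficient sampler above can be sketched as follows: guidance-network forward passes are spent only on planner-selected positions and their top-k candidate tokens, while every other position keeps the frozen denoiser's distribution. The entropy-based planner heuristic and all names here are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch of planner-driven sparse position/candidate-token guidance.
# Each guidance_fn call stands in for one guidance-network forward pass,
# so fewer calls directly means fewer FLOPs and lower latency.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def sparse_guided_probs(denoiser_logits, guidance_fn, m_positions=2, k_tokens=2):
    """Apply guidance only where the planner thinks it matters.

    denoiser_logits: list of per-position token logits from the frozen denoiser
    guidance_fn(pos, token_id) -> log-ratio score for one candidate token
    """
    base = [softmax(row) for row in denoiser_logits]
    # Planner (illustrative): pick the m most uncertain (highest-entropy) positions.
    ranked = sorted(range(len(base)), key=lambda i: entropy(base[i]), reverse=True)
    picked = set(ranked[:m_positions])
    guided, calls = [], 0
    for pos, row in enumerate(denoiser_logits):
        row = list(row)
        if pos in picked:
            # Guide only the k most likely candidate tokens at this position.
            cands = sorted(range(len(row)), key=lambda t: row[t], reverse=True)[:k_tokens]
            for t in cands:
                row[t] += guidance_fn(pos, t)
                calls += 1
        guided.append(softmax(row))
    return guided, calls

# Toy run: 4 positions, 5 tokens, a dummy guidance net favouring even token ids.
logits = [[0.1 * (p + 1) * t for t in range(5)] for p in range(4)]
probs, n_calls = sparse_guided_probs(logits, lambda p, t: 1.0 if t % 2 == 0 else -1.0)
```

Here the guidance network is queried m_positions x k_tokens = 4 times instead of once per (position, token) pair, which is the source of the savings for large vocabularies and long sequences.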