Learnability-Informed Fine-Tuning of Diffusion Language Models

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Standard supervised fine-tuning (SFT) can underperform on diffusion language models (DLMs), and may even hurt them, because it ignores how token learnability varies across diffusion timesteps. This work proposes LIFT, the first approach to incorporate token learnability into the fine-tuning of DLMs. LIFT schedules its learning objective by the input masking ratio: when most of the input is masked, it prioritizes easily predictable tokens, and as masking decreases it shifts focus to harder-to-predict tokens, aligning the training targets with the information available at each stage of the diffusion process. As a timestep-aware SFT strategy, LIFT outperforms existing SFT baselines across six reasoning benchmarks, achieving up to a threefold relative gain on AIME'24 and AIME'25.

📝 Abstract

We aim to improve the reasoning capabilities of diffusion language models (DLMs). While SFT is a popular post-training recipe for autoregressive models, its use in DLMs faces challenges and can even hurt performance, though the underlying causes remain understudied. Our analysis reveals that vanilla SFT overlooks learnability, namely what and when tokens are learned. Specifically, rare tokens are difficult to learn when most of the input is masked, whereas it is straightforward and thus of little value to learn common tokens when most of the input is unmasked. Motivated by our analysis, we propose LIFT, an efficient SFT-based post-training algorithm for DLMs. LIFT learns easy tokens when most of the input is masked and hard tokens when more context is available, thus aligning the training with the information available at different diffusion time steps. Our results show that LIFT outperforms existing SFT baselines across six reasoning benchmarks, achieving up to a 3x relative gain on AIME'24 and AIME'25. Our code is publicly available at https://github.com/divelab/LIFT.
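The scheduling idea in the abstract can be illustrated with a small sketch. This is not the LIFT algorithm itself, whose exact objective is not given here. It is a minimal illustration under stated assumptions: each token has a hypothetical `difficulty` score in [0, 1] (for example, derived from token rarity), the functions `learnability_weights` and `weighted_masked_loss` are names invented for this example, and the exponential weighting with a `sharpness` parameter is an arbitrary choice made for clarity.

```python
import math


def learnability_weights(difficulty, mask_ratio, sharpness=4.0):
    """Return normalized per-token weights (hypothetical scheme, not from the paper).

    difficulty: list of floats in [0, 1]; 0 = easy/common, 1 = hard/rare (assumed).
    mask_ratio: fraction of the input that is masked, in [0, 1].
    """
    # Heavily masked input -> target easy tokens; lightly masked -> target hard ones.
    target = 1.0 - mask_ratio
    raw = [math.exp(-sharpness * abs(d - target)) for d in difficulty]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_masked_loss(token_nll, is_masked, difficulty, mask_ratio):
    """Weighted sum of per-token negative log-likelihoods over masked positions only."""
    masked_idx = [i for i, masked in enumerate(is_masked) if masked]
    if not masked_idx:
        return 0.0
    weights = learnability_weights([difficulty[i] for i in masked_idx], mask_ratio)
    return sum(w * token_nll[i] for w, i in zip(weights, masked_idx))


# Example: one easy token (0.1) and one hard token (0.9) at two masking levels.
high_mask = learnability_weights([0.1, 0.9], mask_ratio=0.9)
low_mask = learnability_weights([0.1, 0.9], mask_ratio=0.1)
print(high_mask[0] > high_mask[1])  # easy token dominates when most input is masked
print(low_mask[1] > low_mask[0])    # hard token dominates when more context is visible
```

In a real training loop, `token_nll` would come from the model's per-token cross-entropy at masked positions, and the difficulty signal and weighting schedule would follow the released code at https://github.com/divelab/LIFT rather than this toy scheme.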

Problem

Research questions and friction points this paper is trying to address.

diffusion language models

supervised fine-tuning

learnability

token learning

reasoning capabilities

Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion language models

supervised fine-tuning

learnability