🤖 AI Summary
Diffusion large language models (dLLMs) fine-tuned on instruction data suffer from "<eos> overflow": as the allocated sequence length grows, generated responses paradoxically shorten, collapsing into early termination or degenerating into streams of repeated <eos> tokens. The root cause is the dual role of <eos> as both end-of-sequence marker and padding symbol, which concentrates probability mass on <eos> at later positions and propagates backward to trigger premature termination. This work is the first to systematically characterize this mechanism and proposes "Rainbow Padding": replacing the run of identical <eos> padding tokens with a repeating cycle of distinct, dedicated padding tokens, which redistributes probability mass and breaks <eos> dominance. The method requires only a single epoch of LoRA-based fine-tuning, introduces no architectural changes, and remains fully compatible with existing dLLM systems. Experiments show that as few as seven distinct padding tokens substantially improve robustness to the allocated generation length, with strong results even from minimal training data. Code is publicly available.
📝 Abstract
Diffusion large language models (dLLMs) have emerged as a promising alternative to autoregressive models, offering flexible generation orders and strong performance on complex reasoning tasks. However, instruction-tuned dLLMs exhibit a critical vulnerability we term `<eos>` overflow: as allocated sequence length increases, responses paradoxically become shorter, collapsing into early termination or degenerating into streams of `<eos>` tokens. Although noticed in practice, this issue has not been systematically analyzed. We trace its root cause to the dual role of `<eos>` as both termination and padding, which concentrates probability mass on `<eos>` at later positions and propagates backward to trigger early termination. To address this, we introduce Rainbow Padding, a simple remedy that replaces repeated `<eos>` placeholders with a repeating cycle of distinct padding tokens, distributing probability mass and breaking `<eos>` dominance. Experiments show that Rainbow Padding substantially improves length robustness and output quality, with as few as seven padding tokens sufficient to prevent early termination. Moreover, the method integrates efficiently into existing instruction-tuned models: LoRA fine-tuning for a single epoch on minimal data yields significant improvements, making this solution highly practical. The code is publicly available at https://github.com/quasar529/rainbow-padding.
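To make the padding scheme concrete, here is a minimal sketch of the core idea: the unused tail of a fixed-length sequence is filled with a repeating cycle of distinct padding tokens after a single `<eos>`, instead of repeating `<eos>` itself. The token names (`<pad_0>` … `<pad_6>`) and the helper function are illustrative assumptions, not the paper's released implementation.

```python
def rainbow_pad(tokens, max_len, eos="<eos>", num_pads=7):
    """Pad a token list to max_len: one <eos>, then a cycle of
    distinct padding tokens (the 'rainbow') instead of repeated <eos>."""
    pad_cycle = [f"<pad_{i}>" for i in range(num_pads)]
    padded = list(tokens) + [eos]
    # Fill the remaining slots by cycling through the distinct pad tokens,
    # so no single token dominates the tail of the sequence.
    while len(padded) < max_len:
        padded.append(pad_cycle[(len(padded) - len(tokens) - 1) % num_pads])
    return padded[:max_len]

demo = rainbow_pad(["Hello", ",", "world"], max_len=12)
print(demo)
# → ['Hello', ',', 'world', '<eos>', '<pad_0>', '<pad_1>', '<pad_2>',
#    '<pad_3>', '<pad_4>', '<pad_5>', '<pad_6>', '<pad_0>']
```

In training, these padding tokens would be added to the tokenizer's vocabulary as special tokens, so the model learns to place them after `<eos>` rather than piling probability mass onto `<eos>` at every late position.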