🤖 AI Summary
Diffusion language models struggle to control semantic diversity during generation, limiting their capacity for multi-path reasoning and creative expression. This work reveals, for the first time, a temporal division of labor in their denoising process: early steps predominantly shape global semantics, while later steps refine local details. Building on this insight, the authors propose Time-Annealed Perturbation Sampling (TAPS), a training-free method that injects controllable perturbations during early denoising stages to stimulate diverse semantic branches and gradually anneals these perturbations in later stages to preserve fluency and instruction adherence. TAPS is compatible with both non-autoregressive and semi-autoregressive architectures and significantly enhances output diversity across multiple creative writing and reasoning benchmarks without compromising generation quality.
📝 Abstract
Diffusion language models (Diffusion-LMs) introduce an explicit temporal dimension into text generation, yet how this structure can be leveraged to control generation diversity for exploring multiple valid semantic or reasoning paths remains underexplored. In this paper, we show that Diffusion-LMs, like diffusion models in image generation, exhibit a temporal division of labor: early denoising steps largely determine the global semantic structure, while later steps focus on local lexical refinement. Building on this insight, we propose Time-Annealed Perturbation Sampling (TAPS), a training-free inference strategy that encourages semantic branching early in the diffusion process while progressively reducing perturbations to preserve fluency and instruction adherence. TAPS is compatible with both non-autoregressive and semi-autoregressive diffusion backbones, as demonstrated on LLaDA and TraDo, and consistently improves output diversity across creative writing and reasoning benchmarks without compromising generation quality.
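The core mechanism described above, injecting perturbations early in denoising and annealing them to zero by the final steps, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the schedule shape (`sigma0`, `power`), the choice to perturb token logits, and all function names are assumptions for the sake of the sketch.

```python
import numpy as np

def taps_noise_scale(step: int, total_steps: int,
                     sigma0: float = 1.0, power: float = 2.0) -> float:
    """Hypothetical annealing schedule: perturbation strength starts at
    sigma0 on the first denoising step and decays to 0 on the last."""
    progress = step / max(total_steps - 1, 1)
    return sigma0 * (1.0 - progress) ** power

def perturbed_denoise_step(logits: np.ndarray, step: int, total_steps: int,
                           rng: np.random.Generator,
                           sigma0: float = 1.0) -> np.ndarray:
    """Add Gaussian noise to the model's token logits, scaled by the
    annealed schedule: large early (encourages semantic branching),
    near zero late (preserves fluency and instruction adherence)."""
    sigma = taps_noise_scale(step, total_steps, sigma0)
    return logits + sigma * rng.standard_normal(logits.shape)

# Noise scale over a 10-step denoising trajectory: 1.0 -> ... -> 0.0
scales = [taps_noise_scale(t, 10) for t in range(10)]

# Perturb a dummy (sequence_length x vocab_size) logits tensor at step 0
rng = np.random.default_rng(0)
noisy_logits = perturbed_denoise_step(np.zeros((8, 100)), 0, 10, rng)
```

Because the perturbation is applied only at inference time, this kind of schedule can wrap an existing sampler without retraining, which is what makes the approach training-free.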