Planner Aware Path Learning in Diffusion Language Model Training

📅 2025-09-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Diffusion language models suffer from a path mismatch between the uniformly random denoising order used during training and the non-uniform, planner-guided denoising schedules used at inference, so the standard ELBO no longer characterizes actual generation performance. To address this, the authors derive a Planned Evidence Lower Bound (P-ELBO) that incorporates planner-based reverse dynamics directly into the training objective, and propose Planner Aware Path Learning (PAPL), a simple modification of the standard masked discrete diffusion loss that aligns training with planned denoising trajectories. The approach operates within a discrete diffusion architecture, combining mask modeling with learnable, adaptive planning paths to support flexible parallel generation. Empirical evaluation shows consistent improvements over strong baselines: up to a 4× MAUVE improvement in text generation, a 23% relative gain in HumanEval pass@10 for code generation, and a 40% relative gain in protein sequence modeling. This work establishes a principled framework for aligning diffusion training with structured, non-uniform inference policies.

📝 Abstract
Diffusion language models have emerged as a powerful alternative to autoregressive models, enabling fast inference through flexible and parallel generation paths. This flexibility is enabled by new sampling strategies, or planners, that iteratively choose where to denoise along the sequence rather than sampling uniformly at random. However, by modifying reverse paths, planners introduce a mismatch between the uniformly random denoising paths used during training and the planning-based paths used at inference. In this work, we systematically investigate this mismatch and theoretically show that the standard discrete diffusion training evidence lower bound (ELBO) does not accurately describe a denoiser under non-uniform planning. To bridge this gap, we derive a new Planned Evidence Lower Bound (P-ELBO) that directly incorporates planner-based reverse dynamics into the training objective. Building on this, we propose Planner Aware Path Learning (PAPL), a simple and effective modification of the standard masked discrete diffusion loss that aligns training and inference under planned denoisers. Empirically, PAPL delivers consistent improvements across domains, including a 40% relative gain in protein sequence modeling, up to a 4x improvement in MAUVE for text generation, and a 23% relative gain in HumanEval pass@10 for code generation.
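The mismatch the abstract describes can be made concrete with a small sketch (the function names and the confidence-based planner here are illustrative assumptions, not the paper's implementation): standard masked diffusion training assumes positions are denoised in a uniformly random order, while a planner at inference chooses a non-uniform order, e.g. unmasking the denoiser's most confident positions first.

```python
import numpy as np

def uniform_unmask_order(seq_len, rng):
    # Training-time assumption in standard masked diffusion:
    # positions are denoised in a uniformly random order.
    return rng.permutation(seq_len)

def planner_unmask_order(probs):
    # Illustrative confidence-based planner (an assumption, not the
    # paper's planner): unmask the most confident positions first.
    confidence = probs.max(axis=-1)   # per-position max token probability
    return np.argsort(-confidence)    # most confident first

rng = np.random.default_rng(0)
# Hypothetical per-position token distributions for an 8-token sequence.
probs = rng.dirichlet(np.ones(50), size=8)
print(uniform_unmask_order(8, rng))
print(planner_unmask_order(probs))
```

Both calls return a permutation of the sequence positions, but the two orderings generally disagree; PAPL's point is that the training objective should account for the planner's ordering rather than the uniform one.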
Problem

Research questions and friction points this paper is trying to address.

Addresses training-inference mismatch in diffusion language models
Incorporates planner-based reverse dynamics into training objective
Aligns training paths with planned denoising during inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Derives a Planned Evidence Lower Bound (P-ELBO) for non-uniform planning
Proposes the Planner Aware Path Learning (PAPL) method
Aligns training and inference under planned denoisers
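The exact P-ELBO objective is not reproduced on this page. As a hedged sketch of the general idea, one simple way to make a masked diffusion loss planner-aware is to reweight the per-position cross-entropy by the planner's probability of selecting each position at inference (`planner_weights` is a hypothetical input for illustration, not the paper's formulation):

```python
import numpy as np

def planner_weighted_masked_loss(log_probs, targets, mask, planner_weights):
    # Baseline masked diffusion loss: average cross-entropy over masked
    # positions. This illustrative planner-aware variant reweights each
    # masked position by the planner's selection probability, so training
    # emphasizes the denoising paths the planner will actually take.
    ce = -log_probs[np.arange(targets.shape[0]), targets]  # per-position CE
    w = mask * planner_weights                             # masked + planner-weighted
    return (ce * w).sum() / max(w.sum(), 1e-8)

rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 20))
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
targets = rng.integers(0, 20, size=6)
mask = np.array([1, 1, 0, 1, 0, 1], dtype=float)       # 1 = position is masked
planner_weights = np.array([0.4, 0.1, 0.0, 0.3, 0.0, 0.2])
loss = planner_weighted_masked_loss(log_probs, targets, mask, planner_weights)
print(loss)
```

With uniform `planner_weights` this reduces to the standard masked loss; non-uniform weights shift the training signal toward planner-preferred positions, which is the alignment the abstract describes.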