Lookahead Path Likelihood Optimization for Diffusion LLMs

📅 2026-02-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a key limitation of diffusion-based large language models (dLLMs): their reasoning performance is constrained by the unmasking order, and existing heuristic strategies, which optimize only local confidence, fail to ensure global consistency. To overcome this, the authors propose Path Log-Likelihood (Path LL), a path-level objective that correlates strongly with downstream accuracy. They introduce the POKE estimator to predict the future Path LL of partial trajectories and integrate it into a Sequential Monte Carlo (SMC) framework, forming POKE-SMC, which dynamically searches for high-likelihood unmasking paths. This approach moves beyond conventional greedy strategies, achieving average accuracy gains of 2%–3% across six reasoning tasks and advancing the accuracy–compute Pareto frontier on LLaDA models at comparable computational cost.
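In its simplest reading, the path-level objective scores a full unmasking trajectory by summing the model's log-probability for each token at the step it was unmasked. The sketch below is an illustrative rendering of that idea (the function name and exact conditioning are ours, not the paper's):

```python
import math

def path_log_likelihood(step_log_probs):
    """Path LL of one unmasking trajectory: the sum of the model's
    log-probabilities for each token at the step it was unmasked.
    (Illustrative reading; the paper's exact formulation may differ.)"""
    return sum(step_log_probs)

# toy trajectory: three tokens unmasked with probabilities 0.9, 0.8, 0.95
trajectory = [math.log(0.9), math.log(0.8), math.log(0.95)]
score = path_log_likelihood(trajectory)
```

Higher (less negative) Path LL indicates a trajectory the model finds more globally consistent, which is what makes it usable as a search objective.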

📝 Abstract
Diffusion Large Language Models (dLLMs) support arbitrary-order generation, yet their inference performance critically depends on the unmasking order. Existing strategies rely on heuristics that greedily optimize local confidence, offering limited guidance for identifying unmasking paths that are globally consistent and accurate. To bridge this gap, we introduce path log-likelihood (Path LL), a trajectory-conditioned objective that strongly correlates with downstream accuracy and enables principled selection of unmasking paths. To optimize Path LL at inference time, we propose POKE, an efficient value estimator that predicts the expected future Path LL of a partial decoding trajectory. We then integrate this lookahead signal into POKE-SMC, a Sequential Monte Carlo-based search framework for dynamically identifying optimal unmasking paths. Extensive experiments across 6 reasoning tasks show that POKE-SMC consistently improves accuracy, achieving 2%--3% average gains over strong decoding-time scaling baselines at comparable inference overhead on LLaDA models and advancing the accuracy--compute Pareto frontier.
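The search procedure described in the abstract can be sketched as a standard Sequential Monte Carlo loop: each particle extends its partial unmasking path by sampling from the model, then particles are resampled toward high predicted future Path LL. Everything below is a minimal illustration under our own assumptions; `propose` and `value_estimate` are hypothetical stand-ins for the dLLM's step distribution and the POKE estimator:

```python
import math
import random

def poke_smc(propose, value_estimate, num_steps, num_particles=8, seed=0):
    """SMC-style search over unmasking paths (illustrative sketch).

    propose(state) -> list of (next_state, step_log_prob) candidates
    value_estimate(state) -> predicted future Path LL of a partial path
    """
    rng = random.Random(seed)
    # each particle: (partial state, Path LL accumulated so far)
    particles = [((), 0.0)] * num_particles
    for _ in range(num_steps):
        extended, weights = [], []
        for state, path_ll in particles:
            cands = propose(state)
            # sample the next unmasking step from the model's own distribution
            probs = [math.exp(ll) for _, ll in cands]
            nxt, step_ll = rng.choices(cands, weights=probs)[0]
            extended.append((nxt, path_ll + step_ll))
            # lookahead weight: POKE-style estimate of the future Path LL
            weights.append(math.exp(value_estimate(nxt)))
        # resample particles toward high predicted future Path LL
        particles = rng.choices(extended, weights=weights, k=num_particles)
    return max(particles, key=lambda p: p[1])

# toy model: token 'a' is likelier, and a stub estimator prefers paths with more 'a's
def propose(state):
    return [(state + ('a',), math.log(0.9)), (state + ('b',), math.log(0.1))]

def value_estimate(state):
    return state.count('a') * 0.5  # hypothetical lookahead score

state, path_ll = poke_smc(propose, value_estimate, num_steps=5)
```

The key contrast with greedy confidence-based decoding is that the resampling step uses an estimate of *future* trajectory likelihood, not just the local step probability, so particles on locally attractive but globally poor paths get pruned.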
Problem

Research questions and friction points this paper is trying to address.

Diffusion LLMs
unmasking order
inference performance
decoding trajectory
global consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Path Log-Likelihood
Diffusion LLMs
POKE
Sequential Monte Carlo
Unmasking Order