AI Summary
This work addresses the mismatch between existing test-time scaling methods, which assume autoregressive decoding, and discrete diffusion language models, which generate tokens in parallel. To bridge this gap, the authors propose Prism, the first efficient test-time scaling framework tailored to such models. Prism dynamically allocates computational resources through Hierarchical Trajectory Search (HTS), enhances output diversity via local branching with partial remasking, and replaces external verifiers with an endogenous Self-Verified Feedback (SVF) mechanism. Evaluated on four benchmarks spanning mathematical reasoning and code generation, Prism matches or approaches best-of-N performance while using significantly fewer function evaluations, demonstrating both its efficiency and broad applicability.
Abstract
Inference-time compute has re-emerged as a practical way to improve LLM reasoning. Most test-time scaling (TTS) algorithms rely on autoregressive decoding, which is ill-suited to discrete diffusion language models (dLLMs) due to their parallel decoding over the entire sequence. As a result, developing effective and efficient TTS methods to unlock dLLMs' full generative potential remains an underexplored challenge. To address this, we propose Prism (Pruning, Remasking, and Integrated Self-verification Method), an efficient TTS framework for dLLMs that (i) performs Hierarchical Trajectory Search (HTS), which dynamically prunes and reallocates compute in an early-to-mid denoising window, (ii) introduces local branching with partial remasking to explore diverse implementations while preserving high-confidence tokens, and (iii) replaces external verifiers with Self-Verified Feedback (SVF) obtained via self-evaluation prompts on intermediate completions. Across four mathematical reasoning and code generation benchmarks on three dLLMs (LLaDA 8B Instruct, Dream 7B Instruct, and LLaDA 2.0-mini), Prism achieves a favorable performance-efficiency trade-off, matching best-of-N performance with substantially fewer function evaluations (NFE). The code is released at https://github.com/viiika/Prism.
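To make the search structure concrete, the following is a minimal, hedged sketch of the prune-and-branch loop the abstract describes: a pool of partial denoising trajectories is scored by an endogenous self-verification signal, low scorers are pruned early, and the freed compute is reallocated by branching the survivors. Every name and the toy scorer below are illustrative assumptions, not the authors' implementation; a real dLLM would denoise masked token sequences, and SVF would come from self-evaluation prompts on intermediate completions.

```python
import random

random.seed(0)  # deterministic toy run


def svf_score(trajectory):
    """Toy stand-in for Self-Verified Feedback: higher is better.

    In Prism this would be obtained by prompting the dLLM to
    evaluate its own intermediate completion (an assumption here)."""
    return sum(trajectory) / len(trajectory)


def denoise_step(trajectory):
    """Toy stand-in for one parallel denoising step of a dLLM."""
    return trajectory + [random.random()]


def branch(trajectory, n_branches):
    """Toy local branching: spawn variants by re-randomizing the last
    'token'. Prism instead remasks low-confidence tokens while
    preserving high-confidence ones."""
    return [trajectory[:-1] + [random.random()] for _ in range(n_branches)]


def hierarchical_trajectory_search(n_init=8, n_stages=3, keep_frac=0.5):
    # Start with n_init partial trajectories, one denoising step each.
    pool = [denoise_step([]) for _ in range(n_init)]
    nfe = n_init  # count function evaluations (NFE)
    for _ in range(n_stages):
        # Rank by the endogenous self-verification score and prune.
        pool.sort(key=svf_score, reverse=True)
        pool = pool[: max(1, int(len(pool) * keep_frac))]
        # Reallocate compute: branch survivors, then denoise further.
        expanded = []
        for traj in pool:
            expanded.extend(branch(traj, 2))
        pool = [denoise_step(t) for t in expanded]
        nfe += len(pool)
    best = max(pool, key=svf_score)
    return best, nfe


best, nfe = hierarchical_trajectory_search()
print(len(best), nfe)
```

With these settings the search spends 32 NFE on a pool that never exceeds 8 trajectories, whereas naive best-of-N at the same final depth would denoise all 8 initial trajectories to completion; early pruning is what yields the favorable performance-efficiency trade-off the abstract claims.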