🤖 AI Summary
This work addresses the limited reasoning ability of masked diffusion language models (MDLMs) on complex tasks such as code generation and mathematical reasoning. To overcome this limitation, the authors propose UnMaskFork (UMF), a test-time scaling approach that reformulates the unmasking process as a search tree, where deterministic partial unmasking actions open multiple branching paths, and, for the first time, integrates Monte Carlo Tree Search (MCTS) to optimize the generation trajectory. Departing from conventional paradigms that rely on random sampling, the method delivers significant gains over existing test-time scaling techniques on multiple challenging code generation benchmarks, while also exhibiting strong scalability and consistent improvements on mathematical reasoning tasks.
📝 Abstract
Test-time scaling strategies have effectively leveraged inference-time compute to enhance the reasoning abilities of Autoregressive Large Language Models. In this work, we demonstrate that Masked Diffusion Language Models (MDLMs) are inherently amenable to advanced search strategies, owing to their iterative and non-autoregressive generation process. To leverage this, we propose UnMaskFork (UMF), a framework that formulates the unmasking trajectory as a search tree and employs Monte Carlo Tree Search to optimize the generation path. In contrast to standard scaling methods that rely on stochastic sampling, UMF explores the search space through deterministic partial unmasking actions performed by multiple MDLMs. Our empirical evaluation demonstrates that UMF consistently outperforms existing test-time scaling baselines on complex coding benchmarks, while also exhibiting strong scalability on mathematical reasoning tasks.
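To make the core idea concrete, the following is a minimal, self-contained sketch of MCTS over partial-unmasking states. It is not the authors' implementation: the "model" is replaced by a toy candidate generator (a fixed two-token vocabulary standing in for MDLM predictions), the reward is a toy scoring function standing in for a task verifier, and all names (`toy_model`, `toy_reward`, `mcts`, etc.) are hypothetical. It only illustrates the search-tree formulation: each node is a partially unmasked sequence, each edge is a deterministic unmasking action, and UCT-guided search plus rollouts select the final trajectory.

```python
import math
import random

MASK = None  # placeholder for a still-masked position

def toy_model(state, vocab=("a", "b")):
    # Stand-in for MDLM predictions: candidate tokens per masked slot.
    return {i: list(vocab) for i, t in enumerate(state) if t is MASK}

def toy_reward(state):
    # Toy terminal reward standing in for a task verifier/scorer.
    return sum(t == "a" for t in state) / len(state)

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = {}, 0, 0.0

    def is_terminal(self):
        return all(t is not MASK for t in self.state)

    def untried_actions(self):
        cands = toy_model(self.state)
        if not cands:
            return []
        pos = min(cands)  # deterministically pick which slot to unmask next
        return [(pos, tok) for tok in cands[pos] if (pos, tok) not in self.children]

def apply_action(state, action):
    pos, tok = action
    s = list(state)
    s[pos] = tok
    return tuple(s)

def uct(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)

def rollout(state, rng):
    # Random completion of the remaining masks, scored at the end.
    while any(t is MASK for t in state):
        cands = toy_model(state)
        pos = min(cands)
        state = apply_action(state, (pos, rng.choice(cands[pos])))
    return toy_reward(state)

def mcts(root_state, iters=200, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # Selection: descend fully expanded nodes by UCT.
        while not node.is_terminal() and not node.untried_actions():
            node = max(node.children.values(), key=lambda ch: uct(ch, node))
        # Expansion: add one unmasking action as a new child.
        acts = node.untried_actions()
        if acts:
            a = rng.choice(acts)
            child = Node(apply_action(node.state, a), parent=node)
            node.children[a] = child
            node = child
        # Evaluation and backpropagation.
        r = toy_reward(node.state) if node.is_terminal() else rollout(node.state, rng)
        while node:
            node.visits += 1
            node.value += r
            node = node.parent
    # Decode greedily along the most-visited path.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return node.state

best = mcts((MASK, MASK, MASK))
```

In this sketch the reward favors the token "a", so the search concentrates visits on the branch that unmasks "a" everywhere; in the paper's setting the branching would instead come from multiple MDLMs proposing different deterministic unmasking actions, with a task-specific signal guiding the tree.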