Planner and Executor: Collaboration between Discrete Diffusion And Autoregressive Models in Reasoning

📅 2025-10-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Autoregressive language models (ARMs) achieve high accuracy but suffer from high generation cost, while discrete diffusion language models (DDLMs), though parallelizable and well suited to multi-step planning, exhibit limited single-step accuracy. To combine their strengths, the authors propose an ARM–DDLM collaborative inference framework with a two-stage planning–execution architecture: first, a DDLM performs parallel, multi-step reasoning in latent space to produce a high-level plan; second, a learned projector maps the plan latents into the ARM's embedding space, where the ARM generates the final answer efficiently. Moving the DDLM-to-ARM handoff from text space to latent space yields large accuracy gains on DART-5 (27.0% to 54.0%) and AIME24 (0.0% to 14.0%). Moreover, the pipeline surpasses Qwen3.1-7B while using only a small fraction of its tokens, demonstrating the effectiveness and practicality of this complementary modeling.

📝 Abstract
Current autoregressive language models (ARMs) achieve high accuracy but require long token sequences, making them costly. Discrete diffusion language models (DDLMs) enable parallel and flexible generation within a fixed number of steps and have recently emerged for their strong performance in complex reasoning and long-term planning tasks. We present a study exploring hybrid architectures that couple DDLMs with ARMs to assess whether their collaboration can yield complementary benefits. We first examine collaboration in text space, where one model plans the reasoning process and another executes the final answer based on that plan. We then extend this setup to latent-space communication, introducing a learned projector that maps DDLM latents into the ARM's embedding space, potentially bypassing some of the text-generation limitations of diffusion models. We find that shifting DDLM → ARM communication from text space to latent space yields significant accuracy gains, for example increasing from 27.0% to 54.0% on DART-5 and from 0.0% to 14.0% on AIME24. We also find that combining a DDLM planner with an ARM executor can provide substantial computational savings with little to no impact on accuracy. For example, the latent-space pipeline, using 64 tokens for planning and roughly 5 for execution, surpasses Qwen3.1-7B on DART-5 and AIME, despite Qwen using 44 times more tokens. Overall, our study offers new insights into reasoning with DDLMs and highlights their potential in hybrid architectures.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational cost of autoregressive models in reasoning tasks
Enhancing reasoning accuracy through latent-space model collaboration
Combining diffusion and autoregressive models for efficient planning-execution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid architecture combines discrete diffusion and autoregressive models
Latent-space communication via learned projector enhances model accuracy
DDLM planner with ARM executor reduces computational token usage
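The latent-space handoff described above can be illustrated with a minimal sketch. All names, dimensions, and the single linear layer here are illustrative assumptions, not the paper's actual implementation; the paper's projector is learned, and the 64-token plan length follows the figure reported in the abstract:

```python
import numpy as np

# Illustrative dimensions (assumptions, not the paper's actual sizes).
D_DDLM = 128   # width of the DDLM's plan latents
D_ARM = 256    # width of the ARM's token-embedding space
PLAN_LEN = 64  # number of planning tokens, per the abstract

rng = np.random.default_rng(0)

# Hypothetical learned projector: a single linear map (W, b).
# In the paper this mapping is trained; here it is random for shape checking.
W = rng.standard_normal((D_DDLM, D_ARM)) * 0.02
b = np.zeros(D_ARM)

def project_plan(ddlm_latents: np.ndarray) -> np.ndarray:
    """Map DDLM plan latents into the ARM's embedding space."""
    return ddlm_latents @ W + b

# A 64-step plan produced by the DDLM (random stand-in here),
# projected into ARM embeddings and passed to the ARM as a soft prefix.
plan_latents = rng.standard_normal((PLAN_LEN, D_DDLM))
arm_prefix = project_plan(plan_latents)
print(arm_prefix.shape)  # (64, 256)
```

The ARM then decodes conditioned on this projected prefix, which is how the pipeline executes the answer in only a handful of tokens after a fixed-length parallel planning phase.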
Lina Berrayana
EPFL
Ahmed Heakl
MBZUAI
Muhammad Abdullah Sohail
MBZUAI
Thomas Hofmann
ETH Zürich
Salman Khan
MBZUAI
Wei Chen
Microsoft Research Asia