🤖 AI Summary
This work addresses the lack of efficient decoding methods for controlling intra-batch diversity in discrete diffusion models for text generation. The authors propose D5P4, a parallel beam search framework that, for the first time, integrates Determinantal Point Processes (DPPs) into discrete diffusion decoding. By modularizing the beam selection objective, D5P4 explicitly balances generation probability and diversity with negligible computational overhead. The approach casts beam selection as maximum a posteriori (MAP) inference over a DPP, solved with a scalable greedy algorithm, and supports multi-GPU parallel decoding. Experimental results demonstrate that D5P4 significantly enhances output diversity in both free-form text generation and question-answering tasks while maintaining generation quality on par with strong baselines.
📝 Abstract
Discrete diffusion models are promising alternatives to autoregressive approaches for text generation, yet their decoding methods remain under-studied. Standard decoding methods for autoregressive models, such as beam search, do not directly apply to iterative denoising, and existing diffusion decoding techniques provide limited control over in-batch diversity. To bridge this gap, we introduce a generalized beam-search framework for discrete diffusion that generates candidates in parallel and supports modular beam-selection objectives. As a diversity-focused instantiation, we propose D5P4, which formulates the selection step as MAP inference over a Determinantal Point Process. Leveraging a scalable greedy solver, D5P4 maintains multi-GPU compatibility and enables an explicit trade-off between model probability and target diversity with near-zero compute overhead. Experiments on free-form generation and question answering demonstrate that D5P4 improves diversity over strong baselines while maintaining competitive generation quality.
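To make the selection step concrete, below is a minimal sketch of greedy MAP inference for a DPP over decoded candidates, using the standard quality-diversity kernel decomposition. The temperature `lam`, the cosine-similarity construction, and the function name are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def greedy_dpp_select(log_probs, embeddings, k, lam=1.0):
    """Greedily approximate DPP MAP inference to pick k diverse candidates.

    Sketch only: the kernel is L = diag(q) @ S @ diag(q), where
    q_i = exp(lam * log_probs[i]) encodes model probability (quality)
    and S is a cosine-similarity matrix over candidate embeddings
    (diversity). `lam` is a hypothetical knob trading probability
    against diversity.
    """
    q = np.exp(lam * np.asarray(log_probs, dtype=float))
    E = np.asarray(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize
    L = q[:, None] * (E @ E.T) * q[None, :]           # quality-diversity kernel

    selected, remaining = [], list(range(len(q)))
    for _ in range(min(k, len(q))):
        best, best_logdet = None, -np.inf
        for j in remaining:
            idx = selected + [j]
            # Adding j maximizes det(L_{S ∪ {j}}); near-duplicate
            # candidates make the submatrix near-singular, so they
            # are penalized.
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = j, logdet
        if best is None:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, given three equally probable candidates where two embeddings are near-duplicates, the greedy solver keeps only one of the duplicates and fills the remaining slot with the dissimilar candidate.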