Prefix-Adaptive Block Diffusion for Efficient Document Recognition

📅 2026-05-16
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing block diffusion models tie denoising and KV-cache commitment to fixed block boundaries. As a result, parallelism shrinks as a block is denoised, and generated tokens cannot be cached until the whole block is complete. Their information flow is also inconsistent: tokens are denoised bidirectionally within a block but propagated autoregressively across blocks, which can hurt performance on structure-sensitive document recognition. To address these issues, this work proposes the Prefix-Adaptive Block Diffusion Model (PA-BDM), which reformulates intra-block denoising as a causal process from prefix to suffix and treats block size as a maximum candidate range rather than a fixed commitment unit. PA-BDM combines a Confidence-gated Structural Loss (CSL) during training with a Progressive Prefix Commitment (PPC) mechanism at inference, enabling efficient parallel decoding while preserving recognition accuracy. The 3B-parameter PA-BDM achieves higher recognition scores on several document recognition benchmarks and 71.6% higher inference throughput than the 2.5B-parameter MinerU-Diffusion model.
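To make the "inconsistent information flow" concrete, the sketch below contrasts two attention masks. The first is the usual block-diffusion pattern: bidirectional inside a block and autoregressive across blocks. The second is a plain prefix-to-suffix causal pattern. This is an illustration under simplifying assumptions. The helper names (`block_diffusion_mask`, `causal_mask`, `future_leaks`) are hypothetical, and the summary does not specify how PA-BDM masks noised tokens during training. The sketch only shows where the two patterns disagree.

```python
def block_diffusion_mask(n: int, block_size: int) -> list[list[bool]]:
    # mask[i][j] is True when query position i may attend to key position j.
    # Bidirectional inside a block (same block index), autoregressive across blocks.
    return [[j // block_size <= i // block_size for j in range(n)] for i in range(n)]


def causal_mask(n: int) -> list[list[bool]]:
    # Prefix-to-suffix flow: position i only sees positions j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]


def render(mask: list[list[bool]]) -> str:
    # "1" = visible, "." = masked; one row per query position.
    return "\n".join("".join("1" if v else "." for v in row) for row in mask)


def future_leaks(mask: list[list[bool]]) -> list[tuple[int, int]]:
    # Pairs (i, j) with j > i that the mask allows, i.e. a token reading its own suffix.
    return [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v and j > i]


if __name__ == "__main__":
    print(render(block_diffusion_mask(6, 3)))
    print()
    print(render(causal_mask(6)))
    # Suffix-to-prefix reads that exist only inside blocks:
    print(future_leaks(block_diffusion_mask(6, 3)))
    # [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
```

Across blocks the two masks agree. Inside a block, the block-diffusion mask lets earlier tokens read later ones, while the cross-block rule forbids exactly that. A causal intra-block pattern removes this mismatch.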
📝 Abstract
Block Diffusion Models (BDMs) support parallel generation, flexible-length output, and KV caching, making them promising for efficient document parsing. However, existing BDMs bind denoising and cache commitment to fixed block boundaries: parallelism shrinks during intra-block denoising, while generated tokens cannot be cached until the whole block is completed. Moreover, intra-block bidirectional denoising conflicts with inter-block autoregression, creating inconsistent information flow that can challenge structure-sensitive recognition. We propose the Prefix-Adaptive Block Diffusion Model (PA-BDM), which replaces intra-block bidirectional denoising with causal denoising from prefix to suffix and treats the block size as a maximum candidate range rather than a fixed commitment unit. PA-BDM uses Confidence-gated Structural Loss (CSL) to build low-entropy prefixes before extending training to longer continuations. During inference, Progressive Prefix Commitment (PPC) then dynamically commits the longest reliable prefix into the KV cache and resets the next candidate range from the updated prefix, restoring a large parallel decoding space at each step. Experiments show that the 3B PA-BDM achieves higher recognition scores on several benchmarks and improves inference throughput by 71.6% over the 2.5B MinerU-Diffusion.
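The commit-and-reset loop of Progressive Prefix Commitment can be sketched as follows. The abstract does not say how "reliable" is measured, so the sketch makes several assumptions. Each candidate position carries a scalar confidence, for example the maximum predicted token probability. This confidence is compared against a hypothetical `threshold`. At least one token is committed per step so that decoding always advances. The function names and that fallback rule are illustrative assumptions, not the paper's implementation.

```python
def longest_reliable_prefix(confidences: list[float], threshold: float) -> int:
    # Count leading candidate positions whose confidence clears the threshold;
    # stop at the first unreliable one, because later tokens condition on it.
    k = 0
    for c in confidences:
        if c < threshold:
            break
        k += 1
    return k


def ppc_commit(committed_len: int, confidences: list[float], threshold: float) -> int:
    # Commit the longest reliable prefix of the candidate range into the KV cache.
    # Assumption: commit at least one token per step so decoding always progresses.
    return committed_len + max(longest_reliable_prefix(confidences, threshold), 1)


def next_candidate_range(committed_len: int, max_block: int, total_len: int) -> tuple[int, int]:
    # Reset the candidate range from the new prefix end, restoring the full
    # max_block-wide parallel space (clipped at the target length).
    return committed_len, min(committed_len + max_block, total_len)


if __name__ == "__main__":
    committed = 10
    # Hypothetical confidences for candidate positions 10..17 (max_block = 8).
    confs = [0.99, 0.97, 0.95, 0.62, 0.99, 0.98, 0.40, 0.91]
    committed = ppc_commit(committed, confs, threshold=0.9)
    print(committed)                                # 13
    print(next_candidate_range(committed, 8, 100))  # (13, 21)
```

In a fixed-block decoder, positions 10 to 12 would stay uncached until the rest of their block finished, and the next step's parallel window would shrink. In this sketch, those three tokens are committed immediately and the next window again spans the full 8 positions.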
Problem

Research questions and friction points this paper is trying to address.

Block Diffusion Models
document recognition
parallel generation
KV caching
information flow inconsistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prefix-Adaptive Block Diffusion
Causal Denoising
Progressive Prefix Commitment
Confidence-gated Structural Loss
KV Caching
🔎 Similar Papers
2023-09-20 · IEEE Transactions on Circuits and Systems for Video Technology (Print) · Citations: 0
Mingxu Chai (Fudan University)
Ziyu Shen (Computation and Artificial Intelligence Innovative College, Fudan University, Shanghai, China)
Chenyu Liu (Computation and Artificial Intelligence Innovative College, Fudan University, Shanghai, China)
Kaidi Zhang (Purdue University)
Jiazheng Zhang (Fudan University)
Dingwei Zhu (Computation and Artificial Intelligence Innovative College, Fudan University, Shanghai, China)
Zhiheng Xi (Fudan University)
Ruoyu Chen (Institute of Information Engineering, Chinese Academy of Sciences)
Jun Long (ByteDance, Shanghai, China)
Jihua Kang (ByteDance, Shanghai, China)
Tao Gui (Computation and Artificial Intelligence Innovative College, Fudan University, Shanghai, China; Shanghai Innovation Institute, Shanghai, China)
Qi Zhang (Fudan University)