DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the longstanding trade-off between efficiency and quality in long-text generation, this paper proposes DrDiff, a dynamic routing diffusion framework. DrDiff integrates three core innovations: (1) a dynamic expert scheduling mechanism that adaptively allocates computational resources; (2) hierarchical sparse attention (HSA), which preserves global modeling capability while reducing time complexity to *O(n)*; and (3) a soft absorption guidance strategy that, combined with DPM-solver++, accelerates diffusion sampling. To our knowledge, DrDiff is the first diffusion-based method for long-text generation to achieve both linear-time inference and high-quality outputs. Experiments across multiple long-text generation benchmarks show that DrDiff significantly outperforms existing state-of-the-art approaches: it reduces diffusion steps by 60% and computational cost by 55%, accelerates generation by 2.3×, and simultaneously improves coherence, factual consistency, and lexical diversity.

📝 Abstract
This paper introduces DrDiff, a novel framework for long-text generation that overcomes the efficiency-quality trade-off through three core technologies. First, we design a dynamic expert scheduling mechanism that intelligently allocates computational resources during the diffusion process based on text complexity, enabling more efficient handling of generation tasks of varying difficulty. Second, we introduce a Hierarchical Sparse Attention (HSA) mechanism that adaptively adjusts attention patterns according to input length, reducing computational complexity from O(n²) to O(n) while maintaining model performance. Finally, we propose a soft absorption guidance optimization strategy that, combined with DPM-solver++, reduces the number of diffusion steps and significantly improves generation speed. Comprehensive experiments on various long-text generation benchmarks demonstrate the superiority of DrDiff over existing SOTA methods.
Problem

Research questions and friction points this paper is trying to address.

Overcoming efficiency-quality trade-off in long-text generation
Reducing computational complexity from O(n²) to O(n)
Improving generation speed while maintaining performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic expert scheduling for resource allocation
Hierarchical Sparse Attention reduces complexity
Soft absorption guidance optimizes diffusion steps
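The page gives no implementation details, but the intuition behind sparse attention's linear complexity can be illustrated with a minimal sketch: if each token attends only to a fixed-size local window rather than to all n tokens, per-token cost is constant and total cost grows as O(n). This is a generic windowed-attention illustration, not the paper's HSA mechanism; the function and parameter names (`sparse_window_attention`, `window`) are hypothetical.

```python
import numpy as np

def sparse_window_attention(q, k, v, window=4):
    """Windowed (local) sparse attention.

    Each position i attends only to positions [i - window, i], so the
    work per token is bounded by a constant (window + 1) and the total
    cost is O(n * window) = O(n), versus O(n^2) for dense attention.
    q, k, v: arrays of shape (n, d).
    """
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo = max(0, i - window)
        # Scaled dot-product scores over the local window only.
        scores = q[i] @ k[lo:i + 1].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        out[i] = weights @ v[lo:i + 1]
    return out
```

DrDiff's hierarchical variant additionally preserves global context (e.g. via coarser long-range attention levels) while keeping the overall cost linear; the sketch above shows only the local component that makes the O(n) bound possible.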
Jusheng Zhang
Sun Yat-sen University
Yijia Fan
Sun Yat-sen University
Kaitong Cai
Sun Yat-sen University
Zimeng Huang
Sun Yat-sen University
Xiaofei Sun
Stony Brook University, Zhejiang University
Social and Information Networks · Natural Language Processing · Machine Learning
Jian Wang
Snap Inc
Chengpei Tang
Sun Yat-sen University
Keze Wang
Sun Yat-sen University