FourierSampler: Unlocking Non-Autoregressive Potential in Diffusion Language Models via Frequency-Guided Generation

๐Ÿ“… 2026-01-30
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿค– AI Summary
Existing diffusion language models are constrained by positional bias, which limits their ability to exploit the non-autoregressive freedom of arbitrary token ordering. This work introduces frequency-domain analysis to this setting for the first time, showing via Fourier transforms that low-frequency components of hidden states encode global structural information while high-frequency components capture local details. Building on this insight, the authors propose a frequency-guided "structure-to-detail" generation paradigm that dynamically modulates spectral content during decoding via a sliding window mechanism. The approach moves beyond conventional sequential generation: on the LLaDA and SDAR model families it achieves relative performance improvements of 20.4% on LLaDA1.5-8B and 16.0% on LLaDA-8B-Instruct, and significantly outperforms the comparably sized autoregressive model Llama3.1-8B-Instruct.

๐Ÿ“ Abstract
Despite the non-autoregressive potential of diffusion language models (dLLMs), existing decoding strategies exhibit positional bias, failing to fully unlock the potential of arbitrary-order generation. In this work, we delve into the inherent spectral characteristics of dLLMs and present the first frequency-domain analysis showing that low-frequency components in hidden states primarily encode global structural information and long-range dependencies, while high-frequency components are responsible for characterizing local details. Based on this observation, we propose FourierSampler, which leverages a frequency-domain sliding window mechanism to dynamically guide the model toward a "structure-to-detail" generation process. FourierSampler outperforms other inference enhancement strategies on LLaDA and SDAR, achieving relative improvements of 20.4% on LLaDA1.5-8B and 16.0% on LLaDA-8B-Instruct. It notably surpasses similarly sized autoregressive models such as Llama3.1-8B-Instruct.
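The frequency-domain sliding window described in the abstract can be illustrated with a minimal sketch. This is an interpretation, not the authors' implementation: the function name, the `progress` parameter, and the assumed schedule (a low-pass cutoff that widens as decoding proceeds, so structure emerges before detail) are all assumptions made for illustration.

```python
import numpy as np

def frequency_window_filter(hidden, progress, base_frac=0.25):
    """Illustrative sketch (not the paper's code): low-pass filter hidden
    states along the sequence axis, with a cutoff that slides toward the
    full spectrum as decoding progresses ("structure-to-detail").

    hidden: (seq_len, dim) array of hidden states
    progress: float in [0, 1], fraction of decoding completed
    base_frac: assumed fraction of the spectrum kept at progress = 0
    """
    seq_len = hidden.shape[0]
    # FFT along the sequence axis; rfft keeps the non-negative frequencies.
    spectrum = np.fft.rfft(hidden, axis=0)
    n_bins = spectrum.shape[0]
    # Cutoff grows linearly from base_frac of the bins to all bins.
    cutoff = int(np.ceil(n_bins * (base_frac + (1.0 - base_frac) * progress)))
    # Zero out frequency bins above the cutoff (broadcast over hidden dim).
    mask = np.zeros((n_bins, 1))
    mask[:cutoff] = 1.0
    # Back to the sequence domain with the original length.
    return np.fft.irfft(spectrum * mask, n=seq_len, axis=0)
```

Early in decoding only the low-frequency (global-structure) content of the hidden states passes through; at `progress = 1.0` the mask covers the whole spectrum and the states are reconstructed unchanged, so local detail is fully restored by the end.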
Problem

Research questions and friction points this paper is trying to address.

diffusion language models
non-autoregressive generation
positional bias
arbitrary generation
decoding strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion language models
frequency-domain analysis
non-autoregressive generation
FourierSampler
structure-to-detail generation
Siyang He
Fudan University; Shanghai Innovation Institute; OpenMOSS Team

Qiqi Wang
Fudan University; Shanghai Innovation Institute; OpenMOSS Team

Xiaoran Liu
Fudan University
Natural Language Processing

Hongnan Ma
OpenMOSS Team

Yiwei Shi
OpenMOSS Team

Yuerong Song
Fudan University; Shanghai Innovation Institute; OpenMOSS Team

Ying Zhu
Fudan University
Bioinformatics, Neuroscience, Genomics, Transcriptome

Tianyi Liang
PhD, East China Normal University; Shanghai AI Lab; Shanghai Innovation Institute
Multimodal Learning, LLMs, Image Editing

Zengfeng Huang
Fudan University
Algorithms, Graphs, Streaming, Learning, Theory

Ziwei He
Shanghai Jiao Tong University
Machine Learning

Xipeng Qiu
Fudan University; Shanghai Innovation Institute; OpenMOSS Team