Next Semantic Scale Prediction via Hierarchical Diffusion Language Models

📅 2025-10-08
🤖 AI Summary
Traditional language models struggle to capture multi-granularity semantic structures. To address this, we propose the Hierarchical Diffusion Language Model (HDLM), the first language model to incorporate a hierarchical discrete diffusion framework. HDLM constructs a layered vocabulary, enabling the forward process to progressively abstract semantics and the reverse process to refine generation stepwise across scales. A scheduler modulates the semantic abstraction rate, supporting time-varying prediction of the next granularity level; moreover, we derive the evidence lower bound (ELBO) in closed form, enabling efficient training. The framework unifies several prominent language models as special cases. Experiments demonstrate that HDLM significantly reduces both validation and generation perplexity on text generation tasks, outperforming strong baseline models.

📝 Abstract
In this paper we introduce Hierarchical Diffusion Language Models (HDLM) -- a novel family of discrete diffusion models for language modeling. HDLM builds on a hierarchical vocabulary where low-level tokens with detailed semantics are surjectively mapped to high-level tokens with coarse-grained meanings. In the forward process, each token is independently perturbed to its higher-level ancestor with more abstract semantics according to the scheduler, while in the reverse process the model progressively predicts the next, more detailed semantics. Taken together, HDLM provides a general time-varying next semantic scale prediction process for language modeling. We derive closed-form expressions for the diffusion Evidence Lower Bound (ELBO), and show that HDLM can be implemented in a flexible manner while including the existing MDLM as a special case. We also propose practical training techniques based on the insights. Extensive text generation experiments validate the effectiveness of HDLM, which demonstrates consistently lower validation and generative perplexity than baselines.
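The forward process described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a two-level hierarchy where a hypothetical `group_size` parameter defines the surjective map from detailed tokens to coarse ancestors, and a scheduler `alpha(t)` gives the probability that a token remains at its detailed level at diffusion time `t`.

```python
import random

def ancestor(token_id: int, group_size: int = 4) -> int:
    """Surjective map: each group of `group_size` fine-grained tokens
    shares a single coarse-grained ancestor token (illustrative only)."""
    return token_id // group_size

def forward_perturb(tokens, t, alpha, group_size=4, rng=random):
    """Independently lift each token to its more abstract ancestor
    with probability 1 - alpha(t), per the scheduler."""
    return [
        tok if rng.random() < alpha(t) else ancestor(tok, group_size)
        for tok in tokens
    ]

# A simple linear scheduler: fully detailed at t=0, fully abstracted at t=1.
alpha = lambda t: 1.0 - t

detailed = [5, 9, 14, 2]
print(forward_perturb(detailed, t=0.0, alpha=alpha))  # → [5, 9, 14, 2]
print(forward_perturb(detailed, t=1.0, alpha=alpha))  # → [1, 2, 3, 0]
```

The reverse process would then learn to predict the next, more detailed semantic scale from the abstracted sequence; the paper's actual vocabulary hierarchy and scheduler are more general than this sketch.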
Problem

Research questions and friction points this paper is trying to address.

Predicting next semantic scale using hierarchical diffusion models
Modeling language through time-varying semantic abstraction process
Improving text generation via hierarchical vocabulary diffusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical diffusion models with semantic vocabulary mapping
Progressive next semantic scale prediction process
Flexible implementation with closed-form ELBO derivation
Cai Zhou
Massachusetts Institute of Technology
Machine Learning, Generative Models, Large Language Models, Graph Neural Networks, AI4Science
Chenyu Wang
Massachusetts Institute of Technology
Dinghuai Zhang
Microsoft Research, Mila - Quebec AI Institute
Shangyuan Tong
Massachusetts Institute of Technology
Yifei Wang
Massachusetts Institute of Technology
Stephen Bates
Assistant Professor, MIT EECS
Statistics, Machine Learning, Artificial Intelligence, Uncertainty Quantification
Tommi Jaakkola
MIT
machine learning, natural language processing, biomolecular design