Making Language Model a Hierarchical Classifier and Generator

📅 2025-07-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) lack human-like hierarchical reasoning capabilities, particularly in simultaneously capturing semantic abstractions at multiple granularities. Method: We propose Hierarchical Decoding—a novel paradigm for decoder-only pretrained models—where the language head is replicated and independently fine-tuned atop multiple intermediate transformer layers, enabling concurrent, multi-granularity semantic decoding. This unifies classification and generation within a single architecture, supporting hierarchical text classification, classification-guided generation, and hierarchical text generation. Contribution/Results: Our approach breaks the conventional single-output-layer constraint by endowing intermediate layers with trainable, interpretable semantic output heads for the first time. It further introduces a classification-guided generation mechanism to enforce inter-layer consistency. Evaluated across diverse benchmarks—including hierarchical classification, controlled generation, and reasoning tasks—our method achieves state-of-the-art performance, demonstrating that hierarchical decoding significantly enhances LLMs’ cognitive modeling capacity.
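The "classification-guided generation mechanism to enforce inter-layer consistency" mentioned above is not detailed on this page; one plausible reading is that a coarse prediction from an early head constrains which fine-grained labels a later head may emit. The sketch below illustrates that idea with an invented two-level taxonomy; the `CHILDREN` map and logit values are illustrative assumptions, not the authors' setup.

```python
# Hedged sketch of classification-guided consistency: the coarse class
# predicted by an early-layer head masks the fine head's logits so only
# children of that coarse class remain eligible.
# The taxonomy below is invented for illustration.
import torch

# Hypothetical two-level hierarchy: coarse class -> allowed fine classes.
CHILDREN = {0: [0, 1], 1: [2, 3, 4]}

def guided_fine_logits(coarse_logits, fine_logits):
    """Mask fine logits to the children of the predicted coarse class."""
    coarse = int(coarse_logits.argmax())
    mask = torch.full_like(fine_logits, float("-inf"))
    mask[CHILDREN[coarse]] = 0.0  # keep only consistent fine classes
    return fine_logits + mask

coarse = torch.tensor([0.2, 1.5])                # early head favors coarse class 1
fine = torch.tensor([3.0, 2.0, 0.5, 0.1, 0.0])   # unguided argmax would be 0
print(int(guided_fine_logits(coarse, fine).argmax()))  # → 2
```

Without guidance the fine head would pick class 0, which is not a child of coarse class 1; the mask forces a hierarchy-consistent choice.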

📝 Abstract
Decoder-only language models, such as GPT and LLaMA, generally decode on the last layer. Motivated by humans' hierarchical thinking capability, we propose that a hierarchical decoder architecture can be built, with different layers decoding text simultaneously. Due to limited time and computational resources, we choose to adapt a pretrained language model into this form of hierarchical decoder. Language heads from the last layer are copied to selected intermediate layers and fine-tuned with different task inputs. Through thorough experiments, we validate that these selected intermediate layers can be adapted to produce meaningful and coherent text, and that this hierarchical-decoder paradigm achieves state-of-the-art performance on multiple tasks such as hierarchical text classification, classification-guided generation, and hierarchical text generation. This study suggests the possibility of a generalized hierarchical reasoner, pretrained from scratch.
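The head-copying setup in the abstract can be sketched in PyTorch: the final language head is deep-copied onto selected intermediate layers, each copy becoming an independently trainable decoder. The toy model, tapped-layer choice, and dimensions below are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch, assuming a tiny stand-in for a decoder-only LM.
import copy
import torch
import torch.nn as nn

class ToyDecoder(nn.Module):
    """Toy stand-in for a decoder-only LM (e.g. GPT/LLaMA); not causal."""
    def __init__(self, vocab=100, dim=32, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        self.lm_head = nn.Linear(dim, vocab, bias=False)

class HierarchicalDecoder(nn.Module):
    """Replicates the last-layer head onto chosen intermediate layers."""
    def __init__(self, base: ToyDecoder, tap_layers=(1, 2)):
        super().__init__()
        self.base = base
        self.tap_layers = tap_layers
        # One independently trainable copy of the language head per tapped layer.
        self.heads = nn.ModuleDict({
            str(i): copy.deepcopy(base.lm_head) for i in tap_layers
        })

    def forward(self, ids):
        h = self.base.embed(ids)
        logits = {}
        for i, layer in enumerate(self.base.layers):
            h = layer(h)
            if i in self.tap_layers:
                logits[str(i)] = self.heads[str(i)](h)  # intermediate decoding
        logits["final"] = self.base.lm_head(h)           # conventional decoding
        return logits

model = HierarchicalDecoder(ToyDecoder())
out = model(torch.randint(0, 100, (2, 5)))
print(sorted(out))  # one logit tensor per tapped layer, plus the final head
```

Each tapped layer's head can then be fine-tuned on its own task inputs (e.g. coarse labels at shallow layers, full text at the final layer) while the shared trunk stays frozen or is trained jointly.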
Problem

Research questions and friction points this paper is trying to address.

Adapts pretrained language models into hierarchical decoders
Enables simultaneous text decoding at different layers
Improves performance on hierarchical classification and generation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical decoder architecture with simultaneous layer decoding
Copying and fine-tuning language heads to intermediate layers
State-of-the-art performance in hierarchical text tasks
Yihong Wang
Geely AI Lab
Zhonglin Jiang
Geely AI Lab
Ningyuan Xi
Beihang University
LLM · Natural Language Processing · Machine Learning
Yue Zhao
Geely AI Lab
Qingqing Gu
Geely AI Lab
Xiyuan Chen
Geely AI Lab
Hao Wu
Tsinghua University
Sheng Xu
Geely AI Lab
Hange Zhou
Geely AI Lab
Yong Chen
Geely AI Lab
Luo Ji
Alibaba Group
Reinforcement Learning · Automatic Control