🤖 AI Summary
Large language models (LLMs) lack human-like hierarchical reasoning capabilities, particularly the ability to capture semantic abstractions at multiple granularities simultaneously. Method: We propose Hierarchical Decoding, a novel paradigm for decoder-only pretrained models in which the language head is replicated and independently fine-tuned atop multiple intermediate transformer layers, enabling concurrent, multi-granularity semantic decoding. This unifies classification and generation within a single architecture, supporting hierarchical text classification, classification-guided generation, and hierarchical text generation. Contribution/Results: Our approach breaks the conventional single-output-layer constraint by endowing intermediate layers with trainable, interpretable semantic output heads for the first time, and further introduces a classification-guided generation mechanism to enforce inter-layer consistency. Evaluated across diverse benchmarks covering hierarchical classification, controlled generation, and reasoning tasks, our method achieves state-of-the-art performance, demonstrating that hierarchical decoding significantly enhances LLMs' cognitive modeling capacity.
📝 Abstract
Decoder-only language models, such as GPT and LLaMA, generally decode only at the last layer. Motivated by humans' hierarchical thinking capability, we propose that a hierarchical decoder architecture could be built in which different layers decode text simultaneously. Due to limited time and computational resources, we choose to adapt a pretrained language model into this form of hierarchical decoder. The language head of the last layer is copied to selected intermediate layers and fine-tuned with different task inputs. Through thorough experiments, we validate that these selected intermediate layers can be adapted to produce meaningful and coherent content, and that this hierarchical-decoder paradigm achieves state-of-the-art performance on multiple tasks, including hierarchical text classification, classification-guided generation, and hierarchical text generation. This study suggests the possibility of a generalized hierarchical reasoner, pretrained from scratch.
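The head-replication idea described above can be sketched in a few lines of PyTorch. This is a minimal toy illustration, not the paper's actual implementation: the tiny backbone, the chosen tap layers, and all dimensions are hypothetical stand-ins for a pretrained LLM.

```python
import copy
import torch
import torch.nn as nn

# Toy decoder-only backbone; sizes are illustrative, not from the paper.
VOCAB, D_MODEL, N_LAYERS = 100, 32, 6
TAP_LAYERS = [2, 4]  # hypothetical intermediate layers chosen to decode from

class ToyDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
            for _ in range(N_LAYERS)
        )
        self.lm_head = nn.Linear(D_MODEL, VOCAB, bias=False)  # last-layer head

    def forward(self, ids):
        # Causal mask so each position attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        h, taps = self.embed(ids), {}
        for i, blk in enumerate(self.blocks):
            h = blk(h, src_mask=mask)
            if i in TAP_LAYERS:
                taps[i] = h  # hidden states exposed for intermediate decoding
        return h, taps

model = ToyDecoder()
# Replicate the final language head onto each selected intermediate layer;
# each copy would be fine-tuned independently while the backbone stays frozen.
inter_heads = nn.ModuleDict(
    {str(i): copy.deepcopy(model.lm_head) for i in TAP_LAYERS}
)
for p in model.parameters():
    p.requires_grad_(False)  # freeze the pretrained backbone

ids = torch.randint(0, VOCAB, (1, 8))
_, taps = model(ids)
# Each tapped layer now emits its own next-token distribution.
logits_per_level = {i: inter_heads[str(i)](h) for i, h in taps.items()}
```

Decoding from every tapped layer in one forward pass is what allows the different granularities (e.g., coarse class labels at shallow layers, full text at the top) to be produced simultaneously.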