Hierarchical Text Classification with LLM-Refined Taxonomies

📅 2026-01-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of semantic ambiguity in human-constructed label taxonomies for hierarchical text classification, such as leaf nodes with identical names, which often hinders model performance. To mitigate this limitation, the paper proposes an end-to-end framework that leverages large language models to reconstruct the entire label hierarchy through operations including renaming, merging, splitting, and structural reorganization. The resulting taxonomy is better aligned with the inductive biases of classification models, achieving semantic coherence at the taxonomy level. Evaluated on three standard benchmarks, the approach significantly outperforms the original human-designed hierarchies, yielding up to a 2.9 percentage point improvement in F1 score. These results underscore the critical impact of optimizing label taxonomies on hierarchical classification performance.

📝 Abstract
Hierarchical text classification (HTC) depends on taxonomies that organize labels into structured hierarchies. However, many real-world taxonomies introduce ambiguities, such as identical leaf names under similar parent nodes, which prevent language models (LMs) from learning clear decision boundaries. In this paper, we present TaxMorph, a framework that uses large language models (LLMs) to transform entire taxonomies through operations such as renaming, merging, splitting, and reordering. Unlike prior work, our method revises the full hierarchy to better match the semantics encoded by LMs. Experiments across three HTC benchmarks show that LLM-refined taxonomies consistently outperform human-curated ones across various settings, by up to 2.9 percentage points in F1. To better understand these improvements, we compare how well LMs can assign leaf nodes to parent nodes and vice versa across human-curated and LLM-refined taxonomies. We find that human-curated taxonomies lead to more easily separable clusters in embedding space. However, the LLM-refined taxonomies align more closely with the model's actual confusion patterns during classification. In other words, even though they are harder to separate, they better reflect the model's inductive biases. These findings suggest that LLM-guided refinement creates taxonomies that are more compatible with how models learn, improving HTC performance.
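The abstract names four taxonomy operations (renaming, merging, splitting, reordering) but does not specify how they are realized; TaxMorph's actual data structures and LLM prompting are not shown here. The following is a minimal, hypothetical sketch of what such operations could look like on a tree of label nodes, including the "identical leaf names under similar parents" ambiguity the paper targets:

```python
# Hypothetical sketch of the taxonomy operations described in the abstract
# (rename, merge, split). Node, merge, and split are illustrative names,
# not TaxMorph's API; an LLM would decide *which* operations to apply.

class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

def rename(node, new_name):
    node.name = new_name

def merge(parent, names, merged_name):
    """Collapse several ambiguous sibling nodes into one, keeping their children."""
    kept = [c for c in parent.children if c.name not in names]
    absorbed = [c for c in parent.children if c.name in names]
    merged = Node(merged_name, [g for c in absorbed for g in c.children])
    parent.children = kept + [merged]

def split(parent, name, new_names):
    """Replace one overloaded node with several finer-grained ones."""
    parent.children = ([c for c in parent.children if c.name != name]
                       + [Node(n) for n in new_names])

# Example: two parents each have a leaf named "Other" -- the kind of
# ambiguity the paper describes -- which refinement can disambiguate.
root = Node("root", [Node("Sports", [Node("Other")]),
                     Node("Politics", [Node("Other")])])
for parent in root.children:
    for leaf in parent.children:
        if leaf.name == "Other":
            rename(leaf, f"Other {parent.name}")

print([leaf.name for p in root.children for leaf in p.children])
# -> ['Other Sports', 'Other Politics']
```

In the paper's framework an LLM proposes such edits over the whole hierarchy at once, rather than applying local hand-written rules as this sketch does.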
Problem

Research questions and friction points this paper is trying to address.

Hierarchical Text Classification
Taxonomy Ambiguity
Label Hierarchies
Language Models
Decision Boundaries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Text Classification
Large Language Models
Taxonomy Refinement
Inductive Bias
LLM-guided Structuring
🔎 Similar Papers
2024-09-30 · arXiv.org · Citations: 0
2024-06-20 · International Conference on Computational Linguistics · Citations: 2