🤖 AI Summary
Existing approaches to scientific taxonomy generation often suffer from structural inconsistencies and semantic misalignment across hierarchical levels, which limits their ability to organize the rapidly expanding body of scholarly literature. This work proposes SC-Taxo, a taxonomy generation framework built on large language models. It uses a bidirectional heading generation mechanism that combines bottom-up abstraction with top-down semantic constraints, and it models semantic dependencies among sibling nodes, so that headings stay aligned vertically across levels and coherent horizontally within a level. Experiments on multiple benchmark datasets show consistent improvements in hierarchy alignment and heading quality, and an additional evaluation on Chinese scientific literature indicates that the method generalizes across languages.
📝 Abstract
Scientific literature is expanding at an unprecedented pace, making it increasingly challenging to efficiently organize and access domain knowledge. A high-quality scientific taxonomy offers a structured and hierarchical representation of a research field, facilitating literature exploration and topic navigation, as well as enabling downstream applications such as trend analysis, idea generation, and information retrieval. However, existing taxonomy generation approaches often suffer from structural inconsistencies and semantic misalignment across hierarchical levels. Through empirical analysis, we find that these issues largely stem from inadequate modeling of hierarchical semantic consistency. To address this limitation, we propose a semantic-consistent taxonomy generation (SC-Taxo) framework that leverages large language models (LLMs) with hierarchy-aware refinement stages to ensure semantic consistency. Specifically, SC-Taxo introduces a bidirectional heading generation mechanism that jointly performs bottom-up abstraction and top-down semantic constraint, while further capturing peer-level semantic dependencies to enhance horizontal consistency. Experiments on multiple benchmark datasets demonstrate consistent improvements in hierarchy alignment and heading quality, and additional evaluation on Chinese scientific literature validates its robust cross-lingual generalization.
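The bidirectional idea described above can be illustrated with a minimal sketch. It is not the SC-Taxo implementation: the abstract does not specify prompts, traversal order, or data structures, so the `Node` class, the `HeadingFn` signature, the two pass functions, and the deterministic `stub_generate` (a stand-in for an LLM call) are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical signature (an assumption): (task, own_text, context_texts) -> heading.
HeadingFn = Callable[[str, str, List[str]], str]


@dataclass
class Node:
    text: str                                   # paper title at a leaf, "" at an internal node
    children: List["Node"] = field(default_factory=list)
    heading: str = ""


def bottom_up(node: Node, generate: HeadingFn) -> None:
    """Post-order pass: draft each heading by abstracting over the children's headings."""
    if not node.children:
        node.heading = node.text                # leaves keep their own title
        return
    for child in node.children:
        bottom_up(child, generate)
    node.heading = generate("abstract", node.text, [c.heading for c in node.children])


def top_down(node: Node, generate: HeadingFn, parent: Optional[Node] = None) -> None:
    """Pre-order pass: refine each internal heading against its parent and its siblings."""
    if parent is not None and node.children:
        siblings = [s.heading for s in parent.children if s is not node]
        # Context = parent heading (vertical constraint) + sibling headings (peer level).
        node.heading = generate("refine", node.heading, [parent.heading] + siblings)
    for child in node.children:
        top_down(child, generate, node)


def stub_generate(task: str, own: str, context: List[str]) -> str:
    """Deterministic stand-in for an LLM call, so the control flow can be traced."""
    if task == "abstract":
        return "Topic[" + "; ".join(context) + "]"
    return f"{own} <parent: {context[0]}; peers: {len(context) - 1}>"


# Toy tree: two topic nodes under one root, with paper titles as leaves.
root = Node("", [
    Node("", [Node("Paper A"), Node("Paper B")]),
    Node("", [Node("Paper C")]),
])
bottom_up(root, stub_generate)
top_down(root, stub_generate)
print(root.heading)
print(root.children[0].heading)
```

In this sketch the root and the leaves are left unchanged by the second pass, and siblings are refined sequentially, so a later sibling sees the already refined heading of an earlier one. Whether the actual framework refines peers jointly or in sequence is not stated in the abstract.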