Context-Aware Hierarchical Taxonomy Generation for Scientific Papers via LLM-Guided Multi-Aspect Clustering

📅 2025-09-23
🤖 AI Summary
To address the lack of coherence and granularity in existing taxonomy construction methods amid the rapid growth of scientific literature, this paper proposes an LLM-guided, context-aware hierarchical taxonomy generation framework. It uses large language models to identify the key aspects of each paper (such as methodology, dataset, and evaluation) and to generate aspect-specific summaries. These summaries are then encoded and dynamically clustered along each aspect to form a coherent hierarchy. Key contributions include: (1) a new evaluation benchmark of 156 expert-crafted taxonomies covering 11.6k papers, described as the first naturally annotated dataset for this task; and (2) reported state-of-the-art results in taxonomy coherence, granularity, and interpretability, significantly outperforming prior approaches.

📝 Abstract
The rapid growth of scientific literature demands efficient methods to organize and synthesize research findings. Existing taxonomy construction methods, leveraging unsupervised clustering or direct prompting of large language models (LLMs), often lack coherence and granularity. We propose a novel context-aware hierarchical taxonomy generation framework that integrates LLM-guided multi-aspect encoding with dynamic clustering. Our method leverages LLMs to identify key aspects of each paper (e.g., methodology, dataset, evaluation) and generates aspect-specific paper summaries, which are then encoded and clustered along each aspect to form a coherent hierarchy. In addition, we introduce a new evaluation benchmark of 156 expert-crafted taxonomies encompassing 11.6k papers, providing the first naturally annotated dataset for this task. Experimental results demonstrate that our method significantly outperforms prior approaches, achieving state-of-the-art performance in taxonomy coherence, granularity, and interpretability.

Problem

Research questions and friction points this paper is trying to address.

Organizing rapidly growing scientific literature efficiently
Improving coherence and granularity in taxonomy construction
Generating context-aware hierarchical taxonomies for papers

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-guided multi-aspect encoding
Dynamic clustering for hierarchy formation
Context-aware aspect-specific paper summarization
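
The pipeline described in the abstract (aspect-specific summaries, encoding, then clustering along each aspect) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `PAPERS` summaries are hard-coded stand-ins for LLM output, `embed` is a bag-of-words substitute for a real text encoder, and the fixed-threshold single-link grouping in `cluster` is a simple placeholder, not the paper's dynamic clustering procedure, which this page does not specify.

```python
from collections import Counter
from math import sqrt

# Hypothetical aspect-specific summaries. In the described framework an LLM
# would generate these; they are hard-coded so the sketch runs offline.
PAPERS = {
    "p1": {"methodology": "graph neural network", "dataset": "citation networks"},
    "p2": {"methodology": "graph neural network", "dataset": "protein structures"},
    "p3": {"methodology": "transformer language model", "dataset": "news articles"},
    "p4": {"methodology": "transformer language model", "dataset": "news articles"},
}


def embed(text):
    """Bag-of-words vector; an assumed stand-in for a real sentence encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(ids, aspect, threshold=0.5):
    """Single-link grouping on one aspect: a paper joins (and merges) every
    cluster that contains at least one sufficiently similar member."""
    clusters = []
    for pid in ids:
        vec = embed(PAPERS[pid][aspect])
        hits = [
            c for c in clusters
            if any(cosine(vec, embed(PAPERS[q][aspect])) >= threshold for q in c)
        ]
        merged = [pid] + [q for c in hits for q in c]
        clusters = [c for c in clusters if c not in hits] + [sorted(merged)]
    return sorted(clusters)


def build_taxonomy(ids, aspects):
    """Split papers by the first aspect, then recurse on the remaining ones.
    The nesting depth of the result equals the number of aspects."""
    if not aspects:
        return sorted(ids)
    head, rest = aspects[0], aspects[1:]
    return [build_taxonomy(group, rest) for group in cluster(ids, head)]


taxonomy = build_taxonomy(list(PAPERS), ["methodology", "dataset"])
print(taxonomy)
```

In this toy setup the top level separates papers by methodology and the second level by dataset. Swapping `embed` for a learned encoder and `cluster` for an adaptive clustering method would bring the sketch closer to what the abstract describes, though the exact choices are not stated here.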