TaxoAdapt: Aligning LLM-Based Multidimensional Taxonomy Construction to Evolving Research Corpora

📅 2025-06-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
The rapid evolution of scientific fields makes expert-curated taxonomies costly and slow to build. Existing automatic methods either over-rely on a specific corpus, which sacrifices generalizability, or depend on the static pre-training knowledge of LLMs, which overlooks how fields evolve. They also ignore the multidimensional nature of scientific papers, where a single paper may contribute to several dimensions (e.g., methodology, new tasks, evaluation metrics, benchmarks). To address these gaps, TaxoAdapt dynamically adapts an LLM-generated taxonomy to a given corpus across multiple dimensions. It performs iterative hierarchical classification and expands the taxonomy in both width and depth according to the corpus's topical distribution. Evaluated on a diverse set of computer science conferences spanning multiple years, TaxoAdapt produces taxonomies that are 26.51% more granularity-preserving and 50.41% more coherent than the most competitive baselines, as judged by LLMs.

📝 Abstract
The rapid evolution of scientific fields introduces challenges in organizing and retrieving scientific literature. While expert-curated taxonomies have traditionally addressed this need, the process is time-consuming and expensive. Furthermore, recent automatic taxonomy construction methods either (1) over-rely on a specific corpus, sacrificing generalizability, or (2) depend heavily on the general knowledge of large language models (LLMs) contained within their pre-training datasets, often overlooking the dynamic nature of evolving scientific domains. Additionally, these approaches fail to account for the multi-faceted nature of scientific literature, where a single research paper may contribute to multiple dimensions (e.g., methodology, new tasks, evaluation metrics, benchmarks). To address these gaps, we propose TaxoAdapt, a framework that dynamically adapts an LLM-generated taxonomy to a given corpus across multiple dimensions. TaxoAdapt performs iterative hierarchical classification, expanding both the taxonomy width and depth based on the corpus's topical distribution. We demonstrate its state-of-the-art performance on a diverse set of computer science conferences spanning multiple years, showcasing its ability to structure and capture the evolution of scientific fields. As a multidimensional method, TaxoAdapt generates taxonomies that are 26.51% more granularity-preserving and 50.41% more coherent than the most competitive baselines, as judged by LLMs.
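
The abstract describes the mechanism only at a high level. The following is a minimal Python sketch, not the authors' implementation, of how iterative hierarchical classification could decide where a multidimensional taxonomy should grow. Several assumptions are made: keyword overlap stands in for the LLM classifier; the `Node` class, the `classify`, `assign`, and `expansion_plan` functions, and the `min_unmapped` and `max_leaf` thresholds are all hypothetical; and the LLM step that would actually name new subtopics is omitted, so the sketch only flags where to widen or deepen.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    # Hypothetical stand-in for the LLM's understanding of a topic.
    keywords: set[str]
    children: list["Node"] = field(default_factory=list)
    papers: list[str] = field(default_factory=list)


def classify(paper: str, candidates: list[Node]) -> list[Node]:
    """Toy classifier (assumption): keyword overlap replaces the LLM call.
    Returns every matching candidate, so one paper may follow several branches."""
    words = set(paper.lower().split())
    return [c for c in candidates if c.keywords & words]


def assign(root: Node, paper: str, unmapped: dict[str, list[str]]) -> None:
    """Route a paper top-down through one dimension's taxonomy."""
    if not classify(paper, [root]):
        return  # the paper does not contribute to this dimension
    frontier = [root]
    while frontier:
        node = frontier.pop()
        node.papers.append(paper)
        if not node.children:
            continue
        matches = classify(paper, node.children)
        if matches:
            frontier.extend(matches)
        else:
            # Fits this node but none of its children: evidence of a width gap.
            unmapped.setdefault(node.label, []).append(paper)


def expansion_plan(root: Node, unmapped: dict[str, list[str]],
                   min_unmapped: int = 2, max_leaf: int = 1) -> list[tuple[str, str]]:
    """Hypothetical thresholds decide where to widen or deepen the taxonomy."""
    plan = []
    stack = [root]
    while stack:
        node = stack.pop()
        if len(unmapped.get(node.label, [])) >= min_unmapped:
            plan.append(("widen", node.label))  # add new sibling subtopics
        if not node.children and len(node.papers) > max_leaf:
            plan.append(("deepen", node.label))  # split a dense leaf
        stack.extend(node.children)
    return plan


# One taxonomy per dimension (hypothetical seed taxonomies).
dimensions = {
    "tasks": Node("tasks", {"task", "benchmark"}, [
        Node("question answering", {"qa", "question"}),
        Node("summarization", {"summarization", "summary"}),
    ]),
    "methods": Node("methods", {"method", "model"}, [
        Node("prompting", {"prompting", "prompt"}),
        Node("fine-tuning", {"finetuning"}),
    ]),
}

corpus = [
    "prompting method for qa task",
    "finetuning model for summarization task",
    "retrieval method for code generation task",
    "retrieval model for dialogue task",
    "prompting method for question answering benchmark",
]

plans = {}
for name, root in dimensions.items():
    unmapped: dict[str, list[str]] = {}
    for paper in corpus:
        assign(root, paper, unmapped)
    plans[name] = expansion_plan(root, unmapped)

for name, plan in plans.items():
    print(name, plan)
```

In this toy corpus, the two retrieval papers fit each dimension's root but none of its children, so both roots are flagged for widening, while the leaves that collect more than one paper ("question answering", "prompting") are flagged for deepening. The same paper can also land in both the tasks and methods taxonomies, which is the multidimensional behavior the abstract emphasizes.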

Problem

Research questions and friction points this paper is trying to address.

Dynamic adaptation of LLM-generated taxonomies to evolving scientific corpora
Addressing the multi-faceted nature of scientific literature in taxonomy construction
Improving granularity and coherence in automatic taxonomy generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic LLM-based taxonomy adaptation
Iterative hierarchical classification that expands taxonomy width and depth
Multidimensional granularity-preserving taxonomies