🤖 AI Summary
To address inefficiency, unfairness, and inconsistency in large-scale taxonomy quality assessment, this paper proposes a hierarchical, progressive evaluation framework powered by large language models (LLMs). Methodologically, it employs top-down structural decomposition, cross-level consistency verification, standardized prompt engineering, an anomaly penalty mechanism, and integration of task-aligned quantitative metrics with semantic consistency modeling. Its key contributions are: (i) the first hierarchical evaluation paradigm that jointly captures global structural integrity and local semantic coherence; and (ii) an efficient, scalable, and interpretable end-to-end assessment pipeline. Experiments on diverse, complex taxonomies show significant improvements in detecting semantic errors, logical contradictions, and structural flaws, outperforming mainstream baselines in reliability while generating actionable, fine-grained optimization recommendations.
📝 Abstract
This paper presents LITE, an LLM-based evaluation method designed for efficient and flexible assessment of taxonomy quality. To address challenges in large-scale taxonomy evaluation, such as efficiency, fairness, and consistency, LITE adopts a top-down hierarchical evaluation strategy, breaking the taxonomy into manageable substructures and ensuring result reliability through cross-validation and standardized input formats. LITE also introduces a penalty mechanism to handle extreme cases and provides both quantitative performance analysis and qualitative insights by integrating evaluation metrics closely aligned with task objectives. Experimental results show that LITE is highly reliable in complex evaluation tasks, effectively identifying semantic errors, logical contradictions, and structural flaws in taxonomies, while offering concrete directions for improvement. Code is available at https://github.com/Zhang-l-i-n/TAXONOMY_DETECT.
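The top-down strategy described above can be pictured as a tree traversal that scores each parent-child substructure locally, applies a penalty to extreme (anomalous) cases, and aggregates the results. The sketch below is a minimal illustration, not the paper's implementation: `score_subtree` is a hypothetical stand-in for the LLM judgment step, and the penalty threshold and heuristic are invented for demonstration.

```python
# Hypothetical sketch of a top-down hierarchical taxonomy evaluation
# with an anomaly penalty. In LITE the local score would come from an
# LLM prompt; here `score_subtree` is a toy stand-in heuristic.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)

def score_subtree(parent: Node) -> float:
    """Stand-in for an LLM judging parent->children coherence (0..1)."""
    if not parent.children:
        return 1.0  # leaves have no local substructure to judge
    # Toy anomaly: a child that repeats its parent's name is penalized.
    bad = sum(1 for c in parent.children if c.name == parent.name)
    return max(0.0, 1.0 - 0.5 * bad)

def evaluate(root: Node, penalty_threshold: float = 0.5) -> float:
    """Top-down traversal: score each substructure, penalize extreme
    cases below the threshold, then average all local scores."""
    scores = []
    stack = [root]
    while stack:
        node = stack.pop()
        s = score_subtree(node)
        if s < penalty_threshold:  # penalty mechanism for extreme cases
            s *= 0.5
        scores.append(s)
        stack.extend(node.children)  # descend top-down
    return sum(scores) / len(scores)

taxonomy = Node("science", [Node("physics"),
                            Node("biology", [Node("genetics")])])
print(round(evaluate(taxonomy), 3))  # → 1.0
```

Decomposing the taxonomy this way keeps each LLM query small (one parent and its immediate children), which is what makes the approach scale to large taxonomies.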