🤖 AI Summary
This work addresses the challenge of taxonomy quality assessment in the absence of gold-standard annotations. We propose two novel reference-free evaluation metrics: (1) a robustness measure based on the correlation between semantic similarity and hierarchical distance, designed to detect semantic-structural inconsistencies; and (2) a logical adequacy assessment grounded in natural language inference (NLI), which verifies entailment relationships between parent and child concepts. To our knowledge, this is the first approach to enable fully unsupervised, quantitative taxonomy quality evaluation, closing gaps in existing methods around semantic-structural alignment and logical consistency. Experiments on five real-world taxonomies show that both metrics correlate strongly with gold-standard F1 scores (Spearman ρ > 0.85), improving the reliability and interpretability of unsupervised taxonomy evaluation.
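As a rough illustration of the first metric, the sketch below correlates pairwise embedding similarity with tree distance over all concept pairs. This is a minimal reading of the idea, not the authors' implementation: the function name `taxonomy_robustness`, the sentence-transformers model, and the use of undirected hop distance are all assumptions.

```python
# Hedged sketch of the robustness metric: correlate semantic similarity
# with taxonomic distance. A well-formed taxonomy should show a strong
# negative correlation (closer in the tree => more semantically similar).
import itertools

import networkx as nx
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer

def taxonomy_robustness(taxonomy: nx.DiGraph,
                        model_name: str = "all-MiniLM-L6-v2") -> float:
    """Spearman correlation between cosine similarity and hop distance
    over all concept pairs. Returns rho; more negative = more robust."""
    model = SentenceTransformer(model_name)
    nodes = list(taxonomy.nodes)
    emb = model.encode(nodes, normalize_embeddings=True)
    undirected = taxonomy.to_undirected()

    sims, dists = [], []
    for i, j in itertools.combinations(range(len(nodes)), 2):
        sims.append(float(emb[i] @ emb[j]))  # cosine (embeddings normalized)
        dists.append(nx.shortest_path_length(undirected, nodes[i], nodes[j]))
    rho, _ = spearmanr(sims, dists)
    return rho

# A node misplaced under the wrong parent weakens the (negative) correlation.
tax = nx.DiGraph([("animal", "dog"), ("animal", "cat"), ("dog", "poodle")])
print(taxonomy_robustness(tax))  # expect rho well below 0
```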
📝 Abstract
We introduce two reference-free metrics for taxonomy quality evaluation. The first evaluates robustness by measuring the correlation between semantic and taxonomic similarity, covering a type of error not handled by existing metrics. The second uses Natural Language Inference (NLI) to assess logical adequacy. Both metrics are tested on five taxonomies and shown to correlate well with F1 computed against gold-standard taxonomies.
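The NLI-based check can be pictured as follows: for each parent-child edge, test whether a child-level statement entails the corresponding parent-level statement. Again a hedged sketch rather than the paper's exact setup: the `roberta-large-mnli` model, the "It is a ..." templates, and the helper names are illustrative choices.

```python
# Hedged sketch of the logical-adequacy metric: score each hyponym ->
# hypernym edge by how strongly the child statement entails the parent one.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def edge_entailment(parent: str, child: str) -> float:
    """Probability that 'It is a <child>.' entails 'It is a <parent>.'"""
    premise = f"It is a {child}."
    hypothesis = f"It is a {parent}."
    scores = nli({"text": premise, "text_pair": hypothesis}, top_k=None)
    return next(s["score"] for s in scores if s["label"] == "ENTAILMENT")

def logical_adequacy(edges) -> float:
    """Mean entailment score over all (parent, child) edges of a taxonomy."""
    return sum(edge_entailment(p, c) for p, c in edges) / len(edges)

print(edge_entailment("animal", "dog"))  # high: a dog is an animal
print(edge_entailment("dog", "animal"))  # low: an animal need not be a dog
```

Note the asymmetry in the usage example: entailment should hold from child to parent but not the reverse, which is what makes NLI a natural fit for validating is-a edges.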