🤖 AI Summary
Existing approaches to taxonomy expansion struggle to model asymmetric hierarchical relations such as “is-a” and fail to adequately capture semantic uncertainty and polysemy. To address these limitations, this work proposes TaxoBell, a framework that integrates box embeddings with multivariate Gaussian distributions. In TaxoBell, the mean of each distribution represents the semantic position of a concept, while the covariance matrix encodes its semantic uncertainty. This enables unified modeling of inclusion, disjointness, and fuzzy relationships, which strengthens hierarchical reasoning. Under a self-supervised setting, TaxoBell jointly optimizes an energy-based objective and explicitly models polysemy. It outperforms eight state-of-the-art methods across five benchmark datasets, with a 19% improvement in MRR and approximately 25% higher Recall@k.
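The summary's core idea is that each concept's mean gives its semantic position and its covariance gives its uncertainty. The sketch below illustrates that idea under assumptions that do not come from the paper. Boxes are axis-aligned and given by lower and upper corners. They are mapped to diagonal Gaussians by a hypothetical `box_to_gaussian` helper, whose `scale` factor is made up. The KL divergence stands in as the asymmetric energy; TaxoBell's actual energy function is not specified here. The point is only that a directional score falls out naturally once concepts are distributions.

```python
import numpy as np


def box_to_gaussian(lower, upper, scale=0.5):
    """Hypothetical box -> diagonal Gaussian map (an assumption, not TaxoBell's).

    Mean = box center; standard deviation = `scale` * half-width per axis.
    Returns (mean, variance) with the diagonal covariance stored as a vector.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    mean = (lower + upper) / 2.0
    std = scale * (upper - lower) / 2.0
    return mean, std ** 2


def kl_diag(mu0, var0, mu1, var1):
    """KL(N0 || N1) for diagonal Gaussians, used here as an asymmetric energy."""
    return float(0.5 * np.sum(
        var0 / var1 + (mu1 - mu0) ** 2 / var1 - 1.0 + np.log(var1 / var0)
    ))


# Usage: a broad parent box and a narrow child box nested inside it (2-D).
parent = box_to_gaussian([-2.0, -2.0], [2.0, 2.0])
child = box_to_gaussian([0.0, 0.0], [1.0, 1.0])

print(kl_diag(*child, *parent))  # child -> parent: low energy
print(kl_diag(*parent, *child))  # parent -> child: much higher energy
```

In this toy setup, the low-energy direction can be read as "child is-a parent". That gives a directional signal that a symmetric score cannot express. A full-covariance version would replace the per-axis ratios with matrix traces and log-determinants.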
📝 Abstract
Taxonomies form the backbone of structured knowledge representation across diverse domains, enabling applications such as e-commerce catalogs, semantic search, and biomedical discovery. Yet manual taxonomy expansion is labor-intensive and cannot keep pace with the emergence of new concepts. Existing automated methods rely on point-based vector embeddings, which model symmetric similarity and thus struggle with the asymmetric "is-a" relationships that are fundamental to taxonomies. Box embeddings offer a promising alternative because they naturally represent containment and disjointness, but they face key issues: (i) unstable gradients at intersection boundaries, (ii) no notion of semantic uncertainty, and (iii) limited capacity to represent polysemy or ambiguity. We address these shortcomings with TaxoBell, a Gaussian box embedding framework that translates between box geometries and multivariate Gaussian distributions, where means encode semantic location and covariances encode uncertainty. An energy-based objective yields stable optimization, robust modeling of ambiguous concepts, and interpretable hierarchical reasoning. Extensive experiments on five benchmark datasets show that TaxoBell significantly outperforms eight state-of-the-art taxonomy expansion baselines, by 19% in MRR and around 25% in Recall@k. We further examine the advantages and pitfalls of TaxoBell through error analysis and ablation studies.
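The abstract contrasts symmetric point-based similarity with box containment and flags unstable gradients at intersection boundaries. The toy NumPy sketch below illustrates these points under several assumptions:

- Boxes are axis-aligned and given by lower and upper corners.
- All helper names are hypothetical.
- Issue (i) is read as the familiar problem that a hard max(0, ·) intersection volume has zero gradient for disjoint boxes and a kink where they begin to touch.

The softplus smoothing is a technique used in earlier box-embedding work and is shown only for contrast. It is not TaxoBell's remedy; the abstract describes that remedy as moving to Gaussians.

```python
import numpy as np


def cosine(a, b):
    # Symmetric by definition, so it cannot tell "dog is-a animal" from the reverse.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def contains(outer_lo, outer_hi, inner_lo, inner_hi):
    # Box containment is directional: outer containing inner does not imply the reverse.
    outer_lo, outer_hi = np.asarray(outer_lo), np.asarray(outer_hi)
    inner_lo, inner_hi = np.asarray(inner_lo), np.asarray(inner_hi)
    return bool(np.all(outer_lo <= inner_lo) and np.all(inner_hi <= outer_hi))


def hard_overlap(lo1, hi1, lo2, hi2):
    # Intersection volume with max(0, .): exactly 0 (flat) for disjoint boxes,
    # with a kink where the boxes start to touch.
    side = np.minimum(hi1, hi2) - np.maximum(lo1, lo2)
    return float(np.prod(np.maximum(side, 0.0)))


def soft_overlap(lo1, hi1, lo2, hi2, temp=0.1):
    # Softplus-smoothed side lengths (a swapped-in smoothing, not TaxoBell's):
    # strictly positive, so disjoint boxes still receive a gradient.
    side = np.minimum(hi1, hi2) - np.maximum(lo1, lo2)
    return float(np.prod(temp * np.logaddexp(0.0, side / temp)))


def finite_diff(f, x, eps=1e-4):
    # Central-difference estimate of df/dx at scalar x.
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def hard_at(x):
    # Disjoint 1-D boxes [0, 1] and [2 - x, 3 - x]; increasing x slides them together.
    return hard_overlap(np.array([0.0]), np.array([1.0]),
                        np.array([2.0 - x]), np.array([3.0 - x]))


def soft_at(x):
    # Same pair of boxes, scored with the smoothed overlap instead.
    return soft_overlap(np.array([0.0]), np.array([1.0]),
                        np.array([2.0 - x]), np.array([3.0 - x]))


a, b = [1.0, 2.0], [3.0, 1.0]
print(cosine(a, b) == cosine(b, a))  # True: the score carries no direction
animal = ([0.0, 0.0], [4.0, 4.0])
dog = ([1.0, 1.0], [2.0, 2.0])
print(contains(*animal, *dog), contains(*dog, *animal))  # True False
print(finite_diff(hard_at, 0.0))  # 0.0: no signal to pull the boxes together
print(finite_diff(soft_at, 0.0))  # small but positive
```

The hard volume is exactly zero across the whole disjoint region, so an optimizer gets no hint about which way to move the boxes. A Gaussian energy such as the KL divergence is smooth in the means everywhere, which is one plausible reason the Gaussian view helps with this issue.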