TaxoBell: Gaussian Box Embeddings for Self-Supervised Taxonomy Expansion

📅 2026-01-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing approaches to taxonomy expansion struggle to model asymmetric hierarchical relations such as “is-a” and do not adequately capture semantic uncertainty or polysemy. To address these limitations, this work proposes TaxoBell, a framework that combines box embeddings with multivariate Gaussian distributions. In TaxoBell, the mean of each distribution represents a concept's semantic position, while the covariance matrix encodes its semantic uncertainty. This lets inclusion, disjointness, and fuzzy relationships be modeled in a single formulation, which strengthens hierarchical reasoning. In a self-supervised setting, TaxoBell optimizes an energy-based objective while explicitly modeling polysemy, and it outperforms eight state-of-the-art methods across five benchmark datasets, improving MRR by 19% and Recall@k by approximately 25%.
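The summary describes each concept as a Gaussian whose mean gives its position and whose covariance gives its uncertainty, scored by an energy function. The sketch below illustrates that idea under assumptions that are not taken from the paper: covariances are diagonal, a box becomes a Gaussian by using its center as the mean and (scale × side length)² as the per-dimension variance, and the energy is the KL divergence, a standard asymmetric divergence in Gaussian embedding work. The names `Box`, `DiagGaussian`, `box_to_gaussian`, `scale`, and `kl_energy` are hypothetical, and TaxoBell's actual mapping and energy may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box given by per-dimension lower and upper corners."""
    lower: list[float]
    upper: list[float]


@dataclass
class DiagGaussian:
    """Gaussian with a diagonal covariance (one variance per dimension)."""
    mean: list[float]
    var: list[float]


def box_to_gaussian(box: Box, scale: float = 0.5) -> DiagGaussian:
    # Hypothetical mapping: box center -> mean, (scale * side length)^2 -> variance,
    # so a wider (more general or more ambiguous) box yields a more uncertain Gaussian.
    mean = [(lo + hi) / 2.0 for lo, hi in zip(box.lower, box.upper)]
    var = [(scale * (hi - lo)) ** 2 for lo, hi in zip(box.lower, box.upper)]
    return DiagGaussian(mean, var)


def kl_energy(child: DiagGaussian, parent: DiagGaussian) -> float:
    # KL(child || parent) for diagonal Gaussians, used here as an assumed "is-a" energy.
    # It is low when the child's mass lies inside the parent's, and it is asymmetric.
    total = 0.0
    for mc, vc, mp, vp in zip(child.mean, child.var, parent.mean, parent.var):
        total += vc / vp + (mp - mc) ** 2 / vp - 1.0 + math.log(vp / vc)
    return 0.5 * total


# Toy example: a narrow "dog" box nested inside a broad "animal" box.
animal = box_to_gaussian(Box([0.0, 0.0], [4.0, 4.0]))
dog = box_to_gaussian(Box([1.0, 1.0], [2.0, 2.0]))
print(round(kl_energy(dog, animal), 3))  # 1.898
print(round(kl_energy(animal, dog), 3))  # 13.227
```

The asymmetry is what matters for taxonomies: a narrow child inside a broad parent is cheap in the child-to-parent direction and expensive in reverse, a distinction that a symmetric similarity such as cosine cannot express.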

📝 Abstract
Taxonomies form the backbone of structured knowledge representation across diverse domains, enabling applications such as e-commerce catalogs, semantic search, and biomedical discovery. Yet manual taxonomy expansion is labor-intensive and cannot keep pace with the emergence of new concepts. Existing automated methods rely on point-based vector embeddings, which model symmetric similarity and therefore struggle with the asymmetric "is-a" relationships that are fundamental to taxonomies. Box embeddings offer a promising alternative because they can represent containment and disjointness directly, but they face key issues: (i) unstable gradients at intersection boundaries, (ii) no notion of semantic uncertainty, and (iii) limited capacity to represent polysemy or ambiguity. We address these shortcomings with TaxoBell, a Gaussian box embedding framework that translates between box geometries and multivariate Gaussian distributions, where means encode semantic location and covariances encode uncertainty. Energy-based optimization yields stable training, robust modeling of ambiguous concepts, and interpretable hierarchical reasoning. Extensive experiments on five benchmark datasets show that TaxoBell significantly outperforms eight state-of-the-art taxonomy expansion baselines, by 19% in MRR and around 25% in Recall@k. We further examine TaxoBell's strengths and limitations through error analysis and ablation studies.
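The abstract reports results in MRR and Recall@k. As a minimal sketch of how these metrics are typically computed for taxonomy expansion, the code below assumes each query concept has exactly one gold parent and that every candidate parent receives a score where higher is better (for example, a negated energy). Benchmarks with multiple gold parents may use variants, and the paper's exact evaluation protocol is not stated here; `rank_of_gold`, `mean_reciprocal_rank`, `recall_at_k`, and the toy scores are hypothetical.

```python
def rank_of_gold(scores: dict[str, float], gold: str) -> int:
    # 1-based rank of the gold parent after sorting candidates by descending score.
    # Ties keep insertion order because Python's sort is stable even with reverse=True.
    ordered = sorted(scores, key=lambda c: scores[c], reverse=True)
    return ordered.index(gold) + 1


def mean_reciprocal_rank(ranks: list[int]) -> float:
    # Average of 1 / rank over all query concepts.
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks: list[int], k: int) -> float:
    # Fraction of query concepts whose gold parent appears in the top k candidates.
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy scores (negated energies, so higher = better); illustrative only, not paper data.
queries = [
    ({"animal": -1.9, "plant": -8.0, "vehicle": -12.0}, "animal"),
    ({"animal": -3.0, "plant": -2.5, "fungus": -2.0}, "animal"),
]
ranks = [rank_of_gold(scores, gold) for scores, gold in queries]
print(ranks)  # [1, 3]
print(mean_reciprocal_rank(ranks))
print(recall_at_k(ranks, 1))  # 0.5
```

MRR rewards placing the gold parent near the top even when it is not first, while Recall@k only checks membership in the top k, which is why the two metrics can move by different amounts.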
Problem

taxonomy expansion
asymmetric relationships
box embeddings
semantic uncertainty
polysemy
Innovation

Gaussian box embeddings
taxonomy expansion
semantic uncertainty
energy-based optimization
self-supervised learning