🤖 AI Summary
This work addresses three key challenges in word embedding research: weak interpretability of embedding models, the absence of a unified comparative framework across algorithms, and the difficulty of quantifying semantic bias. To this end, we propose a transparent semantic space framework grounded in category theory. Our method formalizes textual semantics via categorical structures, specifically the Conf (configuration) and Emb (embedding) categories; introduces a divergence decoration; and defines dimension-agnostic semantic spaces, enabling mathematically rigorous modeling and visualization of semantic extraction. We establish, for the first time under strict mathematical conditions, the categorical equivalence of GloVe, Word2Vec, and multidimensional scaling (MDS) within a shared structural framework. Moreover, the framework supports computable characterization of pre-embedding bias and bias mitigation at the semantic level. By transforming black-box embeddings into verifiable, comparable, and controllable systems, our approach provides a formal foundation for interpretable AI and fair semantic modeling.
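To make the claimed GloVe/Word2Vec/MDS correspondence concrete, the sketch below derives an embedding transparently: it builds a PMI matrix from co-occurrence counts (a statistic both GloVe and Word2Vec are known to approximately factorize) and runs classical (metric) MDS on the distances between its rows. The toy corpus counts and the choice of PMI are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

# Toy symmetric word-word co-occurrence counts (hypothetical data).
words = ["king", "queen", "man", "woman"]
C = np.array([
    [0, 8, 6, 2],
    [8, 0, 2, 6],
    [6, 2, 0, 7],
    [2, 6, 7, 0],
], dtype=float)

# Pointwise mutual information: log p(w, w') / (p(w) p(w')).
total = C.sum()
p_w = C.sum(axis=1) / total
P = C / total
with np.errstate(divide="ignore"):
    pmi = np.log(P / np.outer(p_w, p_w))
pmi[np.isneginf(pmi)] = 0.0  # zero out log(0) entries

# Classical (metric) MDS: double-center the squared-distance matrix of
# the PMI rows and take the top eigenvectors as coordinates.
D2 = ((pmi[:, None, :] - pmi[None, :, :]) ** 2).sum(axis=2)
n = len(words)
J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
B = -0.5 * J @ D2 @ J                    # Gram matrix
vals, vecs = np.linalg.eigh(B)
idx = np.argsort(vals)[::-1][:2]         # top-2 dimensions
emb = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
print(emb.shape)  # (4, 2): a 2-dimensional embedding of the 4 words
```

Everything here is a closed-form linear-algebra computation on the co-occurrence statistics, which is the sense in which such an embedding is verifiable rather than a black box.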
📄 Abstract
The paper introduces a novel framework based on category theory to enhance the explainability of artificial intelligence systems, focusing in particular on word embeddings. Key contributions include the construction of the categories $\mathcal{L}_{T}$ and $\mathcal{S}_{T}$, which provide schematic representations of the semantics of a text $T$, and the reframing of maximum-probability selection as a categorical notion. Additionally, the monoidal category $\mathcal{S}_{T}$ is constructed to visualize various methods of extracting semantic information from $T$, offering a dimension-agnostic definition of semantic spaces that relies solely on information contained in the text.
Furthermore, the paper defines the categories of configurations $Conf$ and word embeddings $Emb$, together with the concept of divergence as a decoration on $Emb$. It establishes a mathematically precise method for comparing word embeddings and demonstrates the equivalence of the GloVe and Word2Vec algorithms with the metric MDS algorithm, moving from black-box neural network algorithms to a transparent framework. Finally, the paper presents a mathematical approach to computing biases before embedding and offers insights into mitigating biases at the level of the semantic space, advancing the field of explainable artificial intelligence.
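A divergence decoration suggests a dimension-agnostic way to compare embeddings of the same vocabulary. The sketch below compares two embeddings through their scale-normalized pairwise-distance matrices, so that embeddings differing only by rotation and rescaling come out as equivalent; `embedding_divergence` is a hypothetical stand-in chosen to make the idea concrete, not the paper's actual divergence.

```python
import numpy as np

def pairwise_dists(X):
    """All pairwise Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def embedding_divergence(X, Y):
    """Dimension-agnostic dissimilarity of two embeddings of the same
    vocabulary: compare pairwise-distance matrices after removing
    overall scale. Illustrative, not the paper's definition."""
    DX, DY = pairwise_dists(X), pairwise_dists(Y)
    return np.abs(DX / DX.max() - DY / DY.max()).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))               # 5 words embedded in 3 dims
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Y = 2.0 * X @ R                           # rotated and rescaled copy of X
Z = rng.normal(size=(5, 50))              # unrelated 50-dim embedding

print(embedding_divergence(X, Y))         # ~0: identical geometry
print(embedding_divergence(X, Z))         # strictly larger: different geometry
```

Because only relative distances enter the comparison, the two embeddings may live in spaces of different dimensions, mirroring the dimension-agnostic definition of semantic spaces above.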