🤖 AI Summary
Similarity scores derived from text embeddings lack interpretability, hindering their trustworthy deployment in transparency-critical NLP applications such as search.
Method: The paper builds a structured framework for explaining text similarity, proposing a unified taxonomy and a multidimensional evaluation scheme covering faithfulness, readability, and efficiency. It systematically surveys and evaluates five mainstream explanation paradigms: attention attribution, perturbation-based analysis, prototype learning, token-level importance mapping, and generative explanation.
Contribution/Results: The analysis reveals an inherent trade-off between explanation quality and computational overhead across methods and clarifies where each method is applicable. The work provides a theoretical foundation for explainable text embedding research and delivers empirically grounded guidelines that help practitioners select and customize explanation solutions for task-specific requirements.
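To make the perturbation-based paradigm above concrete, here is a minimal leave-one-out sketch: each query token's importance is the drop in cosine similarity when that token is removed. The toy bag-of-words `embed` function is a stand-in for a real sentence encoder and is not from the paper; the approach, not the model, is the point.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding; a placeholder for a real text
    # embedding model (the paper's evaluated models are not reproduced here).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def token_importance(query, doc):
    """Leave-one-out perturbation: a query token's importance is the
    drop in similarity to the document when the token is ablated."""
    base = cosine(embed(query), embed(doc))
    tokens = query.split()
    scores = {}
    for i, tok in enumerate(tokens):
        ablated = " ".join(tokens[:i] + tokens[i + 1:])
        scores[tok] = base - cosine(embed(ablated), embed(doc))
    return base, scores
```

For example, `token_importance("cheap flights to rome", "budget airline tickets rome")` assigns the highest importance to "rome", the only token shared with the document. The trade-off the survey highlights is visible even here: the method needs one extra embedding call per token, so its cost grows linearly with query length.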
📝 Abstract
Text embeddings and text embedding models are a backbone of many AI and NLP systems, particularly those involving search. However, interpretability challenges persist, especially in explaining the similarity scores they produce, which is crucial for applications requiring transparency. In this paper, we give a structured overview of interpretability methods specializing in explaining those similarity scores, an emerging research area. We study the methods' individual ideas and techniques, evaluating their potential for improving the interpretability of text embeddings and explaining predicted similarities.