🤖 AI Summary
This study addresses a discrepancy between text embedding models and domain experts' judgments of semantic similarity, which can undermine the validity of downstream analyses. To bridge this gap, the authors propose the "Stakeholder Grounding Exercise", which explicitly elicits expert assessments of semantic relatedness in order to evaluate how well embedding models align with human expertise. The approach combines neural embeddings, expert elicitation, and cluster analysis, and was applied in two languages and two domains. On Danish policy issues it reveals a gap of 19–26 percentage points between embedding models and human experts, and on US Federal AI use cases a gap of 16 percentage points. The exercise ranking also tracks downstream cluster quality closely (Spearman ρ = 0.9), indicating that the misalignment propagates to clustering performance.
📝 Abstract
Text embeddings are widely used to analyse large corpora of complex texts. However, it is unclear whether the embeddings capture the same semantic distances as the human experts using them. Ensuring alignment between embedding representations and human intentions is essential for valid analyses. We present the Stakeholder Grounding Exercise, a method for making expert associations explicit and grounding embedding model results in human understanding. In our primary case study on Danish policy issues, we find that neural text embeddings are substantially less reliable than human experts (19–26 pp gap), and that this misalignment propagates to downstream clustering performance (Spearman $\rho = 0.9$ between exercise ranking and cluster quality). A secondary study on US Federal AI use cases replicates the gap (16 pp) in English, using a digital protocol and a different community of experts, demonstrating that the gap is not an artefact of a single instrument or domain. The Stakeholder Grounding Exercise offers a practical method for assessing whether embedding models capture the semantic distinctions that matter most to domain experts.
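
To make the reported rank correlation concrete, the following is a minimal sketch of how a Spearman $\rho$ between an exercise-based score and a cluster-quality score could be computed across a set of embedding models. The function names and all numeric values are hypothetical placeholders chosen for illustration; they are not the study's data, and the abstract does not specify how many models were compared or which cluster-quality metric was used.

```python
from statistics import mean


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical placeholder scores for five unnamed embedding models
# (NOT values from the study): agreement with experts in the exercise,
# and some downstream cluster-quality measure.
exercise_score = [0.55, 0.61, 0.58, 0.66, 0.70]
cluster_quality = [0.31, 0.33, 0.35, 0.40, 0.44]

rho = spearman_rho(exercise_score, cluster_quality)
print(f"Spearman rho = {rho:.2f}")
```

In this toy input the two rankings differ only by one swapped adjacent pair, which for five items gives $\rho = 0.9$; the example is meant only to show the scale of agreement such a value represents, not to reconstruct the paper's analysis.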