How Small Transformations Expose the Weakness of Semantic Similarity Measures

📅 2025-09-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the fundamental question of whether semantic similarity measures genuinely comprehend semantic relationships. We propose the first evaluation framework based on controlled, small-scale semantic transformations to systematically assess the semantic discrimination capability of 18 state-of-the-art methods—including bag-of-words, embedding-based, LLM-based, and structure-aware models—on software engineering texts and code. Experiments reveal that mainstream embedding methods exhibit up to 99.9% misclassification rates in semantic opposition scenarios, exposing their reliance on superficial surface patterns. Substituting cosine similarity for Euclidean distance improves performance by 24–66%. LLM-based methods demonstrate superior fine-grained semantic distinction. Critically, our framework uncovers a foundational limitation in existing measures: their failure to capture semantic essence. It establishes the first reproducible, scalable benchmark paradigm for trustworthy semantic computation in software engineering contexts.

📝 Abstract
This research examines how well different methods measure semantic similarity, which is important for various software engineering applications such as code search, API recommendations, automated code reviews, and refactoring tools. While large language models are increasingly used for these similarity assessments, questions remain about whether they truly understand semantic relationships or merely recognize surface patterns. The study tested 18 different similarity measurement approaches, including word-based methods, embedding techniques, LLM-based systems, and structure-aware algorithms. The researchers created a systematic testing framework that applies controlled changes to text and code to evaluate how well each method handles different types of semantic relationships. The results revealed significant issues with commonly used metrics. Some embedding-based methods incorrectly identified semantic opposites as similar up to 99.9 percent of the time, while certain transformer-based approaches occasionally rated opposite meanings as more similar than synonymous ones. The study found that embedding methods' poor performance often stemmed from how they calculate distances; switching from Euclidean distance to cosine similarity improved results by 24 to 66 percent. LLM-based approaches performed better at distinguishing semantic differences, producing low similarity scores (0.00 to 0.29) for genuinely different meanings, compared to embedding methods that incorrectly assigned high scores (0.82 to 0.99) to dissimilar content.
Problem

Research questions and friction points this paper is trying to address.

Evaluating semantic similarity measures for software engineering tasks
Testing 18 methods including embeddings and LLMs on semantic understanding
Identifying flaws where methods confuse opposites and synonyms
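The core test the paper describes is applying a controlled, minimal semantic transformation (e.g., inverting one word's meaning) and checking whether a similarity measure ranks the meaning-preserving edit above the meaning-inverting one. A minimal sketch of that idea, using a simple bag-of-words Jaccard measure as a stand-in (the paper's 18 actual methods and its transformation catalog are not reproduced here; the sentences below are illustrative):

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Bag-of-words Jaccard similarity: a surface-pattern measure
    of the kind the paper criticizes, as it ignores word meaning."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

original = "the function returns true if the file exists"
synonym  = "the method returns true if the file exists"    # meaning preserved
opposite = "the function returns false if the file exists" # meaning inverted

sim_syn = jaccard_similarity(original, synonym)
sim_opp = jaccard_similarity(original, opposite)

# A semantics-aware measure should score the synonym pair strictly higher
# than the antonym pair; a surface-level measure scores both one-word
# edits identically, failing the discrimination test.
print(sim_syn, sim_opp, sim_syn > sim_opp)
```

Here both edits change exactly one token, so the surface measure cannot tell synonym substitution from semantic opposition: both pairs score 0.75, and the discrimination check fails.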
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic testing framework for semantic similarity
Switching from Euclidean distance to cosine similarity improves performance by 24–66%
LLM-based approaches better distinguish semantic differences
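The distance-metric finding can be illustrated with a toy example (the vectors below are made up for illustration, not drawn from the paper's experiments): two embeddings with the same orientation but different magnitudes look far apart under Euclidean distance, while cosine similarity correctly reports them as identical in direction.

```python
import math

def euclidean(a, b):
    """Straight-line distance: sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Angle-based similarity: invariant to vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Two embeddings pointing in the same direction, with different norms:
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]

print(euclidean(u, v))  # ~3.74: the norm difference dominates
print(cosine(u, v))     # 1.0: identical orientation
```

Since embedding norms often vary with incidental properties such as text length, an angle-based comparison is the more robust default, which is consistent with the 24–66% improvement the paper reports.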
Serge Lionel Nikiema
University of Luxembourg, Luxembourg, Luxembourg
Albérick Euraste Djire
University of Luxembourg, Luxembourg, Luxembourg
Abdoul Aziz Bonkoungou
University of Luxembourg, Luxembourg, Luxembourg
Micheline Bénédicte Moumoula
University of Luxembourg, Luxembourg, Luxembourg
Jordan Samhi
Research Scientist, University of Luxembourg
Computer Science, Software Engineering, Software Security, Android Security, Program Analysis
Abdoul Kader Kabore
University of Luxembourg, Luxembourg, Luxembourg
Jacques Klein
University of Luxembourg / SnT
Computer Science, Software Engineering, Android Security, Software Security, Model-Driven Engineering
Tegawendé F. Bissyande
University of Luxembourg, Luxembourg, Luxembourg