CSSG: Measuring Code Similarity with Semantic Graphs

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing code similarity metrics are often confined to string-level or syntactic representations and fail to capture deep semantic relationships. This work proposes CSSG, a novel approach that incorporates program dependence graphs into code similarity modeling. By explicitly encoding control dependencies and variable interactions, CSSG constructs a semantics-aware representation of code. The method integrates control-flow analysis with graph representation learning. On the CodeContests+ dataset, CSSG consistently outperforms existing metrics at distinguishing more similar code from less similar code, both within a single programming language and across languages, which suggests a more accurate and robust measure of semantic similarity.
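The summary does not say how CSSG builds or compares its graphs. As a rough, hypothetical illustration of the general idea only (not the authors' implementation), the sketch below builds a toy dependence graph for a flat Python snippet with the standard `ast` module. A control edge runs from an enclosing `if`/`for`/`while` to each statement nested under it, and a data edge runs from the most recent assignment of a variable to a later statement that reads it. This last-writer rule is a simplifying assumption that ignores branch merging and loop-carried dependencies. Nodes are labelled by statement type, so variable names drop out, and two snippets are compared with a weighted Jaccard score over their labelled edge multisets, an assumed stand-in for whatever graph comparison CSSG actually uses. `build_pdg_edges` and `pdg_similarity` are hypothetical names.

```python
import ast
from collections import Counter

# Hypothetical toy sketch, NOT CSSG: a simplified program dependence graph
# for a flat Python snippet (no function definitions), compared by
# weighted Jaccard over labelled edges.


def _header(stmt):
    """Sub-trees that belong to the statement itself, excluding its nested blocks."""
    if isinstance(stmt, (ast.If, ast.While)):
        return [stmt.test]
    if isinstance(stmt, ast.For):
        return [stmt.target, stmt.iter]
    return [stmt]


def _names(nodes, ctx_type):
    """Variable names read (ast.Load) or written (ast.Store) inside `nodes`."""
    return {n.id for node in nodes for n in ast.walk(node)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ctx_type)}


def build_pdg_edges(source):
    """Counter of labelled edges (src_type, kind, dst_type) of a toy PDG."""
    edges = Counter()
    last_def = {}  # variable -> statement type of its most recent assignment

    def visit(stmts, controller):
        for stmt in stmts:
            label = type(stmt).__name__
            # Control dependence: the enclosing if/for/while decides whether stmt runs.
            if controller is not None:
                edges[(controller, "control", label)] += 1
            header = _header(stmt)
            uses = _names(header, ast.Load)
            defs = _names(header, ast.Store)
            if isinstance(stmt, ast.AugAssign):
                uses |= defs  # `x += 1` reads x before writing it
            # Data dependence (assumed last-writer rule): last writer -> this reader.
            for var in uses:
                if var in last_def:
                    edges[(last_def[var], "data", label)] += 1
            for var in defs:
                last_def[var] = label
            # Nested blocks are control-dependent on this compound statement.
            for field in ("body", "orelse"):
                visit(getattr(stmt, field, []), label)

    visit(ast.parse(source).body, None)
    return edges


def pdg_similarity(src_a, src_b):
    """Weighted Jaccard similarity between the edge multisets of two snippets."""
    a, b = build_pdg_edges(src_a), build_pdg_edges(src_b)
    union = sum((a | b).values())  # Counter | keeps the max count per edge
    return sum((a & b).values()) / union if union else 1.0  # & keeps the min


prog1 = """
total = 0
for i in range(n):
    if i % 2 == 0:
        total += i
print(total)
"""
# Same logic with renamed variables: identical dependence structure.
prog2 = """
s = 0
for k in range(m):
    if k % 2 == 0:
        s += k
print(s)
"""
# Drops the `if`, so the control structure differs.
prog3 = """
total = 0
for i in range(n):
    total += i
print(total)
"""
print(pdg_similarity(prog1, prog2))  # 1.0
print(pdg_similarity(prog1, prog3))  # 3/7, about 0.43
```

Because node labels ignore identifiers, renaming variables leaves the score unchanged, while removing the `if` changes both control and data edges. A real PDG would also need proper reaching-definition analysis, loop-carried edges, and a front end for each language in the cross-lingual setting.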

📝 Abstract
Existing code similarity metrics, such as BLEU, CodeBLEU, and TSED, largely rely on surface-level string overlap or abstract syntax tree structures, and often fail to capture deeper semantic relationships between programs. We propose CSSG (Code Similarity using Semantic Graphs), a novel metric that leverages program dependence graphs to explicitly model control dependencies and variable interactions, providing a semantics-aware representation of code. Experiments on the CodeContests+ dataset show that CSSG consistently outperforms existing metrics in distinguishing more similar code from less similar code under both monolingual and cross-lingual settings, demonstrating that dependency-aware graph representations offer a more effective alternative to surface-level or syntax-based similarity measures.
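The abstract judges a metric by whether it scores more similar code above less similar code. A minimal sketch of that kind of pairwise check follows, assuming the benchmark can be expressed as (anchor, more-similar, less-similar) triples; the triple format, `discrimination_accuracy`, and the toy `token_jaccard` baseline are hypothetical and are not BLEU, CodeBLEU, TSED, or the paper's actual protocol.

```python
import re
from typing import Callable, Iterable, Tuple

# Hypothetical evaluation sketch, NOT the paper's protocol.
Metric = Callable[[str, str], float]
Triple = Tuple[str, str, str]  # (anchor, more_similar, less_similar)


def token_jaccard(a: str, b: str) -> float:
    """Toy surface-level metric (not BLEU/CodeBLEU/TSED): Jaccard overlap of token sets."""
    def tokens(s: str) -> set:
        # Words/numbers, or single punctuation characters; whitespace is skipped.
        return set(re.findall(r"\w+|[^\w\s]", s))

    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def discrimination_accuracy(metric: Metric, triples: Iterable[Triple]) -> float:
    """Fraction of triples where `metric` scores the more-similar candidate strictly higher."""
    triples = list(triples)
    if not triples:
        raise ValueError("need at least one triple")
    hits = sum(metric(anchor, pos) > metric(anchor, neg) for anchor, pos, neg in triples)
    return hits / len(triples)


# Toy triple: `pos` only renames variables; `neg` keeps the names but changes += to -=.
anchor = "total = 0\nfor i in range(n):\n    total += i\n"
pos = "s = 0\nfor k in range(n):\n    s += k\n"
neg = "total = 0\nfor i in range(n):\n    total -= i\n"

print(token_jaccard(anchor, pos))  # 10/14, about 0.71
print(token_jaccard(anchor, neg))  # 11/13, about 0.85
print(discrimination_accuracy(token_jaccard, [(anchor, pos, neg)]))  # 0.0
```

Ties count as failures because of the strict `>`. On this single toy triple, token overlap prefers the behaviour-changing edit over the pure rename, the kind of surface-level weakness the abstract describes; it says nothing about how CSSG or the named baselines actually score on CodeContests+.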
Problem

code similarity
semantic relationships
program dependence graphs
syntax-based metrics
surface-level overlap
Innovation

semantic graph
program dependence graph
code similarity
control dependency
cross-lingual code analysis
Jingwen Xu
College of Computer Science and Artificial Intelligence, Fudan University
Yiyang Lu
College of Computer Science and Artificial Intelligence, Fudan University
Changze Lv
College of Computer Science and Artificial Intelligence, Fudan University
Zisu Huang
College of Computer Science and Artificial Intelligence, Fudan University
Zhengkang Guo
College of Computer Science and Artificial Intelligence, Fudan University
Zhengyuan Wang
College of Computer Science and Artificial Intelligence, Fudan University
Muzhao Tian
College of Computer Science and Artificial Intelligence, Fudan University
Xuanjing Huang
College of Computer Science and Artificial Intelligence, Fudan University
Xiaoqing Zheng
Fudan University
Natural Language Processing and Machine Learning