MAGNET: A Multi-Graph Attentional Network for Code Clone Detection

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing code clone detection methods typically rely on a single program representation—e.g., AST, CFG, or DFG—failing to comprehensively capture program semantics; hybrid approaches are limited by handcrafted fusion strategies and yield marginal improvements. This paper proposes a novel multi-graph attention framework that unifies AST, CFG, and DFG for fine-grained semantic representation. Its core innovations include: (1) a gated cross-attention mechanism enabling dynamic inter-graph interaction; (2) residual graph neural networks combined with node-level self-attention to jointly model local and long-range dependencies; and (3) Set2Set pooling to generate robust program-level embeddings. Evaluated on BigCloneBench and Google Code Jam, our method achieves F1 scores of 96.5% and 99.2%, respectively—substantially surpassing state-of-the-art baselines. Ablation studies confirm the effectiveness and synergistic contributions of each component.
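To make the first innovation concrete, here is a minimal numpy sketch of a gated cross-attention step between two of the graphs (say, AST nodes attending over CFG nodes). The projection matrices, dimensions, and gating form are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_cross_attention(h_src, h_tgt, rng):
    """Let nodes of one graph (e.g. the AST) attend over nodes of another
    (e.g. the CFG), then gate how much of the attended cross-graph message
    is mixed back into the source embeddings."""
    d = h_src.shape[1]
    # hypothetical random projections; trained jointly in the real model
    Wq = rng.standard_normal((d, d)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d)) / np.sqrt(d)
    Wg = rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)

    q, k, v = h_src @ Wq, h_tgt @ Wk, h_tgt @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))           # (n_src, n_tgt)
    msg = attn @ v                                 # cross-graph message
    gate = 1 / (1 + np.exp(-np.concatenate([h_src, msg], axis=1) @ Wg))
    return gate * msg + (1 - gate) * h_src         # gated residual mix

rng = np.random.default_rng(0)
ast_nodes = rng.standard_normal((5, 16))   # toy AST node embeddings
cfg_nodes = rng.standard_normal((7, 16))   # toy CFG node embeddings
fused = gated_cross_attention(ast_nodes, cfg_nodes, rng)
print(fused.shape)  # (5, 16): one fused embedding per AST node
```

The learned gate lets the model decide, per node and per dimension, how much cross-graph evidence to admit, which is what makes the inter-graph interaction "dynamic" rather than a fixed handcrafted fusion.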

📝 Abstract
Code clone detection is a fundamental task in software engineering that underpins refactoring, debugging, plagiarism detection, and vulnerability analysis. Existing methods often rely on a single representation such as an abstract syntax tree (AST), control flow graph (CFG), or data flow graph (DFG), each of which captures only partial aspects of code semantics. Hybrid approaches have emerged, but their fusion strategies are typically handcrafted and ineffective. In this study, we propose MAGNET, a multi-graph attentional framework that jointly leverages AST, CFG, and DFG representations to capture syntactic and semantic features of source code. MAGNET integrates residual graph neural networks with node-level self-attention to learn both local and long-range dependencies, introduces a gated cross-attention mechanism for fine-grained inter-graph interactions, and employs Set2Set pooling to fuse multi-graph embeddings into unified program-level representations. Extensive experiments on BigCloneBench and Google Code Jam demonstrate that MAGNET achieves state-of-the-art performance, with overall F1 scores of 96.5% and 99.2% on the two datasets, respectively. Ablation studies confirm the critical contributions of multi-graph fusion and of each attentional component. Our code is available at https://github.com/ZixianReid/Multigraph_match
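The Set2Set pooling mentioned in the abstract is an order-invariant readout (an LSTM query repeatedly attends over the node set, and the final query/readout pair becomes the graph-level embedding). A minimal numpy sketch, with untrained random weights standing in for the learned parameters:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class Set2Set:
    """Set2Set readout: an LSTM-driven query attends over the node set
    for a fixed number of steps; the final [query, readout] concatenation
    is the program-level embedding."""
    def __init__(self, d, steps, rng):
        self.d, self.steps = d, steps
        # hypothetical random LSTM weights; trained jointly in the real model
        self.W = rng.standard_normal((2 * d, 4 * d)) / np.sqrt(2 * d)
        self.U = rng.standard_normal((d, 4 * d)) / np.sqrt(d)
        self.b = np.zeros(4 * d)

    def __call__(self, nodes):                     # nodes: (n, d)
        d = self.d
        h, c = np.zeros(d), np.zeros(d)
        q_star = np.zeros(2 * d)
        for _ in range(self.steps):
            z = q_star @ self.W + h @ self.U + self.b
            i, f, o = (1 / (1 + np.exp(-z[k * d:(k + 1) * d]))
                       for k in range(3))
            g = np.tanh(z[3 * d:])
            c = f * c + i * g
            h = o * np.tanh(c)                     # LSTM query q_t
            a = softmax(nodes @ h)                 # attention over nodes
            r = a @ nodes                          # weighted readout r_t
            q_star = np.concatenate([h, r])
        return q_star                              # (2d,) graph embedding

rng = np.random.default_rng(1)
emb = Set2Set(d=8, steps=3, rng=rng)(rng.standard_normal((6, 8)))
print(emb.shape)  # (16,): twice the node dimension
```

Because the attention weights sum over the whole node set at each step, the resulting embedding does not depend on node ordering, which is what makes it a robust program-level representation for comparing clone candidates.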
Problem

Research questions and friction points this paper is trying to address.

Detecting code clones using multi-graph representations
Improving fusion of syntactic and semantic code features
Overcoming limitations of singular code representation methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-graph attentional network integrates AST, CFG, and DFG
Residual graph neural networks with node-level self-attention
Gated cross-attention mechanism for inter-graph interactions
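The second innovation, residual GNN layers combined with node-level self-attention, can be sketched as follows. The GCN-style propagation, ReLU nonlinearity, and residual placement are assumptions for illustration; the paper's exact layer design may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def residual_gnn_layer(h, adj, W):
    """One GCN-style propagation step with a residual (skip) connection,
    so stacked layers keep local node features from washing out."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    deg = a_hat.sum(1)
    norm = a_hat / np.sqrt(np.outer(deg, deg))     # symmetric normalization
    return np.maximum(norm @ h @ W, 0) + h         # ReLU + residual

def node_self_attention(h, Wq, Wk, Wv):
    """Node-level self-attention over the whole graph, letting every node
    aggregate from every other node regardless of edge distance."""
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    attn = softmax(q @ k.T / np.sqrt(h.shape[1]))
    return attn @ v + h                            # residual again

rng = np.random.default_rng(2)
n, d = 6, 8
adj = (rng.random((n, n)) < 0.3).astype(float)
adj = np.maximum(adj, adj.T)                       # undirected toy graph
h = rng.standard_normal((n, d))
W, Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
out = node_self_attention(residual_gnn_layer(h, adj, W), Wq, Wk, Wv)
print(out.shape)  # (6, 8)
```

The message-passing layer captures local neighborhood structure while the self-attention pass adds long-range dependencies, matching the local/long-range split the summary describes.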