🤖 AI Summary
This work addresses the challenge of identifying structural and semantic similarities across imperative programs and their specifications, focusing on graph construction as a foundational step. The pipeline converts annotated programs into typed, attributed graphs by combining abstract syntax tree parsing with semantic embeddings from models such as CodeBERT and SentenceTransformer. Experiments on verification datasets, including C with ACSL, Java with JML, and Dafny, show that consistent graph representations can be constructed across different languages and annotation styles. The resulting graphs capture both structural relationships and semantic context, and provide a practical basis for later semantic enrichment and approximate graph matching aimed at reusing verification artefacts.
📝 Abstract
Reusing verification artefacts requires identifying structural and semantic similarities across programs and their specifications. In this paper, we focus on graph construction as a foundational step toward this goal. We present a pipeline that converts imperative programs and their annotations into typed, attributed graphs. Our experiments cover datasets including C with ACSL, Java with JML, and Dafny for C#. The pipeline integrates abstract syntax tree parsing with semantic embeddings derived from models such as SentenceTransformer and CodeBERT. This enables the generation of graph representations that capture both structural relationships and semantic context. Our results show that consistent graph representations can be constructed across different languages and annotation styles. This work provides a practical basis for future steps in semantic enrichment and approximate graph matching for scalable verification artefact reuse.
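To make the graph-construction step concrete, the following is a minimal sketch of turning an abstract syntax tree into a typed, attributed graph. It is not the authors' pipeline. It assumes Python's standard `ast` module as a stand-in parser (the paper targets C, Java, and Dafny), treats an `assert` statement as a stand-in for a specification annotation, and leaves out the embedding step entirely. The names `Graph` and `build_graph` are hypothetical.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Graph:
    # node id -> {"type": AST node type name, "attrs": scalar fields of that node}
    nodes: dict = field(default_factory=dict)
    # (parent id, child id, label), where label is the AST field holding the child
    edges: list = field(default_factory=list)


def build_graph(source: str) -> Graph:
    """Hypothetical helper: parse `source` and emit one graph node per AST node."""
    graph = Graph()

    def visit(node, parent_id=None, label=None):
        node_id = len(graph.nodes)
        # Keep only scalar fields (identifiers, literals) as node attributes.
        attrs = {
            name: value
            for name, value in ast.iter_fields(node)
            if isinstance(value, (str, int, float, bool))
        }
        graph.nodes[node_id] = {"type": type(node).__name__, "attrs": attrs}
        if parent_id is not None:
            graph.edges.append((parent_id, node_id, label))
        # Recurse into child AST nodes, labelling each edge with its field name.
        for field_name, value in ast.iter_fields(node):
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, ast.AST):
                    visit(child, node_id, field_name)

    visit(ast.parse(source))
    return graph


# Assumption: the assert plays the role of a precondition annotation.
SOURCE = "def inc(x):\n    assert x >= 0\n    return x + 1\n"
example = build_graph(SOURCE)
```

In a fuller pipeline, the textual content of each node (for instance, an annotation expression) could be passed to an embedding model and stored as an additional node attribute. That step is omitted here to keep the sketch dependency-free.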