Resolving Indirect Calls in Binary Code via Cross-Reference Augmented Graph Neural Networks

📅 2025-07-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In binary static analysis, indirect call targets that cannot be resolved accurately leave the interprocedural control-flow graph (CFG) incomplete. To address this, the paper proposes NeuCall, an approach that combines compiler-level type analysis with graph neural networks (GNNs). First, it augments the CFG with data and code cross-references and uses type analysis to generate high-quality callsite-callee training pairs. Second, it applies relational graph convolutions over the augmented CFG to learn representations for target prediction. Evaluated on real-world x86_64 binaries from GitHub and the Arch User Repository, the method achieves an F1 score of 95.2%, outperforming state-of-the-art ML-based approaches. The results indicate that it can support precise interprocedural CFG construction and downstream binary analysis.
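For readers unfamiliar with the metric, the F1 score quoted above is the harmonic mean of precision and recall. The sketch below computes it over sets of (callsite, callee) pairs. The pair identifiers and the toy data are hypothetical and are not taken from the paper's evaluation.

```python
def f1_score(predicted, actual):
    """Return (precision, recall, F1) over sets of (callsite, callee) pairs."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and model predictions (illustrative only).
actual = {("cs1", "foo"), ("cs1", "bar"), ("cs2", "baz"), ("cs3", "qux")}
predicted = {("cs1", "foo"), ("cs2", "baz"), ("cs3", "qux"), ("cs3", "foo")}

precision, recall, f1 = f1_score(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here three of the four predicted pairs are correct and three of the four true pairs are found, so precision, recall, and F1 all come out at 0.75.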

📝 Abstract

Binary code analysis is essential in scenarios where source code is unavailable, with extensive applications across various security domains. However, accurately resolving indirect call targets remains a longstanding challenge in maintaining the integrity of static analysis in binary code. This difficulty arises because the operand of a call instruction (e.g., call rax) remains unknown until runtime, resulting in an incomplete inter-procedural control flow graph (CFG). Previous approaches have struggled with low accuracy and limited scalability. To address these limitations, recent work has increasingly turned to machine learning (ML) to enhance analysis. However, this ML-driven approach faces two significant obstacles: low-quality callsite-callee training pairs and inadequate binary code representation, both of which undermine the accuracy of ML models. In this paper, we introduce NeuCall, a novel approach for resolving indirect calls using graph neural networks. Existing ML models in this area often overlook key elements such as data and code cross-references, which are essential for understanding a program's control flow. In contrast, NeuCall augments CFGs with cross-references, preserving rich semantic information. Additionally, we leverage advanced compiler-level type analysis to generate high-quality callsite-callee training pairs, enhancing model precision and reliability. We further design a graph neural model that leverages augmented CFGs and relational graph convolutions for accurate target prediction. Evaluated against real-world binaries from GitHub and the Arch User Repository on x86_64 architecture, NeuCall achieves an F1 score of 95.2%, outperforming state-of-the-art ML-based approaches. These results highlight NeuCall's effectiveness in building precise inter-procedural CFGs and its potential to advance downstream binary analysis and security applications.
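To make the idea of relational graph convolution over an augmented CFG more concrete, here is a minimal sketch of one generic relational graph convolution step (in the style of R-GCN) on a toy graph with three edge types. It is not NeuCall's actual model: the node names, the relation names (`control_flow`, `data_xref`, `code_xref`), the feature vectors, and the identity weight matrices are all assumptions chosen so the arithmetic is easy to follow.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rgcn_layer(features, edges, relation_weights, self_weight):
    """One relational graph convolution step.

    features: {node: [float, ...]}
    edges: [(source, relation, target)], messages flow source -> target
    relation_weights: {relation: weight matrix}, one matrix per edge type
    self_weight: weight matrix applied to each node's own features
    """
    updated = {}
    for node, h in features.items():
        total = matvec(self_weight, h)
        for relation, weight in relation_weights.items():
            sources = [s for (s, r, t) in edges if r == relation and t == node]
            for s in sources:
                message = matvec(weight, features[s])
                # Mean-normalise messages within each relation type.
                total = [a + m / len(sources) for a, m in zip(total, message)]
        updated[node] = [max(0.0, v) for v in total]  # ReLU activation
    return updated


identity = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical toy graph: a block containing `call rax`, a data slot holding
# a function pointer, and the entry block of a candidate callee.
features = {
    "callsite_block": [1.0, 0.0],
    "func_entry": [0.0, 1.0],
    "data_slot": [1.0, 1.0],
}
edges = [
    ("callsite_block", "control_flow", "func_entry"),
    ("data_slot", "data_xref", "callsite_block"),
    ("func_entry", "code_xref", "data_slot"),
]
relation_weights = {name: identity for name in ("control_flow", "data_xref", "code_xref")}

out = rgcn_layer(features, edges, relation_weights, identity)
print(out["callsite_block"])
```

Keeping one weight matrix per relation lets a trained model weigh control-flow edges differently from data or code cross-reference edges. In practice the weights would be learned and several layers stacked.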

Problem

Research questions and friction points this paper is trying to address.

Resolving indirect call targets in binary code accurately

Improving low-quality callsite-callee training pairs for ML models

Enhancing binary code representation with cross-references for better analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

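Uses graph neural networks with relational graph convolutions to predict indirect call targets

Augments CFGs with data and code cross-references to preserve semantic information

Leverages compiler-level type analysis to generate high-quality callsite-callee training pairs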
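As a rough illustration of how type information can label callsite-callee pairs, the sketch below marks a function as a positive target when it is address-taken and its signature matches the signature expected at the callsite. This is a deliberately simplified signature-matching heuristic, not the paper's compiler-level analysis. The function names, callsite identifier, and signature encoding are hypothetical.

```python
def build_training_pairs(callsites, functions):
    """Label every (callsite, function) pair as positive (1) or negative (0).

    callsites: {callsite_id: signature}
    functions: {name: (signature, address_taken)}
    A signature is encoded as (return_type, (param_type, ...)).
    """
    pairs = []
    for callsite_id, expected in callsites.items():
        for name, (signature, address_taken) in functions.items():
            # Only address-taken functions can be reached through a pointer.
            label = 1 if address_taken and signature == expected else 0
            pairs.append((callsite_id, name, label))
    return pairs


handler_sig = ("int", ("void*", "size_t"))

# Hypothetical program facts (illustrative only).
functions = {
    "on_read": (handler_sig, True),
    "on_write": (handler_sig, True),
    "helper": (handler_sig, False),          # matching type, never address-taken
    "log_msg": (("void", ("char*",)), True),  # address-taken, wrong type
}
callsites = {"cs_0x401000": handler_sig}

pairs = build_training_pairs(callsites, functions)
positives = {name for (_, name, label) in pairs if label == 1}
print(sorted(positives))
```

With this toy input, only `on_read` and `on_write` are labelled positive. The other two functions serve as negative examples for the same callsite.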

🔎 Similar Papers

GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding