Beyond the Edge of Function: Unraveling the Patterns of Type Recovery in Binary Code

📅 2025-03-10
🤖 AI Summary
Problem: Existing approaches to variable type recovery in binary code lose accuracy on real-world programs because of cross-function type propagation, skewed type distributions, and interference from compiler optimizations. Method: The paper proposes ByteTR, a framework that combines (1) a decoupled target type set, (2) inter-procedural data-flow tracing, and (3) a gated graph neural network (GGNN) that models long-range data-flow dependencies. The design builds on an empirical study of how compiler optimizations affect types and variables, and it pairs static program analysis with learned dependency modeling. Contribution/Results: ByteTR outperforms state-of-the-art methods on the TYDA multi-architecture benchmark. In a real CTF challenge case, pseudocode refined by ByteTR is more readable than the output of IDA and Ghidra.

📝 Abstract
Type recovery is a crucial step in binary code analysis, with significant importance for reverse engineering and a range of security applications. Existing works typically just target type identifiers within binary code and recover types by analyzing variable characteristics within individual functions. However, we find that the types in real-world binary programs are more complex and often follow specific distribution patterns. In this paper, to gain a deeper understanding of the variable type recovery problem in binary code, we first conduct a comprehensive empirical study. We use the TYDA dataset, which includes 163,643 binary programs across four architectures and four compiler optimization options, reflecting the complexity and diversity of real-world programs. We study the patterns that characterize types and variables in binary code and investigate how compiler optimizations affect them. Based on our empirical findings, we propose ByteTR, a framework for recovering variable types in binary code. We decouple the target type set to address the unbalanced type distribution and perform static program analysis to handle the impact of compiler optimizations on variable storage. Because our study shows that variable propagation across functions is ubiquitous, ByteTR conducts inter-procedural analysis to trace variable propagation and employs a gated graph neural network to capture long-range data-flow dependencies for variable type recovery. We conduct extensive experiments to evaluate ByteTR. The results show that ByteTR outperforms state-of-the-art works in both effectiveness and efficiency. Moreover, in a real CTF challenge case, the pseudocode optimized by ByteTR is significantly more readable than the output of the leading tools IDA and Ghidra.
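The abstract does not spell out how the target type set is decoupled. As a rough illustration only, the sketch below assumes one possible decomposition: each C type string is split into a base type and a pointer depth, so that rare composite labels share more common components. The `decouple` helper and the toy labels are hypothetical and are not taken from ByteTR or TYDA.

```python
from __future__ import annotations

from collections import Counter


def decouple(type_name: str) -> tuple[str, int]:
    """Split a C type string into (base type, pointer depth).

    Hypothetical decomposition for illustration; not the paper's scheme.
    """
    depth = type_name.count("*")
    # Drop the asterisks and normalize whitespace to get the base type.
    base = " ".join(type_name.replace("*", " ").split())
    return base, depth


# Toy labels, made up for this sketch (not drawn from TYDA).
labels = ["int", "int", "int", "int *", "char *", "char **", "struct node *", "char"]

full = Counter(labels)                              # one class per full type string
bases = Counter(decouple(t)[0] for t in labels)     # classes over base types only
depths = Counter(decouple(t)[1] for t in labels)    # classes over pointer depth only

print(len(full), "full-label classes")
print(len(bases), "base classes,", len(depths), "pointer-depth classes")
```

Under this assumption, a model predicts two smaller label sets instead of one long-tailed set, which is one common way to reduce class imbalance.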
Problem

Research questions and friction points this paper is trying to address.

Addresses variable type recovery in binary code analysis.
Explores impact of compiler optimizations on type distribution.
Proposes ByteTR for effective inter-procedural type recovery.
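To make the idea of inter-procedural variable propagation concrete, here is a minimal sketch that links caller arguments to callee parameters and collects every variable reachable from a starting variable. The call-site tuples, function names, and variable names are all hypothetical; a real analysis would work on lifted binary code and is not shown here.

```python
from collections import defaultdict, deque

# Each call site: (caller, argument variable, callee, parameter variable).
# All names below are hypothetical placeholders.
CALLS = [
    ("main", "buf", "parse", "p"),
    ("parse", "p", "copy_out", "dst"),
    ("main", "n", "parse", "len"),
]


def build_flow_graph(calls):
    """Build an undirected graph joining each argument to its parameter."""
    graph = defaultdict(set)
    for caller, arg, callee, param in calls:
        a, b = (caller, arg), (callee, param)
        graph[a].add(b)
        graph[b].add(a)
    return graph


def propagation_group(graph, start):
    """Return all (function, variable) pairs reachable from start via BFS."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


group = propagation_group(build_flow_graph(CALLS), ("main", "buf"))
print(sorted(group))
```

Variables in the same group carry the same value across function boundaries, so type evidence observed in one function (for example, a dereference in `copy_out`) can inform the type of the others.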
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples the target type set to handle the unbalanced type distribution.
Uses inter-procedural analysis to trace variable propagation across functions.
Employs a gated graph neural network to capture long-range data-flow dependencies.
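The following sketch shows one propagation step of a generic gated graph neural network in plain Python, to illustrate how repeated steps carry information along data-flow edges. It follows the standard GGNN update (sum of neighbor messages followed by a GRU-style gate) and is not ByteTR's actual model; the parameter names, the scaled-identity weights, and the three-node chain are assumptions made for the example.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_params(dim, scale):
    """Hypothetical weights: every matrix is scale * identity."""
    eye = [[scale if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    names = ("W_msg", "W_z", "U_z", "W_r", "U_r", "W_h", "U_h")
    return {name: [row[:] for row in eye] for name in names}


def ggnn_step(h, edges, P):
    """One GGNN step. h: {node: state}, edges: [(src, dst)], P: weight dict."""
    dim = len(next(iter(h.values())))
    # Message passing: each node sums the transformed states of its in-neighbors.
    msg = {v: [0.0] * dim for v in h}
    for src, dst in edges:
        for i, val in enumerate(matvec(P["W_msg"], h[src])):
            msg[dst][i] += val
    new_h = {}
    for v, hv in h.items():
        m = msg[v]
        # GRU-style update: update gate z, reset gate r, candidate state.
        z = [sigmoid(a + b) for a, b in zip(matvec(P["W_z"], m), matvec(P["U_z"], hv))]
        r = [sigmoid(a + b) for a, b in zip(matvec(P["W_r"], m), matvec(P["U_r"], hv))]
        gated = [ri * hi for ri, hi in zip(r, hv)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(P["W_h"], m), matvec(P["U_h"], gated))]
        new_h[v] = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, hv, cand)]
    return new_h


# A three-node data-flow chain a -> b -> c; only "a" starts with a signal.
state = {"a": [1.0, 0.0], "b": [0.0, 0.0], "c": [0.0, 0.0]}
chain = [("a", "b"), ("b", "c")]
params = make_params(2, 1.0)
after_one = ggnn_step(state, chain, params)
after_two = ggnn_step(after_one, chain, params)
print(after_one["c"], after_two["c"])
```

In this toy chain, node `c` is still all zeros after one step and only picks up the signal from `a` after the second step, which is the sense in which stacked propagation steps capture long-range dependencies.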
Gangyang Li
University of Science and Technology of China, Hefei, Anhui, China
Xiuwei Shang
University of Science and Technology of China
AI4SE, AI4Security, SE4AI
Shaoyin Cheng
University of Science and Technology of China
Junqi Zhang
University of Science and Technology of China, Anhui Province Key Laboratory of Digital Security, Hefei, Anhui, China
Li Hu
University of Science and Technology of China, Hefei, Anhui, China
Weiming Zhang
University of Science and Technology of China, Anhui Province Key Laboratory of Digital Security, Hefei, Anhui, China
Neng H. Yu
University of Science and Technology of China, Anhui Province Key Laboratory of Digital Security, Hefei, Anhui, China