Padding Matters -- Exploring Function Detection in PE Files

📅 2025-04-30
🤖 AI Summary
This work addresses function start detection in Windows PE binaries and examines how padding between functions affects the results of common analysis tools. The authors introduce FuncPEval, a dataset for Windows x86 and x64 PE files built from Chromium and the Conti ransomware, with ground truth for 1,092,820 function starts, and use it to evaluate eight tools: five heuristics-based (Ghidra, IDA, Nucleus, rev.ng, SMDA) and three machine-learning-based (DeepDi, RNN, XDA). IDA achieves the highest F1-score (98.44%) on Chromium x64; DeepDi follows closely (97%) and is the fastest by a significant margin. All tested tools except rev.ng are susceptible to randomized padding, which significantly reduces the effectiveness of RNN, XDA, and Nucleus, while DeepDi is the least sensitive of the learning-based tools. The authors also improve the RNN of Shin et al. and enhance XDA, raising the F1-score by approximately 10%.

📝 Abstract
Function detection is a well-known problem in binary analysis. While previous research has primarily focused on Linux/ELF, Windows/PE binaries have been overlooked or only partially considered. This paper introduces FuncPEval, a new dataset for Windows x86 and x64 PE files, featuring Chromium and the Conti ransomware, along with ground truth data for 1,092,820 function starts. Utilizing FuncPEval, we evaluate five heuristics-based (Ghidra, IDA, Nucleus, rev.ng, SMDA) and three machine-learning-based (DeepDi, RNN, XDA) function start detection tools. Among the tested tools, IDA achieves the highest F1-score (98.44%) for Chromium x64, while DeepDi closely follows (97%) but stands out as the fastest by a significant margin. Working towards explainability, we examine the impact of padding between functions on the detection results. Our analysis shows that all tested tools, except rev.ng, are susceptible to randomized padding. The randomized padding significantly diminishes the effectiveness for the RNN, XDA, and Nucleus. Among the learning-based tools, DeepDi exhibits the least sensitivity and demonstrates overall the fastest performance, while Nucleus is the most adversely affected among non-learning-based tools. In addition, we improve the recurrent neural network (RNN) proposed by Shin et al. and enhance the XDA tool, increasing the F1-score by approximately 10%.
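The F1-scores quoted above compare the function starts a tool reports against the ground-truth starts. The following is a minimal sketch of that comparison, assuming function starts are represented as sets of addresses and that a prediction counts only on an exact address match. The function name and the sample addresses are hypothetical and are not taken from the paper or from FuncPEval.

```python
def score_function_starts(predicted, ground_truth):
    """Return (precision, recall, f1) for predicted function start addresses.

    Assumption: a predicted start is correct only if it exactly matches
    a ground-truth start address.
    """
    predicted = set(predicted)
    ground_truth = set(ground_truth)

    true_positives = len(predicted & ground_truth)
    false_positives = len(predicted - ground_truth)
    false_negatives = len(ground_truth - predicted)

    # Guard against empty inputs to avoid division by zero.
    reported = true_positives + false_positives
    expected = true_positives + false_negatives
    precision = true_positives / reported if reported else 0.0
    recall = true_positives / expected if expected else 0.0

    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    truth = {0x1000, 0x1040, 0x1080, 0x10C0}
    found = {0x1000, 0x1040, 0x1090}
    print(score_function_starts(found, truth))
```

Because F1 is the harmonic mean of precision and recall, a tool cannot reach a high score by over-reporting starts or by reporting only the easy ones.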
Problem

Research questions and friction points this paper is trying to address.

Evaluating function detection tools for Windows PE files
Assessing impact of padding on function detection accuracy
Improving RNN and XDA tools for better performance
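To make the padding question concrete, the sketch below overwrites the bytes between consecutive functions with random values while leaving the function bodies untouched. It is an illustration under stated assumptions, not the paper's procedure: the summary does not describe how the authors randomized padding. The function name, the `(start, end)` range format, and the sample bytes are hypothetical. The sample uses 0xCC (`int3`) as the original filler, which is common in Windows compiler output.

```python
import random


def randomize_padding(code, function_ranges, seed=0):
    """Replace inter-function gap bytes in `code` with random bytes.

    Assumptions: `function_ranges` is a list of (start, end) offsets into
    `code`, with `end` exclusive, and the ranges do not overlap.
    """
    rng = random.Random(seed)  # seeded so a perturbation is reproducible
    patched = bytearray(code)
    ordered = sorted(function_ranges)

    # Walk over each pair of neighbouring functions and fill the gap.
    for (_, previous_end), (next_start, _) in zip(ordered, ordered[1:]):
        for offset in range(previous_end, next_start):
            patched[offset] = rng.randrange(256)
    return bytes(patched)


if __name__ == "__main__":
    # Two tiny hypothetical functions (push ebp; ret) separated by
    # six bytes of 0xCC filler.
    section = b"\x55\xc3" + b"\xcc" * 6 + b"\x55\xc3"
    ranges = [(0, 2), (8, 10)]
    print(randomize_padding(section, ranges).hex())
```

Running a detector on the original and on the perturbed bytes, then comparing the scores, would show how much that detector relies on the padding pattern rather than on the function bodies themselves.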
Innovation

Methods, ideas, or system contributions that make the work stand out.

FuncPEval dataset for Windows PE files
Evaluates heuristics and machine-learning tools
Improves RNN and XDA tools performance
Authors

Raphael Springer
Westphalian University of Applied Sciences, Institute for Internet Security, Gelsenkirchen, Germany
Alexander Schmitz
Associate Professor at Waseda University
Artificial Intelligence, Robotics
Artur Leinweber
Westphalian University of Applied Sciences, Institute for Internet Security, Gelsenkirchen, Germany
Tobias Urban
Westphalian University of Applied Sciences, Institute for Internet Security, Gelsenkirchen, Germany
Christian Dietrich
Westphalian University of Applied Sciences, Institute for Internet Security, Gelsenkirchen, Germany