SoK: Potentials and Challenges of Large Language Models for Reverse Engineering

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reverse engineering (RE) automation remains hindered by heavy reliance on human expertise and inconsistent evaluation, particularly in large language model (LLM) applications, where methodological heterogeneity and the absence of standardized benchmarks impede comparability and reproducibility. To address this, we systematically survey 44 papers and 18 open-source projects, proposing the first multidimensional taxonomy for LLMs in RE, structured along task objectives, technical approaches, and evaluation paradigms. Our analysis reveals LLMs' distinctive strengths, including generative reasoning and cross-layer semantic understanding, while delineating their capability boundaries relative to traditional deep learning models. We identify three persistent challenges: fragmented evaluation criteria, limited-scale training data, and insufficient safety alignment. Based on these insights, we propose a forward-looking research framework emphasizing reproducibility, rigorous comparability, and safety-aware trustworthiness, laying the groundwork for principled advancement of LLM-driven RE.

📝 Abstract
Reverse Engineering (RE) is central to software security, enabling tasks such as vulnerability discovery and malware analysis, but it remains labor-intensive and requires substantial expertise. Earlier advances in deep learning began to automate parts of RE, particularly for malware detection and vulnerability classification. More recently, a rapidly growing body of work has applied Large Language Models (LLMs) for similar purposes. Their role compared to prior machine learning remains unclear, since some efforts simply adapt existing pipelines with minimal change while others seek to exploit broader reasoning and generative abilities. These differences, combined with varied problem definitions, methods, and evaluation practices, limit comparability, reproducibility, and cumulative progress. This paper systematizes the field by reviewing 44 research papers, including peer-reviewed publications and preprints, and 18 additional open-source projects that apply LLMs in RE. We propose a taxonomy that organizes existing work by objective, target, method, evaluation strategy, and data scale. Our analysis identifies strengths and limitations, highlights reproducibility and evaluation gaps, and examines emerging risks. We conclude with open challenges and future research directions that aim to guide more coherent and security-relevant applications of LLMs in RE.
Problem

Research questions and friction points this paper is trying to address.

Systematizing LLM applications in reverse engineering through taxonomy
Identifying reproducibility gaps in LLM-based reverse engineering methods
Addressing evaluation inconsistencies across LLM reverse engineering approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematizes LLM applications in reverse engineering
Proposes a taxonomy organizing objectives, methods, and evaluations
Identifies reproducibility gaps and emerging risks
Xinyu Hu
School of Information Studies, McGill University, Montreal, Quebec, Canada
Zhiwei Fu
School of Information Studies, McGill University, Montreal, Quebec, Canada
Shaocong Xie
School of Information Studies, McGill University, Montreal, Quebec, Canada
Steven H. H. Ding
Data Mining and Security Lab, McGill University
Data Mining · Machine Learning · Reverse Engineering · Cybersecurity
Philippe Charland
Mission Critical Cyber Security Section, Defence R&D Canada – Valcartier, Quebec, QC, Canada