LLM4CodeRE: Generative AI for Code Decompilation Analysis and Reverse Engineering

📅 2026-04-07
🤖 AI Summary
This work targets two obstacles in malware reverse engineering: advanced obfuscation, which severely hinders decompilation, and the weak adaptation of existing large language models to malicious code. It proposes LLM4CodeRE, a domain-adaptive LLM framework that handles both assembly-to-source decompilation and source-to-assembly translation in a single model. Two complementary fine-tuning strategies are introduced: a Multi-Adapter approach for task-specific syntactic and semantic alignment, and a unified Seq2Seq approach that uses task-conditioned prefixes to enforce end-to-end generation constraints. According to the reported experiments, the framework outperforms existing decompilation tools and general-purpose code models and generalizes robustly in both directions.
📝 Abstract
Code decompilation analysis is a fundamental yet challenging task in malware reverse engineering, particularly due to the pervasive use of sophisticated obfuscation techniques. Although recent large language models (LLMs) have shown promise in translating low-level representations into high-level source code, most existing approaches rely on generic code pretraining and lack adaptation to malicious software. We propose LLM4CodeRE, a domain-adaptive LLM framework for bidirectional code reverse engineering that supports both assembly-to-source decompilation and source-to-assembly translation within a unified model. To enable effective task adaptation, we introduce two complementary fine-tuning strategies: (i) a Multi-Adapter approach for task-specific syntactic and semantic alignment, and (ii) a Seq2Seq Unified approach using task-conditioned prefixes to enforce end-to-end generation constraints. Experimental results demonstrate that LLM4CodeRE outperforms existing decompilation tools and general-purpose code models, achieving robust bidirectional generalization.
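As a rough illustration of the task-conditioned prefix idea in the Seq2Seq Unified approach, the sketch below shows how one aligned assembly/source pair could be turned into two training examples, one per translation direction, by prepending a direction-specific prefix to the input. The prefix strings, the `Example` type, and the helper names are all hypothetical; this listing does not state the actual prefixes or data format used by LLM4CodeRE.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical prefix strings: the real prefixes used by the paper are not
# given in this summary, so these are placeholders for illustration only.
TASK_PREFIXES = {
    "asm2src": "translate assembly to source: ",
    "src2asm": "translate source to assembly: ",
}


@dataclass(frozen=True)
class Example:
    """One seq2seq training example: prefixed input text and target text."""
    task: str
    source_text: str
    target_text: str


def build_model_input(task: str, text: str) -> str:
    """Prepend the task prefix so a single model can tell the directions apart."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    return TASK_PREFIXES[task] + text


def make_bidirectional_pair(assembly: str, source: str) -> list[Example]:
    """Turn one aligned (assembly, source) pair into one example per direction."""
    return [
        Example("asm2src", build_model_input("asm2src", assembly), source),
        Example("src2asm", build_model_input("src2asm", source), assembly),
    ]
```

Under this assumed setup, both examples would be fed to the same encoder-decoder model during fine-tuning, and the prefix alone selects the direction at inference time.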
Problem

Research questions and friction points this paper is trying to address.

code decompilation
reverse engineering
malware analysis
obfuscation
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

domain-adaptive LLM
bidirectional decompilation
multi-adapter fine-tuning
task-conditioned prefix
malware reverse engineering