LLM4CodeRE: Generative AI for Code Decompilation Analysis and Reverse Engineering

📅 2026-04-07

🤖 AI Summary

This work addresses two obstacles in malware reverse engineering. The first is advanced obfuscation, which severely hinders decompilation. The second is that existing large language models adapt poorly to malicious code. To bridge this gap, the paper proposes LLM4CodeRE, a domain-adaptive large language model framework for reverse engineering that handles bidirectional translation (assembly ↔ source code) within a single model. It introduces two complementary fine-tuning strategies: a multi-adapter mechanism for task-specific syntactic and semantic alignment between assembly and source code, and a unified Seq2Seq approach that uses task-specific prefixes to enforce end-to-end generation constraints. Experimental results show that the approach outperforms existing decompilation tools and general-purpose code models and generalizes well in both translation directions.
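The unified Seq2Seq strategy conditions one model on a task prefix, so the same weights can serve both translation directions. Below is a minimal sketch of how aligned (assembly, source) pairs could be expanded into prefixed training examples. The prefix strings in `TASK_PREFIXES` and the helpers `make_bidirectional_examples` and `detect_task` are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical task prefixes: the paper's actual prefix strings are not given here.
TASK_PREFIXES = {
    "asm2src": "decompile assembly to source: ",
    "src2asm": "translate source to assembly: ",
}


@dataclass(frozen=True)
class Example:
    input_text: str   # prefixed model input
    target_text: str  # expected model output
    task: str         # which direction this example trains


def make_bidirectional_examples(asm: str, src: str) -> List[Example]:
    """Turn one aligned (assembly, source) pair into two prefixed seq2seq examples."""
    return [
        Example(TASK_PREFIXES["asm2src"] + asm, src, "asm2src"),
        Example(TASK_PREFIXES["src2asm"] + src, asm, "src2asm"),
    ]


def detect_task(input_text: str) -> str:
    """Recover which task a prefixed input requests (e.g. for per-direction evaluation)."""
    for task, prefix in TASK_PREFIXES.items():
        if input_text.startswith(prefix):
            return task
    raise ValueError("input has no known task prefix")
```

Because both directions share one model and one vocabulary, the prefix is the only signal that tells the decoder whether to emit source code or assembly.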

📝 Abstract

Code decompilation analysis is a fundamental yet challenging task in malware reverse engineering, particularly due to the pervasive use of sophisticated obfuscation techniques. Although recent large language models (LLMs) have shown promise in translating low-level representations into high-level source code, most existing approaches rely on generic code pretraining and lack adaptation to malicious software. We propose LLM4CodeRE, a domain-adaptive LLM framework for bidirectional code reverse engineering that supports both assembly-to-source decompilation and source-to-assembly translation within a unified model. To enable effective task adaptation, we introduce two complementary fine-tuning strategies: (i) a Multi-Adapter approach for task-specific syntactic and semantic alignment, and (ii) a Seq2Seq Unified approach using task-conditioned prefixes to enforce end-to-end generation constraints. Experimental results demonstrate that LLM4CodeRE outperforms existing decompilation tools and general-purpose code models, achieving robust bidirectional generalization.
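The Multi-Adapter strategy keeps task-specific parameters separate while sharing a common backbone. The abstract does not specify the adapter design, so the sketch below assumes LoRA-style low-rank adapters added to a frozen shared linear layer. `MultiAdapterLayer` and its methods are hypothetical, and the sketch uses plain Python lists instead of a tensor library.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class MultiAdapterLayer:
    """A shared linear layer plus one low-rank adapter per task (hypothetical sketch)."""

    def __init__(self, base: Matrix):
        # Shared weights; in fine-tuning these would stay frozen.
        self.base = base
        # task name -> (down projection r x d_in, up projection d_out x r)
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}

    def add_adapter(self, task: str, down: Matrix, up: Matrix) -> None:
        self.adapters[task] = (down, up)

    def forward(self, x: Vector, task: str) -> Vector:
        y = matvec(self.base, x)          # shared projection
        down, up = self.adapters[task]    # only the selected task's adapter is used
        delta = matvec(up, matvec(down, x))
        return [yi + di for yi, di in zip(y, delta)]
```

Calling `forward(x, "asm2src")` and `forward(x, "src2asm")` on the same input can give different outputs. Only the selected adapter's low-rank update is added to the shared base projection, so each direction gets its own alignment parameters while the base stays common.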

Problem

Research questions and friction points this paper is trying to address.

code decompilation

reverse engineering

malware analysis

obfuscation

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

domain-adaptive LLM

bidirectional decompilation

multi-adapter fine-tuning