🤖 AI Summary
Reverse engineering of x86 binaries is slow because symbols and other metadata are routinely stripped and code is often obfuscated, while closed-weight, cloud-hosted large language models (LLMs) raise privacy and security concerns and cannot be used on closed networks. To address these challenges, we present REx86, an open-weight, locally deployable LLM specialized for x86 assembly understanding. Using LoRA for parameter-efficient fine-tuning, we adapt eight models from the CodeLlama, Qwen2.5-Coder, and CodeGemma families on a curated dataset of 5,981 x86 assembly examples; the fine-tuned Qwen2.5-Coder-7B performs best and is named REx86. Relative to its base model, REx86 reduces test-set cross-entropy loss by 64.2% and improves semantic cosine similarity against ground truth by 20.3%. In a limited user study (n=43), REx86 significantly improved line-level code understanding (p = 0.031); the correct-solve rate rose from 31% to 53%, but this difference was not statistically significant (p = 0.189). The model, its dataset, and the LoRA adapters are publicly released to support offline, privacy-preserving reverse engineering.
📝 Abstract
Reverse engineering (RE) of x86 binaries is indispensable for malware and firmware analysis, but remains slow due to stripped metadata and adversarial obfuscation. Large Language Models (LLMs) offer potential for improving RE efficiency through automated comprehension and commenting, but cloud-hosted, closed-weight models pose privacy and security risks and cannot be used in closed-network facilities. We evaluate parameter-efficient fine-tuned local LLMs for assisting with x86 RE tasks in these settings. Eight open-weight models across the CodeLlama, Qwen2.5-Coder, and CodeGemma series are fine-tuned on a custom curated dataset of 5,981 x86 assembly examples. We evaluate them quantitatively and identify the fine-tuned Qwen2.5-Coder-7B as the top performer, which we name REx86.
REx86 reduces test-set cross-entropy loss by 64.2% and improves semantic cosine similarity against ground truth by 20.3% over its base model. In a limited user case study (n=43), REx86 significantly enhanced line-level code understanding (p = 0.031) and increased the correct-solve rate from 31% to 53% (p = 0.189), though the latter did not reach statistical significance. Qualitative analysis shows more accurate, concise comments with fewer hallucinations.
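The two quantitative metrics reported above can be illustrated with a minimal sketch. It assumes cross-entropy is averaged per token and that semantic similarity is the cosine between two embedding vectors; the abstract does not say which embedding model was used, so the vectors and loss values below are hypothetical placeholders, not results from the paper.

```python
import math


def mean_token_cross_entropy(true_token_probs):
    """Average negative log-probability the model assigned to each correct token."""
    if not true_token_probs:
        raise ValueError("need at least one token probability")
    return -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)


def relative_reduction(base_loss, tuned_loss):
    """Fractional drop in loss of the fine-tuned model relative to its base model."""
    if base_loss <= 0:
        raise ValueError("base_loss must be positive")
    return (base_loss - tuned_loss) / base_loss


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical values, for illustration only.
base_loss, tuned_loss = 2.0, 0.5
print(f"loss reduction: {relative_reduction(base_loss, tuned_loss):.1%}")

generated_comment_vec = [0.2, 0.7, 0.1]   # hypothetical embedding of a model comment
ground_truth_vec = [0.25, 0.65, 0.05]     # hypothetical embedding of the reference comment
print(f"cosine similarity: {cosine_similarity(generated_comment_vec, ground_truth_vec):.3f}")
```

A percentage improvement in similarity would then be computed the same way as the loss reduction, comparing the fine-tuned model's mean similarity with the base model's.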
REx86 delivers state-of-the-art assistance in x86 RE among local, open-weight LLMs. Our findings demonstrate the value of domain-specific fine-tuning, and highlight the need for more commented disassembly data to further enhance LLM performance in RE. REx86, its dataset, and LoRA adapters are publicly available at https://github.com/dlea8/REx86 and https://zenodo.org/records/15420461.
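As a rough illustration of why LoRA adapters are cheap to train and distribute: LoRA freezes a weight matrix of shape d_out x d_in and learns only two low-rank factors, B (d_out x r) and A (r x d_in). The sketch below counts the parameters involved; the layer size and rank are hypothetical and are not taken from the paper's training configuration.

```python
def full_params(d_out, d_in):
    """Number of entries in a dense weight matrix of shape d_out x d_in."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Entries in the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8
dense = full_params(d_out, d_in)
adapter = lora_trainable_params(d_out, d_in, rank)
print(f"dense: {dense:,}  adapter: {adapter:,}  ratio: {adapter / dense:.4%}")
```

Because only the small factors are trained, an adapter can be shipped separately from the base weights and applied on a local machine, which fits the offline setting the abstract targets.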