ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning

📅 2025-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Low information density and opaque syntactic structure in assembly code hinder large language models' (LLMs) semantic understanding. To address this, we propose ASMA-Tune, an end-to-end structural-semantic instruction-tuning framework. Its core innovation is a learnable projection module that bridges BERT-style encoders, which model instruction-level structural patterns, with decoder-only LLMs (e.g., LLaMA, Qwen), which capture deep semantic representations, enabling fine-grained structural awareness and semantic instruction alignment for the first time. We further introduce the first high-quality assembly instruction dataset and a two-stage training strategy: structural masking pretraining followed by semantic alignment fine-tuning. On multiple assembly understanding benchmarks, ASMA-Tune achieves a 23.6% improvement in instruction-following accuracy and a 78.4% F1 score on code semantic reconstruction, substantially outperforming state-of-the-art methods. Both the model and the dataset are publicly released.
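The projection module described above can be pictured as a learnable map from the encoder's hidden space into the LLM's embedding space, so that each encoded assembly token becomes a "soft token" the decoder can consume. The following is a minimal pure-Python sketch of that idea only; the toy dimensions, the single linear layer, and all names are illustrative assumptions, not the authors' implementation.

```python
import random

random.seed(0)

def linear(x, weights, bias):
    # y_j = sum_i weights[j][i] * x[i] + bias[j]
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

# Toy sizes; real models would use something like 768 -> 4096.
enc_dim, llm_dim = 4, 8
W = [[random.gauss(0.0, 0.02) for _ in range(enc_dim)]
     for _ in range(llm_dim)]
b = [0.0] * llm_dim

# One assembly token's encoder hidden state is projected into the
# LLM's embedding space as one soft token.
enc_hidden = [0.1, -0.2, 0.3, 0.05]
soft_token = linear(enc_hidden, W, b)
print(len(soft_token))  # 8
```

In a real system W and b would be trained parameters (e.g. an `nn.Linear` or a small MLP), updated during the structural masking pretraining stage while the LLM stays frozen.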

📝 Abstract
Analysis and comprehension of assembly code are crucial in various applications, such as reverse engineering. However, the low information density and lack of explicit syntactic structures in assembly code pose significant challenges. Pioneering approaches based on masked language modeling (MLM) have been limited in facilitating natural language interaction. While recent methods based on decoder-focused large language models (LLMs) have significantly enhanced semantic representation, they still struggle to capture the nuanced and sparse semantics in assembly code. In this paper, we propose Assembly Augmented Tuning (ASMA-Tune), an end-to-end structural-semantic instruction-tuning framework. Our approach synergizes encoder architectures with decoder-based LLMs through projector modules to enable comprehensive code understanding. Experiments show that ASMA-Tune outperforms existing methods on established benchmarks, significantly enhancing assembly code comprehension and instruction-following abilities. Our model and dataset are public at https://github.com/wxy3596/ASMA-Tune.
Problem

Research questions and friction points this paper is trying to address.

Enhance assembly code comprehension via structural-semantic tuning
Address challenges in low information density and sparse semantics
Improve instruction-following abilities in assembly code analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bridges encoder architectures with decoder-based LLMs for code understanding
Uses structural-semantic instruction-tuning framework
Enhances assembly code comprehension via projector modules
👥 Authors
Xinyi Wang, Nankai University
Jiashui Wang, Zhejiang University (Software Security, Cyber Security, Language Agent, Artificial Intelligence, Business Security)
Peng Chen, University of the Chinese Academy of Sciences
Jinbo Su
Yanming Liu, Zhejiang University (Efficient LLM, Private NLP, RAG, LLM Agent)
Long Liu, Professor, School of Biotechnology, Jiangnan University (Metabolic Engineering, Synthetic Biology, Systems Biology)
Yangdong Wang, Ant Group
Qiyuan Chen, Ant Group
Kai Yun, Nankai University
Chunfu Jia, Nankai University