Unmasking the Shadows: Pinpoint the Implementations of Anti-Dynamic Analysis Techniques in Malware Using LLM

📅 2024-11-08
🏛️ arXiv.org
📈 Citations: 1
Influential: 0

🤖 AI Summary
To address the challenge posed by the widespread adoption of Techniques Against Dynamic Analysis (TADA) in malware, which undermines sandboxing efficacy and impedes manual reverse engineering, this paper proposes the first large language model (LLM)-based method for automatic TADA code localization. Our approach integrates semantic understanding and behavioral reasoning without relying on static symbols or runtime traces. It leverages fine-tuned CodeLlama, a novel disassembled instruction sequence encoding scheme, multi-granularity contextual prompting, and cross-sample transfer learning to precisely identify stealthy detection logic. Evaluated on a public dataset, our method achieves an 87.80% localization accuracy and successfully identifies real-world TADA snippets in four prevalent malware families (e.g., Emotet and QakBot), with an average localization error of fewer than three instructions. This significantly enhances dynamic analysis robustness and accelerates reverse-engineering workflows.


📝 Abstract
Sandboxes and other dynamic analysis processes are now prevalent in malware detection systems, where they enhance the capability to detect 0-day malware. Consequently, techniques of anti-dynamic analysis (TADA) are prevalent in modern malware samples, and sandboxes can suffer from false negatives and analysis failures when analyzing samples that use TADAs. In such cases, human reverse engineers get involved to conduct dynamic analysis manually (e.g., debugging, patching), which in turn is also obstructed by TADAs. In this work, we propose a Large Language Model (LLM) based workflow that pinpoints the location of TADA implementations in the code, helping reverse engineers place breakpoints for debugging. Our evaluation shows that we successfully identified the locations of 87.80% of known TADA implementations adopted from public repositories. In addition, we successfully pinpointed the locations of TADAs in 4 well-known malware samples that are documented in online malware analysis blogs.
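To make "pinpointing a TADA location for a breakpoint" concrete, here is a minimal Python sketch. It is not the paper's LLM workflow. It swaps in a simple keyword pre-filter that flags `call` instructions naming a few Windows APIs commonly used in anti-debugging checks, then packages the surrounding instructions into a prompt that could be sent to an LLM. The API list, the `0xADDR: mnemonic operands` listing format, the prompt wording, and the names `parse_listing`, `breakpoint_candidates`, and `context_prompt` are all hypothetical assumptions for illustration; the actual LLM call is omitted.

```python
import re
from dataclasses import dataclass

# Hypothetical, non-exhaustive list of Windows APIs often seen in
# anti-debugging / anti-sandbox checks (an assumption, not the paper's catalogue).
SUSPICIOUS_APIS = (
    "IsDebuggerPresent",
    "CheckRemoteDebuggerPresent",
    "NtQueryInformationProcess",
    "GetTickCount",
    "QueryPerformanceCounter",
    "OutputDebugStringA",
)

# Hypothetical plain-text listing format: "0x401003: call ds:IsDebuggerPresent"
LINE_RE = re.compile(r"^\s*(0x[0-9a-fA-F]+):\s*(\w+)\s*(.*)$")


@dataclass
class Instruction:
    address: int
    mnemonic: str
    operands: str


def parse_listing(text: str) -> list[Instruction]:
    """Parse a disassembly listing, silently skipping lines that do not match."""
    insns = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            insns.append(Instruction(int(m.group(1), 16), m.group(2).lower(), m.group(3).strip()))
    return insns


def breakpoint_candidates(insns: list[Instruction]) -> list[int]:
    """Return addresses of calls whose operand mentions a suspicious API name."""
    return [
        ins.address
        for ins in insns
        if ins.mnemonic == "call" and any(api in ins.operands for api in SUSPICIOUS_APIS)
    ]


def context_prompt(insns: list[Instruction], address: int, radius: int = 3) -> str:
    """Build a prompt containing `radius` instructions on each side of `address`."""
    idx = next(i for i, ins in enumerate(insns) if ins.address == address)
    window = insns[max(0, idx - radius): idx + radius + 1]
    body = "\n".join(f"{ins.address:#x}: {ins.mnemonic} {ins.operands}".rstrip() for ins in window)
    return (
        "The following x86 disassembly may contain an anti-dynamic-analysis check. "
        "Identify the address of the instruction that implements the check.\n" + body
    )


if __name__ == "__main__":
    demo = """
0x401000: push ebp
0x401001: mov ebp, esp
0x401003: call ds:IsDebuggerPresent
0x401009: test eax, eax
0x40100b: jnz 0x401020
"""
    insns = parse_listing(demo)
    for addr in breakpoint_candidates(insns):
        print(context_prompt(insns, addr))
```

A keyword filter like this misses checks that avoid imported APIs (for example, reading the PEB directly or timing with `rdtsc`), which is the kind of gap a semantic, LLM-based approach is meant to address.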
Problem

Research questions and friction points this paper is trying to address.

Identifying anti-dynamic analysis techniques in malware code
Reducing false negatives in malware sandbox detection
Assisting reverse engineers in debugging malware efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based workflow for TADA location pinpointing
Helps reverse engineers place debugging breakpoints
Identifies 87.80% known TADA implementations