Benchmarking Large Language Models for IoC Recovery under Adversarial Code Obfuscation and Encryption

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines how adversarial code obfuscation and encryption hinder the automated extraction of Indicators of Compromise (IoCs), such as IP addresses, from source code. It presents a systematic evaluation of how well large language models (LLMs) recover IoCs from a dataset of 336 JavaScript programs transformed through 12 progressive levels of obfuscation and cryptographic concealment, including XOR and AES-256. To make the assessment reproducible, the authors build an automated framework that standardizes LLM queries and responses. The results show that LLMs succeed often against lightweight transformations such as variable renaming and Base64 encoding, but that detection performance degrades severely under encryption-based concealment. The findings identify encryption as a key open problem for LLM-driven code analysis and point to model resilience against cryptographic transformations as a direction for future work.
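The gap between lightweight encoding and key-based concealment can be illustrated with a minimal sketch. It is written in Python for brevity, although the benchmark programs are JavaScript, and it is not the paper's pipeline: the function names, the sample address (taken from a documentation-reserved range), and the key `b"k3y"` are all hypothetical. Base64 can be reversed by anyone who recognizes the encoding, whereas the XOR variant also requires locating and applying the key.

```python
import base64

def conceal_base64(ioc: str) -> str:
    """Lightweight concealment: reversible without any secret."""
    return base64.b64encode(ioc.encode("utf-8")).decode("ascii")

def conceal_xor(ioc: str, key: bytes) -> str:
    """Key-based concealment: XOR each byte with a repeating key, hex-encode."""
    data = ioc.encode("utf-8")
    xored = bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    return xored.hex()

def reveal_xor(hex_blob: str, key: bytes) -> str:
    """Inverse of conceal_xor; only works with the same key."""
    data = bytes.fromhex(hex_blob)
    plain = bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    return plain.decode("utf-8")

# Hypothetical IoC and key, chosen only for illustration.
ioc = "203.0.113.7"
key = b"k3y"

b64_blob = conceal_base64(ioc)
xor_blob = conceal_xor(ioc, key)

print("Base64:", b64_blob)
print("XOR   :", xor_blob)
print("Recovered from Base64:", base64.b64decode(b64_blob).decode("utf-8"))
print("Recovered from XOR   :", reveal_xor(xor_blob, key))
```

In both cases the literal address no longer appears in the concealed form, so an analyst or a model has to reverse the transformation rather than pattern-match the string.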

📝 Abstract

Software obfuscation and encryption present persistent challenges for program comprehension and security analysis, particularly when adversaries conceal Indicators of Compromise (IoCs) such as IP addresses within source code. While Large Language Models (LLMs) have recently demonstrated remarkable progress in code reasoning and transformation, their resilience against adversarial concealment techniques remains largely uncharted. This paper introduces a systematic benchmark for secret detection under adversarial code transformations, designed to evaluate the capacity of LLMs to recover IoCs embedded in obfuscated and encrypted JavaScript programs. We construct a dataset of 336 programs, progressively transformed through 12 levels of obfuscation and cryptographic concealment (including XOR and AES-256), to emulate realistic threat scenarios. An automated evaluation framework standardizes LLM queries and responses, enabling reproducible, large-scale testing across diverse models. Our results reveal a dichotomy: while LLMs exhibit high success against lightweight transformations such as variable renaming and Base64 encoding, encryption-based concealment severely degrades detection performance. These findings establish encryption as a critical frontier for LLM-driven code analysis and highlight both current limitations and avenues for advancing automated threat intelligence.
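The abstract does not spell out how responses are scored, so the following is only a sketch of one plausible approach, assuming the IoC is an IPv4 address and that a response counts as a success when it contains the expected address. The helper names (`extract_ipv4`, `recovered`, `success_rate_by_level`) and the sample responses are hypothetical and written in Python for illustration.

```python
import re
from ipaddress import AddressValueError, IPv4Address

# Candidate dotted quads; validity is checked separately below.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def extract_ipv4(text: str) -> set[str]:
    """Return every syntactically valid IPv4 address mentioned in text."""
    found = set()
    for candidate in IPV4_PATTERN.findall(text):
        try:
            IPv4Address(candidate)  # rejects octets above 255
        except AddressValueError:
            continue
        found.add(candidate)
    return found

def recovered(response: str, expected_ioc: str) -> bool:
    """Assumed success criterion: the expected IoC appears in the response."""
    return expected_ioc in extract_ipv4(response)

def success_rate_by_level(results):
    """Aggregate (level, response, expected_ioc) triples into per-level rates."""
    totals, hits = {}, {}
    for level, response, expected in results:
        totals[level] = totals.get(level, 0) + 1
        hits[level] = hits.get(level, 0) + int(recovered(response, expected))
    return {level: hits[level] / totals[level] for level in totals}

# Hypothetical model responses at two obfuscation levels.
sample = [
    (1, "The script contacts 203.0.113.7 on startup.", "203.0.113.7"),
    (12, "The payload is encrypted; no address could be determined.", "203.0.113.7"),
]
print(success_rate_by_level(sample))
```

Grouping results by transformation level in this way is what would make a contrast between lightweight and encryption-based concealment visible in the aggregate numbers.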

Problem

Research questions and friction points this paper is trying to address.

IoC recovery

code obfuscation

encryption

Large Language Models

adversarial concealment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models

Adversarial Code Obfuscation

IoC Recovery