📝 Abstract
Large Language Models (LLMs) have recently emerged as powerful tools in cybersecurity, offering advanced capabilities in malware detection, malware generation, and real-time monitoring. Numerous studies have explored their application in cybersecurity, demonstrating their effectiveness in identifying novel malware variants, analyzing malicious code structures, and enhancing automated threat analysis. Several transformer-based architectures and LLM-driven models have been proposed to improve malware analysis by leveraging semantic and structural insights to recognize malicious intent more accurately. This study presents a comprehensive review of LLM-based approaches to malware code analysis, summarizing recent advancements, trends, and methodologies. We examine notable scholarly works to map the research landscape, identify key challenges, and highlight emerging innovations in LLM-driven cybersecurity. We also emphasize the role of static analysis in malware detection and introduce the notable datasets and specialized LLM models that support automated malware research. This study serves as a resource for researchers and cybersecurity professionals, offering insights into LLM-powered malware detection and defence strategies while outlining future directions for strengthening cybersecurity resilience.
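To make the static-analysis setting concrete, the sketch below shows the kind of pipeline such approaches build on: disassembled code is tokenized, and token statistics are scored against known-suspicious patterns. This is a deliberately minimal toy, not any model or dataset from the literature surveyed here; the function names and the `SUSPICIOUS_TOKENS` set are illustrative assumptions (real systems learn such patterns with a transformer rather than a hand-written list).

```python
# Toy sketch of a static malware-analysis pipeline: tokenize assembly,
# count tokens, and score against a hand-picked "suspicious" set.
# All names and the SUSPICIOUS_TOKENS list are illustrative assumptions.
from collections import Counter

SUSPICIOUS_TOKENS = {"xor", "int", "jmp", "call"}  # hypothetical indicators

def tokenize_asm(disassembly: str) -> list[str]:
    """Split disassembly into lowercase opcode/operand tokens."""
    tokens = []
    for line in disassembly.splitlines():
        line = line.split(";")[0]  # drop assembler comments
        tokens.extend(line.replace(",", " ").lower().split())
    return tokens

def suspicion_score(disassembly: str) -> float:
    """Fraction of tokens that fall in the (toy) suspicious-token set."""
    counts = Counter(tokenize_asm(disassembly))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    flagged = sum(n for t, n in counts.items() if t in SUSPICIOUS_TOKENS)
    return flagged / total

sample = """
    xor eax, eax      ; zero register
    call decrypt_payload
    jmp eax
"""
print(round(suspicion_score(sample), 2))
```

An LLM-based detector replaces the fixed token set and count-based score with learned embeddings and a classification head, which is what lets it generalize to previously unseen (zero-day) variants that share semantics but not exact byte patterns.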