PotentRegion4MalDetect: Advanced Features from Potential Malicious Regions for Malware Detection

📅 2025-07-09
🤖 AI Summary
To address malware that evades detection by injecting malicious code into benign binaries, this paper proposes a region-aware feature extraction method that targets potentially malicious code regions. Malicious strings reported by StringSifter are used to identify the potentially malicious nodes in a partially preprocessed control-flow graph (CFG); advanced features are then extracted from those nodes alongside features from the completely preprocessed CFG. SHAP-based analysis is used to compare these features with features extracted from the entire binary. The reported results exceed 99% accuracy, precision, recall, F1-score, and AUC, with a false positive rate of 0.064%. The method also needs fewer stored feature entries, which lowers memory overhead and computation time. The core contribution is the region-aware feature extraction approach, which is presented as improving both detection robustness, particularly against injection-based evasion, and model interpretability.

📝 Abstract
Malware developers exploit the fact that most detection models extract features from the entire binary rather than from the regions that are potentially malicious. They reverse engineer a benign binary and inject malicious code into it. This obfuscation technique circumvents malware detection models and deceives ML classifiers because benign features outnumber malicious ones. Extracting features from the potential malicious regions instead improves accuracy and reduces false positives. We therefore propose a novel model, PotentRegion4MalDetect, that extracts features from these regions. PotentRegion4MalDetect identifies the potentially malicious nodes in the partially preprocessed Control Flow Graph (CFG) using the malicious strings reported by StringSifter. It then extracts advanced features from the identified regions alongside features from the completely preprocessed CFG. The features from the completely preprocessed CFG mitigate obfuscation techniques that attempt to disguise malicious content, such as suspicious strings. Experiments show that PotentRegion4MalDetect requires fewer entries to store the features for all binaries than a model that focuses on the entire binary, which reduces memory overhead, computation time, and storage requirements. The advanced features yield an 8.13% increase in SHapley Additive exPlanations (SHAP) Absolute Mean and a 1.44% increase in SHAP Beeswarm value compared with features extracted from the entire binary. They also outperform the whole-binary features, achieving more than 99% accuracy, precision, recall, AUC, and F1-score, with a 0.064% FPR.
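The region-selection step described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `CfgNode` structure, the `(string, score)` input format, the 0.5 threshold, and the four toy features are hypothetical and are not taken from the paper or from StringSifter's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class CfgNode:
    """Hypothetical basic block: its instructions, referenced strings, and successors."""
    node_id: int
    instructions: list
    strings: list = field(default_factory=list)
    successors: list = field(default_factory=list)


def flag_suspicious_nodes(nodes, ranked_strings, threshold=0.5):
    """Keep only nodes that reference a string scored at or above the threshold.

    ranked_strings is assumed to be a list of (string, score) pairs produced
    by some string-ranking tool; the format is an assumption of this sketch.
    """
    suspicious = {s for s, score in ranked_strings if score >= threshold}
    return [n for n in nodes if any(s in suspicious for s in n.strings)]


def region_features(flagged):
    """Toy features computed from the flagged nodes only, not the whole graph."""
    return {
        "flagged_nodes": len(flagged),
        "instructions": sum(len(n.instructions) for n in flagged),
        "calls": sum(
            1 for n in flagged for ins in n.instructions if ins.startswith("call")
        ),
        "out_edges": sum(len(n.successors) for n in flagged),
    }


# Illustrative three-node graph; only node 1 references a high-scoring string.
nodes = [
    CfgNode(0, ["push ebp", "mov ebp, esp"], [], [1, 2]),
    CfgNode(1, ["push offset url", "call InternetOpenUrlA"],
            ["http://example.invalid/payload"], [2]),
    CfgNode(2, ["ret"], ["OK"], []),
]
ranked = [("http://example.invalid/payload", 0.93), ("OK", 0.05)]

flagged = flag_suspicious_nodes(nodes, ranked)
features = region_features(flagged)
```

Because features are computed from the flagged nodes alone, the benign bulk of the binary does not contribute to them, which is the intuition behind both the robustness claim and the smaller number of stored feature entries.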
Problem

Research questions and friction points this paper is trying to address.

Detect malware by focusing on potential malicious regions instead of entire binaries
Enhance detection accuracy and reduce false positives with advanced feature extraction
Mitigate obfuscation techniques disguising malicious content in control flow graphs
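
For reference, the evaluation metrics named in the abstract follow from confusion-matrix counts in the standard way. The sketch below uses made-up counts chosen for easy arithmetic; they are not the paper's data, and the function assumes both classes are non-empty so that no denominator is zero.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics, with malware as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # False positive rate: share of benign files wrongly flagged as malware.
        "fpr": fp / (fp + tn),
    }


# Illustrative counts only: 100 malicious and 100 benign samples.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
```

Read this way, an FPR of 0.064% corresponds to roughly 64 benign files flagged per 100,000 benign files scanned.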
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts features from potential malicious regions
Uses partially preprocessed Control Flow Graph
Leverages StringSifter for malicious strings
Authors
Rama Krishna Koppanati
Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, 247667, Uttarakhand, India
Monika Santra
Researcher
Software Security, Binary Analysis, Machine Learning, Network Protocol, Trusted Systems
Sateesh Kumar Peddoju
IIT Roorkee
Cloud Computing, Internet of Things, Edge Computing, Cyber Security