🤖 AI Summary
Existing packer identification methods face a dual challenge: signature-based approaches generalize poorly and fail against dynamic evasion techniques, while machine learning-based methods depend on large labeled datasets, which limits their scalability and adaptability. This paper proposes PackHero, a lightweight static identification approach built on call-graph matching. It combines static call-graph extraction, a Graph Matching Network (GMN), and hierarchical clustering to identify packers from only a few samples. PackHero achieves a macro-averaged F1-score of 93.7% with only 10 samples per packer and 98.3% with 100 samples per packer, matching state-of-the-art signature-based tools in accuracy. Notably, it achieves 100% recall on virtualization-based packers such as Themida, a first in the literature, and reduces sample requirements by 5–10× compared to prior approaches.
📝 Abstract
Anti-analysis techniques, particularly packing, hinder malware analysis, which makes packer identification a fundamental step. Existing packer identifiers have significant limitations: signature-based methods lack flexibility and struggle against dynamic evasion, while machine learning approaches require extensive training data, limiting scalability and adaptability. Consequently, accurate and adaptable packer identification remains an open problem. This paper presents PackHero, a scalable and efficient methodology for identifying packers using a novel static approach. PackHero employs a Graph Matching Network and clustering to match and group call graphs extracted from programs packed with known packers. We evaluate our approach on a public dataset of malware and benign samples packed with various packers, demonstrating its effectiveness and scalability across varying sample sizes. PackHero achieves a macro-average F1-score of 93.7% with just 10 samples per packer, improving to 98.3% with 100 samples. Notably, PackHero requires fewer samples to reach stable performance than other machine learning-based tools. Overall, PackHero matches the performance of state-of-the-art signature-based tools and outperforms them on virtualization-based packers such as Themida/Winlicense, where it reaches a recall of 100%.
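For readers unfamiliar with the headline metric, the following is a minimal sketch of how a macro-average F1-score is computed for a multi-class packer identifier: F1 is calculated per packer and then averaged with equal weight, so rare packers count as much as common ones. The function name `macro_f1` and the label names other than Themida are illustrative assumptions; this is not PackHero's evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted packer gains a false positive
            fn[truth] += 1  # true packer gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for illustration only (not results from the paper).
y_true = ["UPX", "UPX", "Themida", "Themida", "MPRESS", "MPRESS"]
y_pred = ["UPX", "UPX", "Themida", "UPX", "MPRESS", "MPRESS"]
score = macro_f1(y_true, y_pred)
```

In this toy example one Themida sample is misclassified as UPX, which lowers the F1 of both classes; the macro average weights the three packers equally regardless of how many samples each has.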