🤖 AI Summary
This work proposes IntelGuard, a framework for detecting malicious packages in open-source ecosystems that addresses two common limitations: brittle rule-based systems and data-driven approaches that fail to capture evolving attack semantics. IntelGuard integrates expert reasoning with large language models (LLMs) through a retrieval-augmented generation (RAG) mechanism, leveraging a structured threat intelligence knowledge base to perform semantic comparison and behavioral analysis of new packages. This enables interpretable, obfuscation-resilient, and semantically aware detection. Evaluated on 4,027 real-world packages, IntelGuard achieves 99% accuracy with a false positive rate of only 0.50%, maintains 96.5% accuracy against obfuscated code, and identifies 54 previously unreported malicious packages on PyPI.
📝 Abstract
Open-source ecosystems such as NPM and PyPI are increasingly targeted by supply chain attacks, yet existing detection methods depend either on fragile handcrafted rules or on data-driven features that fail to capture evolving attack semantics. We present IntelGuard, a retrieval-augmented generation (RAG) based framework that integrates expert analytical reasoning into automated malicious package detection. IntelGuard constructs a structured knowledge base from over 8,000 threat intelligence reports, linking malicious code snippets with behavioral descriptions and expert reasoning. When analyzing new packages, it retrieves semantically similar malicious examples and applies LLM-guided reasoning to assess whether code behaviors align with the package's intended functionality. Experiments on 4,027 real-world packages show that IntelGuard achieves 99% accuracy and a 0.50% false positive rate, while maintaining 96.5% accuracy on obfuscated code. Deployed against PyPI.org, it discovered 54 previously unreported malicious packages, demonstrating interpretable and robust detection guided by expert knowledge.
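The retrieval step described above (matching a new package's code against known-malicious snippets paired with expert descriptions, then handing the matches to an LLM) can be sketched in miniature. Everything below is a hypothetical stand-in, not IntelGuard's actual implementation: the toy bag-of-tokens "embedding", the `knowledge_base` contents, and the `retrieve` helper are illustrative assumptions, whereas the real system presumably uses learned semantic embeddings over its 8,000-report knowledge base.

```python
import math
import re
from collections import Counter

def embed(code: str) -> Counter:
    """Toy bag-of-tokens 'embedding' (stand-in for a real semantic encoder)."""
    return Counter(re.findall(r"\w+", code))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Illustrative knowledge base: malicious snippet -> expert behavioral note.
knowledge_base = {
    "requests.post(url, data=open('/etc/passwd').read())":
        "exfiltrates local credential files over HTTP",
    "os.system(base64.b64decode(payload))":
        "executes a base64-obfuscated shell command",
}

def retrieve(snippet: str, k: int = 1):
    """Return the top-k most similar known-malicious examples with reasoning."""
    q = embed(snippet)
    ranked = sorted(knowledge_base.items(),
                    key=lambda kv: cosine(q, embed(kv[0])), reverse=True)
    return ranked[:k]

# A suspicious line found in a new package:
hits = retrieve("os.system(base64.b64decode(blob))")
# The retrieved (snippet, expert description) pairs would then be placed in
# the LLM prompt so it can judge whether the observed behavior matches the
# package's stated functionality.
```

The key design idea this illustrates is that the classifier never sees only raw code: each retrieved neighbor carries an expert's behavioral description, which is what makes the final LLM verdict interpretable.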