Bridging Expert Reasoning and LLM Detection: A Knowledge-Driven Framework for Malicious Packages

Date: 2026-01-23
Citations: 0
Influential: 0

AI Summary
This work proposes IntelGuard, a framework for detecting malicious packages in open-source ecosystems. It targets two weaknesses of existing detectors: handcrafted rules are brittle, and data-driven features fail to capture how attack semantics evolve. IntelGuard integrates expert reasoning with large language models (LLMs) through retrieval-augmented generation (RAG): it draws on a structured threat intelligence knowledge base to compare new packages semantically against known malicious examples and to analyze their behavior. The result is detection that is interpretable, resilient to obfuscation, and semantically aware. Evaluated on 4,027 real-world packages, IntelGuard achieves 99% accuracy with a false positive rate of 0.50%, maintains 96.5% accuracy on obfuscated code, and identified 54 previously unreported malicious packages on PyPI.

Abstract
Open-source ecosystems such as NPM and PyPI are increasingly targeted by supply chain attacks, yet existing detection methods either depend on fragile handcrafted rules or data-driven features that fail to capture evolving attack semantics. We present IntelGuard, a retrieval-augmented generation (RAG) based framework that integrates expert analytical reasoning into automated malicious package detection. IntelGuard constructs a structured knowledge base from over 8,000 threat intelligence reports, linking malicious code snippets with behavioral descriptions and expert reasoning. When analyzing new packages, it retrieves semantically similar malicious examples and applies LLM-guided reasoning to assess whether code behaviors align with intended functionality. Experiments on 4,027 real-world packages show that IntelGuard achieves 99% accuracy and a 0.50% false positive rate, while maintaining 96.5% accuracy on obfuscated code. Deployed on PyPI.org, it discovered 54 previously unreported malicious packages, demonstrating interpretable and robust detection guided by expert knowledge.
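
The retrieve-then-reason flow described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not IntelGuard's implementation: the three knowledge-base entries are invented, token-count cosine similarity stands in for whatever semantic retrieval the paper uses, and the names `retrieve` and `build_prompt` are hypothetical. The sketch stops at assembling the prompt; no LLM is called.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge-base entries: (malicious snippet, behavior description).
# These are invented for illustration and are not taken from the paper.
KNOWLEDGE_BASE = [
    ("requests.post(url, data=os.environ)",
     "exfiltrates environment variables over HTTP"),
    ("exec(base64.b64decode(payload))",
     "decodes and executes a hidden payload"),
    ("subprocess.Popen(['sh', '-c', cmd])",
     "spawns a shell to run attacker commands"),
]


def tokenize(code):
    """Count lowercase identifier-like tokens in a code string."""
    return Counter(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_code, k=2):
    """Return the k most similar entries as (score, snippet, description)."""
    query = tokenize(query_code)
    scored = [(cosine(query, tokenize(snippet)), snippet, description)
              for snippet, description in KNOWLEDGE_BASE]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(query_code, declared_purpose, k=2):
    """Assemble a prompt asking whether behavior matches the declared purpose."""
    lines = [
        "Declared purpose of the package: " + declared_purpose,
        "Code under analysis:",
        query_code,
        "Similar known-malicious examples:",
    ]
    for score, snippet, description in retrieve(query_code, k):
        lines.append(f"- {snippet} ({description}; similarity {score:.2f})")
    lines.append("Does the code's behavior align with the declared purpose?")
    return "\n".join(lines)


suspect = "import base64\nexec(base64.b64decode(blob))"
prompt = build_prompt(suspect, "a color-formatting helper for terminals")
print(prompt)
```

The retrieved examples give the reasoning step concrete precedents to compare against, which is where the interpretability claimed in the abstract would come from. A lexical similarity like the one above would not survive obfuscation; it is used here only to keep the sketch dependency-free.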
Problem

Research questions and friction points this paper is trying to address.

supply chain attacks
malicious package detection
open-source ecosystems
attack semantics
false positive rate
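
For reference, the two headline metrics in the abstract (accuracy and false positive rate) follow the standard confusion-matrix definitions, sketched below. The counts in the usage lines are made up purely to exercise the functions; the source does not report the confusion matrix behind its 99% and 0.50% figures, and the function names are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all packages classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples")
    return (tp + tn) / total


def false_positive_rate(fp, tn):
    """Fraction of benign packages wrongly flagged as malicious."""
    benign = fp + tn
    if benign == 0:
        raise ValueError("no benign samples")
    return fp / benign


# Invented counts for illustration only (not the paper's data):
# 45 malicious caught, 5 missed, 98 benign passed, 2 benign flagged.
print(accuracy(tp=45, tn=98, fp=2, fn=5))
print(false_positive_rate(fp=2, tn=98))
```

A low false positive rate matters for registry-scale scanning because benign packages vastly outnumber malicious ones, so even a small rate translates into many manual reviews.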
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation (RAG)
Expert Reasoning
Malicious Package Detection
Threat Intelligence Knowledge Base
LLM-Guided Reasoning