AI Summary
Dynamic program analysis in the Node.js ecosystem generates an overwhelming volume of taint-flow vulnerability alerts, imposing substantial manual review overhead.
Method: This paper proposes a machine learning framework for vulnerability prioritization, systematically evaluating for the first time the effectiveness of large language models (LLMs), graph neural networks (GNNs), and their hybrid architecture (GNN-LLM) on this task. The framework jointly models structured features (e.g., control- and data-flow graphs) and semantic features (e.g., code context and API usage patterns) extracted from dynamic analysis outputs.
Contribution/Results: We establish a novel multimodal modeling paradigm tailored for vulnerability triage. Empirical evaluation demonstrates its superiority: the best-performing model achieves an F1-score of 0.915; under a strict <7% false-negative rate constraint, it filters 66.9% of false positives; and average inference latency is ~60 ms per package, achieving both high precision and practical deployability.
Abstract
Program analysis tools often produce large volumes of candidate vulnerability reports that require costly manual review, creating a practical challenge: how can security analysts prioritize the reports most likely to be true vulnerabilities? This paper investigates whether machine learning can be applied to prioritize vulnerabilities reported by program analysis tools. We focus on Node.js packages and collect a benchmark of 1,883 Node.js packages, each containing one reported ACE or ACI vulnerability. We evaluate a variety of machine learning approaches, including classical models, graph neural networks (GNNs), large language models (LLMs), and hybrid models that combine GNNs and LLMs, trained on data derived from a dynamic program analysis tool's output. The top LLM achieves $F_1 = 0.915$, while the best GNN and classical ML models reach $F_1 = 0.904$. At a false-negative rate below 7%, the leading model eliminates 66.9% of benign packages from manual review, taking around 60 ms per package. If the best model is tuned to operate at a precision of 0.8 (i.e., allowing 20% false positives among all warnings), our approach detects 99.2% of exploitable taint flows while missing only 0.8%, demonstrating strong potential for real-world vulnerability triage.
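The triage setting described above amounts to picking a classifier score threshold that dismisses as many benign reports as possible while keeping the false-negative rate under a fixed budget (here, 7%). The following is a minimal sketch of that threshold search; the `pick_threshold` helper and the synthetic beta-distributed scores are illustrative assumptions, not the paper's actual models or data.

```python
import random

def pick_threshold(scores, labels, max_fnr=0.07):
    """Choose the score threshold that filters out the most benign
    reports while keeping the false-negative rate (missed true
    vulnerabilities) at or below max_fnr.  Hypothetical helper for
    illustration, not the paper's tuning procedure."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None  # (threshold, fraction of benign reports filtered)
    for t in sorted(set(scores)):
        # Reports scoring below t are auto-dismissed; the rest go to review.
        missed = sum(1 for s, y in zip(scores, labels) if y and s < t)
        if missed / n_pos <= max_fnr:
            dismissed_benign = sum(
                1 for s, y in zip(scores, labels) if not y and s < t)
            filtered = dismissed_benign / n_neg
            if best is None or filtered > best[1]:
                best = (t, filtered)
    return best

# Synthetic scores for illustration only: true vulnerabilities tend to
# score higher than benign reports under a trained classifier.
random.seed(0)
scores = [random.betavariate(8, 2) for _ in range(200)] + \
         [random.betavariate(2, 8) for _ in range(800)]
labels = [True] * 200 + [False] * 800
t, filtered = pick_threshold(scores, labels)
print(f"threshold={t:.3f}, benign reports filtered={filtered:.1%}")
```

Operating points like "precision 0.8" in the abstract correspond to the same idea with a different constraint: sweep the threshold and keep the one whose precision on held-out data meets the target.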