On Selecting Few-Shot Examples for LLM-based Code Vulnerability Detection

📅 2025-10-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the instability of few-shot prompting for code vulnerability detection with large language models (LLMs). We propose a dual-criterion exemplar selection method that jointly models *error consistency* (the tendency of an LLM to commit consistent errors across similar inputs) and *k*-NN similarity in a code semantic embedding space, thereby enhancing in-context learning efficacy. Unlike random sampling, our approach identifies highly informative “error-aware” exemplars through error pattern analysis and retrieves them via semantic code embeddings. Extensive experiments across multiple state-of-the-art LLMs (e.g., CodeLlama, DeepSeek-Coder) and benchmark datasets (Devign, MultiVul) demonstrate that our method significantly improves vulnerability detection accuracy, yielding an average +5.2% F1-score gain. Moreover, the dual-criterion strategy consistently outperforms either criterion used in isolation, validating the effectiveness and generalizability of error-driven exemplar selection for code security tasks.

📝 Abstract
Large language models (LLMs) have demonstrated impressive capabilities for many coding tasks, including summarization, translation, completion, and code generation. However, detecting code vulnerabilities remains a challenging task for LLMs. An effective way to improve LLM performance is in-context learning (ICL): providing few-shot examples similar to the query, along with correct answers, can improve an LLM's ability to generate correct solutions. However, choosing the few-shot examples appropriately is crucial to improving model performance. In this paper, we explore two criteria for choosing few-shot examples for ICL in the code vulnerability detection task. The first criterion considers whether the LLM consistently makes a mistake on a sample, with the intuition that LLM performance on a sample is informative about its usefulness as a few-shot example. The other criterion considers the similarity of candidate examples to the program under query and chooses few-shot examples based on the $k$-nearest neighbors to the given sample. We evaluate these criteria individually as well as in various combinations, using open-source models on multiple datasets.
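The abstract's $k$-nearest-neighbor criterion can be sketched as follows. This is a minimal illustration assuming cosine similarity over precomputed code embeddings; the paper does not specify the embedding model or distance metric here, so both are assumptions.

```python
import numpy as np

def knn_examples(query_emb, pool_embs, k):
    """Return indices of the k pool examples most similar to the query.

    Similarity is cosine similarity over (assumed) code embeddings.
    """
    pool = np.asarray(pool_embs, dtype=float)
    q = np.asarray(query_emb, dtype=float)
    # Cosine similarity between the query and every candidate embedding.
    sims = pool @ q / (np.linalg.norm(pool, axis=1) * np.linalg.norm(q) + 1e-12)
    # Indices of the k highest-similarity examples, most similar first.
    return np.argsort(-sims)[:k].tolist()
```

The selected indices would then be used to pull labeled (code, verdict) pairs into the few-shot prompt ahead of the program under query.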
Problem

Research questions and friction points this paper is trying to address.

Selecting optimal few-shot examples for LLM vulnerability detection
Evaluating mistake-based and similarity-based example selection criteria
Improving code vulnerability detection through strategic in-context learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Select examples where LLM consistently makes mistakes
Choose examples based on k-nearest neighbors similarity
Combine mistake-based and similarity-based selection criteria
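The three ideas above might be combined as in this hypothetical sketch. The `error_consistency` scoring rule (fraction of repeated runs the model gets a sample wrong) and the `alpha` mixing weight are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def error_consistency(predictions, label):
    """Fraction of repeated LLM runs that misclassified this example.

    An example the LLM consistently gets wrong scores near 1.0.
    """
    preds = np.asarray(predictions)
    return float(np.mean(preds != label))

def select_examples(query_emb, pool_embs, pool_scores, k, alpha=0.5):
    """Rank candidates by a weighted blend of cosine similarity to the
    query and a precomputed error-consistency score (alpha is a
    hypothetical mixing weight)."""
    pool = np.asarray(pool_embs, dtype=float)
    q = np.asarray(query_emb, dtype=float)
    sims = pool @ q / (np.linalg.norm(pool, axis=1) * np.linalg.norm(q) + 1e-12)
    combined = alpha * sims + (1 - alpha) * np.asarray(pool_scores, dtype=float)
    return np.argsort(-combined)[:k].tolist()
```

With `alpha=1.0` this reduces to pure similarity-based selection and with `alpha=0.0` to pure mistake-based selection, so the blend subsumes both individual criteria.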
Md Abdul Hannan
Colorado State University
Ronghao Ni
Carnegie Mellon University
Chi Zhang
Carnegie Mellon University
Limin Jia
Carnegie Mellon University
Programming Languages · Formal Methods · Security
Ravi Mangal
Assistant Professor, Colorado State University
Trustworthy AI · Formal Methods · Machine Learning · Safe Autonomy · Program Verification
Corina S. Pasareanu
Carnegie Mellon University