TechniqueRAG: Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text

📅 2025-05-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Fine-grained identification of adversarial techniques in cyber threat intelligence texts suffers from data scarcity and a trade-off between generalizability and domain-specific accuracy. Method: This paper proposes a lightweight, zero-shot-adaptable retrieval-augmented generation (RAG) framework that combines off-the-shelf retrievers, instruction-tuned LLMs, and lightweight supervision from technique-text pairs. It introduces a novel zero-shot LLM-based re-ranking mechanism to improve the domain relevance of retrieved candidates, and it fine-tunes only the generation module, eliminating the need for custom retrieval-model training. Results: The framework achieves state-of-the-art performance across multiple security benchmarks without requiring large-scale labeled data or task-specific optimization. It significantly mitigates hallucination, improves technique-identification accuracy, and substantially reduces the cost of domain adaptation.

📝 Abstract
Accurately identifying adversarial techniques in security texts is critical for effective cyber defense. However, existing methods face a fundamental trade-off: they either rely on generic models with limited domain precision or require resource-intensive pipelines that depend on large labeled datasets and task-specific optimizations, such as custom hard-negative mining and denoising, resources rarely available in specialized domains. We propose TechniqueRAG, a domain-specific retrieval-augmented generation (RAG) framework that bridges this gap by integrating off-the-shelf retrievers, instruction-tuned LLMs, and minimal text-technique pairs. Our approach addresses data scarcity by fine-tuning only the generation component on limited in-domain examples, circumventing the need for resource-intensive retrieval training. While conventional RAG mitigates hallucination by coupling retrieval and generation, its reliance on generic retrievers often introduces noisy candidates, limiting domain-specific precision. To address this, we enhance retrieval quality and domain specificity through zero-shot LLM re-ranking, which explicitly aligns retrieved candidates with adversarial techniques. Experiments on multiple security benchmarks demonstrate that TechniqueRAG achieves state-of-the-art performance without extensive task-specific optimizations or labeled data, while comprehensive analysis provides further insights.
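The zero-shot LLM re-ranking described in the abstract could be driven by a prompt along these lines. This is a hypothetical template for illustration only; the paper's actual prompt wording, field names, and scoring scale are not reproduced here.

```python
# Hypothetical zero-shot re-ranking prompt. A real system would send this
# to an instruction-tuned LLM and parse the returned relevance score; the
# template text and 0-10 scale are assumptions, not the paper's prompt.
RERANK_PROMPT = """You are a cyber threat intelligence analyst.
Given a threat report excerpt and a candidate MITRE ATT&CK technique,
rate from 0 to 10 how well the technique matches the excerpt.

Excerpt: {excerpt}
Candidate technique: {technique_id} - {technique_description}

Answer with a single integer."""

def build_rerank_prompt(excerpt: str, tid: str, desc: str) -> str:
    """Fill the template for one (excerpt, candidate) pair."""
    return RERANK_PROMPT.format(excerpt=excerpt, technique_id=tid,
                                technique_description=desc)

prompt = build_rerank_prompt("phishing email with a macro attachment",
                             "T1566", "Phishing")
print(prompt)
```

In this scheme, each retrieved candidate is scored independently against the excerpt and the candidates are then sorted by the LLM's score, which is what lets a generic retriever's noisy output be aligned with adversarial techniques without any retrieval training.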
Problem

Research questions and friction points this paper is trying to address.

Accurately identifying adversarial techniques in security texts
Reducing reliance on resource-intensive pipelines that require large labeled datasets
Improving domain-specific precision in retrieval-augmented generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates off-the-shelf retrievers and instruction-tuned LLMs
Enhances retrieval quality with zero-shot LLM re-ranking
Fine-tunes generation on limited in-domain examples
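The three-stage flow implied by these points (retrieve, zero-shot re-rank, generate) can be sketched as below. Everything here is illustrative: the knowledge base, the overlap-based retriever, and the stubbed re-ranker are stand-ins for the off-the-shelf retriever, LLM judge, and fine-tuned generator the paper actually uses.

```python
# Minimal sketch of a TechniqueRAG-style pipeline. All names, data, and
# scoring functions are hypothetical stand-ins, not the paper's components.
from dataclasses import dataclass

@dataclass
class Technique:
    tid: str          # MITRE ATT&CK-style identifier
    description: str

# Toy knowledge base of technique descriptions.
KB = [
    Technique("T1566", "phishing messages deliver malicious attachments"),
    Technique("T1059", "adversaries abuse command and scripting interpreters"),
    Technique("T1003", "credential dumping from operating system stores"),
]

def retrieve(query: str, k: int = 3) -> list[Technique]:
    """Stand-in for an off-the-shelf retriever: rank by token overlap."""
    q = set(query.lower().split())
    return sorted(KB, key=lambda t: -len(q & set(t.description.split())))[:k]

def llm_rerank(query: str, candidates: list[Technique]) -> list[Technique]:
    """Placeholder for the zero-shot LLM re-ranker. A real system would
    prompt an LLM to judge each candidate; here we reuse token overlap."""
    q = set(query.lower().split())
    return sorted(candidates,
                  key=lambda t: len(q & set(t.description.split())),
                  reverse=True)

def identify_techniques(query: str, top: int = 1) -> list[str]:
    """Retrieve, re-rank, and return the top technique IDs. In the paper a
    fine-tuned generator consumes the candidates; here we just emit IDs."""
    candidates = llm_rerank(query, retrieve(query))
    return [t.tid for t in candidates[:top]]

print(identify_techniques("the phishing email carried malicious attachments"))
# → ['T1566']
```

The key design point the bullets highlight is that only the final generation step is trained: the retriever and re-ranker above would stay frozen, so adapting to a new domain only requires a small set of text-technique pairs for the generator.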