🤖 AI Summary
Fine-grained identification of adversarial techniques in cybersecurity threat intelligence texts suffers from data scarcity and a trade-off between generalizability and domain-specific accuracy. Method: This paper proposes a lightweight, zero-shot adaptable retrieval-augmented generation (RAG) framework that integrates instruction-tuned LLMs, RAG, and lightweight supervised learning on technique-text pairs. It introduces a novel zero-shot large language model (LLM)-based re-ranking mechanism to enhance domain relevance in retrieval, and it fine-tunes only the generation module, eliminating the need for custom retrieval model training. Results: The framework achieves state-of-the-art performance across multiple security benchmarks without requiring large-scale labeled data or task-specific optimization. It significantly mitigates hallucination, improves technique identification accuracy, and drastically reduces domain adaptation cost.
📝 Abstract
Accurately identifying adversarial techniques in security texts is critical for effective cyber defense. However, existing methods face a fundamental trade-off: they either rely on generic models with limited domain precision or require resource-intensive pipelines that depend on large labeled datasets and task-specific optimizations (such as custom hard-negative mining and denoising), resources rarely available in specialized domains. We propose TechniqueRAG, a domain-specific retrieval-augmented generation (RAG) framework that bridges this gap by integrating off-the-shelf retrievers, instruction-tuned LLMs, and minimal text-technique pairs. Our approach addresses data scarcity by fine-tuning only the generation component on limited in-domain examples, circumventing the need for resource-intensive retrieval training. While conventional RAG mitigates hallucination by coupling retrieval and generation, its reliance on generic retrievers often introduces noisy candidates, limiting domain-specific precision. To address this, we enhance retrieval quality and domain specificity through zero-shot LLM re-ranking, which explicitly aligns retrieved candidates with adversarial techniques. Experiments on multiple security benchmarks demonstrate that TechniqueRAG achieves state-of-the-art performance without extensive task-specific optimizations or labeled data, and comprehensive analysis provides further insight into the framework's behavior.
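The retrieve-then-rerank step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `llm_relevance_score` is a hypothetical stand-in for a zero-shot LLM relevance judgment (approximated here by token overlap so the sketch runs offline), and the candidate strings are illustrative MITRE ATT&CK technique descriptions, not the paper's retrieval corpus.

```python
# Sketch of retrieve-then-rerank in a generic RAG pipeline.
# `llm_relevance_score` is a hypothetical placeholder: a real system would
# prompt a zero-shot, instruction-tuned LLM to judge how well each retrieved
# technique description matches the input threat-intelligence text.

def llm_relevance_score(query: str, candidate: str) -> float:
    """Stand-in for a zero-shot LLM relevance judgment.

    Approximated by token overlap so the example runs offline.
    """
    q = set(query.lower().split())
    c = set(candidate.lower().split())
    return len(q & c) / max(len(q), 1)

def rerank(query: str, candidates: list[str], top_k: int = 3) -> list[str]:
    """Re-order noisy retriever output by judged relevance, keep top_k."""
    scored = sorted(
        candidates,
        key=lambda cand: llm_relevance_score(query, cand),
        reverse=True,
    )
    return scored[:top_k]

# Example: noisy retriever output for one threat-intelligence sentence.
query = "attacker used scheduled tasks for persistence on the host"
candidates = [
    "T1053 Scheduled Task: adversaries abuse scheduled tasks for persistence",
    "T1566 Phishing: adversaries send phishing messages to gain access",
    "T1027 Obfuscated Files: adversaries obfuscate payloads to evade detection",
]
print(rerank(query, candidates, top_k=1))
```

The re-ranked top candidates would then be passed, along with the input text, to the fine-tuned generation module; only that module is trained, which is what keeps domain adaptation cheap.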