LLM Embedding-based Attribution (LEA): Quantifying Source Contributions to Generative Model's Response for Vulnerability Analysis

📅 2025-06-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of quantifying knowledge attribution—i.e., distinguishing between retrieved context and pre-trained knowledge as the source of LLM responses in RAG-enhanced security analysis—this paper introduces LEA (LLM Embedding-based Attribution), an attribution method based on hidden-state embedding similarity. LEA computes layer-wise cosine similarities between the LLM's hidden states and representations of both the retrieved context and the question, decomposing the strength of contextual dependency across layers and revealing hierarchical patterns of knowledge integration. It requires no fine-tuning and introduces no additional parameters, enabling fine-grained, auditable attribution of knowledge sources. Evaluated on responses to 100 critical CVEs from the past decade, LEA quantifies the contribution ratio of retrieved context, enhancing the interpretability and audit confidence of RAG-LLM outputs. This work establishes a paradigm of "auditable knowledge provenance" for secure, transparent RAG systems.
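The summary above describes layer-wise cosine similarity between the model's hidden states and representations of the retrieved context versus the question. A minimal sketch of that idea is below; it is not the paper's exact formulation. The function names (`mean_cosine`, `lea_scores`) and the normalization scheme are illustrative assumptions, and the random tensors stand in for real per-layer hidden states, which in practice would come from a transformer run with hidden-state outputs enabled.

```python
import numpy as np

def mean_cosine(states, reference):
    """Mean cosine similarity between each row of `states` and `reference`."""
    ref = reference / np.linalg.norm(reference)
    rows = states / np.linalg.norm(states, axis=1, keepdims=True)
    return float(np.mean(rows @ ref))

def lea_scores(layer_states, context_emb, question_emb):
    """Per-layer 'percentage of influence' split between retrieved context
    and the question, normalized so the two shares sum to 1 at each layer.
    (Toy stand-in for LEA; the paper's exact metric may differ.)"""
    scores = []
    for states in layer_states:
        # Shift cosine from [-1, 1] to [0, 1] so both shares are non-negative.
        c = (mean_cosine(states, context_emb) + 1.0) / 2.0
        q = (mean_cosine(states, question_emb) + 1.0) / 2.0
        scores.append({"context": c / (c + q), "question": q / (c + q)})
    return scores

# Toy demo with random "hidden states" (one matrix per layer).
rng = np.random.default_rng(0)
dim, tokens, layers = 64, 8, 4
layer_states = [rng.normal(size=(tokens, dim)) for _ in range(layers)]
context_emb = rng.normal(size=dim)
question_emb = rng.normal(size=dim)
scores = lea_scores(layer_states, context_emb, question_emb)
```

Plotting the per-layer context share would surface the pattern the abstract reports: high context dependence in early layers, growing independence in later ones.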

📝 Abstract
Security vulnerabilities are rapidly increasing in frequency and complexity, creating a shifting threat landscape that challenges cybersecurity defenses. Large Language Models (LLMs) have been widely adopted for cybersecurity threat analysis. When querying LLMs, dealing with new, unseen vulnerabilities is particularly challenging as it lies outside LLMs' pre-trained distribution. Retrieval-Augmented Generation (RAG) pipelines mitigate the problem by injecting up-to-date authoritative sources into the model context, thus reducing hallucinations and increasing the accuracy of responses. Meanwhile, the deployment of LLMs in security-sensitive environments introduces challenges around trust and safety. This raises a critical open question: how to quantify or attribute the generated response to the retrieved context versus the model's pre-trained knowledge? This work proposes LLM Embedding-based Attribution (LEA) -- a novel, explainable metric to paint a clear picture of the 'percentage of influence' that pre-trained knowledge vs. retrieved content has on each generated response. We apply LEA to assess responses to 100 critical CVEs from the past decade, verifying its effectiveness in quantifying their insightfulness for vulnerability analysis. Our development of LEA reveals a progression of independence in the hidden states of LLMs: heavy reliance on context in early layers, which enables the derivation of LEA, and increased independence in later layers, which sheds light on why scale is essential for LLMs' effectiveness. This work provides security analysts a means to audit LLM-assisted workflows, laying the groundwork for transparent, high-assurance deployments of RAG-enhanced LLMs in cybersecurity operations.
Problem

Research questions and friction points this paper is trying to address.

Quantify source contributions to LLM responses for vulnerability analysis
Measure influence of retrieved context vs pre-trained knowledge
Assess trust and safety in LLM-assisted cybersecurity workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM Embedding-based Attribution for source influence
Quantifies retrieved vs pre-trained knowledge impact
Analyzes hidden states for transparency in responses