🤖 AI Summary
This work addresses the challenge of identifying underlying large language model (LLM) versions in LLM-integrated applications, where model provenance is often opaque. We propose LLMmap, the first active fingerprinting technique tailored to this setting. Methodologically, it leverages domain-informed query generation, response pattern modeling, and statistical significance testing to capture fine-grained behavioral signatures of LLM outputs. Its design ensures robustness across vendors, generation frameworks (e.g., RAG, chain-of-thought), system prompts, and sampling perturbations. Evaluated on 42 open- and closed-source LLMs, including real-world deployments with unknown prompts and generation frameworks, LLMmap achieves >95% identification accuracy using only eight queries. To our knowledge, this is the first approach enabling high-accuracy, low-overhead, and generalizable LLM version fingerprinting at the application layer.
📝 Abstract
We introduce LLMmap, a first-generation fingerprinting technique targeted at LLM-integrated applications. LLMmap employs an active fingerprinting approach, sending carefully crafted queries to the application and analyzing the responses to identify the specific LLM version in use. Our query selection is informed by domain expertise on how LLMs generate uniquely identifiable responses to thematically varied prompts. With as few as 8 interactions, LLMmap can accurately identify 42 different LLM versions with over 95% accuracy. More importantly, LLMmap is designed to be robust across different application layers, allowing it to identify LLM versions, whether open-source or proprietary, from various vendors, operating under unknown system prompts, stochastic sampling hyperparameters, and even complex generation frameworks such as RAG or Chain-of-Thought. We discuss potential mitigations and demonstrate that, against resourceful adversaries, effective countermeasures may be challenging or even unrealizable.
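The active-fingerprinting loop the abstract describes (send crafted probes, match the responses against known model behavior) can be sketched as follows. This is a minimal illustration only: the probe queries, the signature database, and the Jaccard token-overlap matcher are placeholder assumptions standing in for LLMmap's actual query set and trained inference model.

```python
# Illustrative active-fingerprinting sketch: send fixed probe queries
# to a target application and return the known model version whose
# stored responses best match what we observe. All data below is
# hypothetical, not from the LLMmap paper.

PROBE_QUERIES = [
    "What is your knowledge cutoff date?",
    "Repeat the word 'banana' five times.",
]

# Hypothetical signature database: model version -> canned probe responses.
SIGNATURES = {
    "model-a-v1": ["My knowledge cutoff is April 2023.",
                   "banana banana banana banana banana"],
    "model-b-v2": ["I was trained on data up to 2021.",
                   "Sure! banana banana banana banana banana"],
}

def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two responses (stand-in similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def fingerprint(query_app) -> str:
    """Query the target application with every probe, then return the
    known model version whose signature responses are most similar."""
    observed = [query_app(q) for q in PROBE_QUERIES]
    def score(model: str) -> float:
        return sum(jaccard(o, r)
                   for o, r in zip(observed, SIGNATURES[model]))
    return max(SIGNATURES, key=score)
```

A robust version would aggregate similarity over embeddings rather than token overlap, so that unknown system prompts or sampling noise perturb but do not erase the behavioral signature.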