Fingerprinting Inference Systems of Large Language Models

📅 2026-05-28
🤖 AI Summary
This work shows that small numerical discrepancies introduced by components of a large language model (LLM) inference system, such as the inference engine, attention backend, and hardware platform, act as distinctive fingerprints that reveal the system configuration to anyone who can query the model, which poses a security risk. The paper presents a fingerprinting method based on prompt-response behavior and shows empirically that these components can be identified reliably, even under non-zero temperature sampling. Because preventing fingerprinting would require eliminating numerical differences between hardware and software stacks, the authors argue that it is fundamentally hard; they propose partial mitigations and discuss their impact.
📝 Abstract
The behavior of LLMs does not depend solely on the model itself. Components of the inference system, such as the inference engine, attention backend, and hardware platform, subtly influence how inputs are processed. These components differ in their implementations and thereby induce small numerical deviations across systems when running the same model. While prior work has established the theoretical existence of such deviations, their security implications have remained unexplored. In this paper, we show that these deviations are characteristic of specific components and propagate to observable textual outputs, exposing the inference system to any party that can query the model. Building on this observation, we introduce a fingerprinting method that analyzes the prompt-response behavior of LLMs to identify components of the inference system. Our empirical evaluation demonstrates that the inference engine, attention backend, and underlying hardware platform can be identified reliably, even when the LLM is operated at non-zero temperature. We show that preventing fingerprinting is fundamentally hard, as it would require eliminating numerical differences between hardware and software stacks. We therefore propose partial mitigations and discuss their impact.
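As a rough illustration of how implementation differences can surface in text, the toy sketch below sums the same three terms in two different orders, standing in for two hypothetical backends that accumulate a logit differently. Floating-point addition is not associative, so the two orders yield slightly different values, and a greedy choice between two near-tied tokens flips. This is not the paper's method; the function names, the terms, and the competing logit of 0.6 are all assumptions chosen for the example.

```python
# Toy illustration only: two hypothetical "backends" that differ solely in
# the order in which they accumulate the same floating-point terms.

def sum_left_to_right(terms):
    """Accumulate terms in their given order."""
    total = 0.0
    for t in terms:
        total += t
    return total


def sum_right_to_left(terms):
    """Accumulate the same terms in reverse order."""
    total = 0.0
    for t in reversed(terms):
        total += t
    return total


def greedy_token(accumulate):
    """Pick between two near-tied tokens using the given accumulation order.

    The logit for "yes" is a sum whose rounding depends on the order of
    addition; the logit for "no" is a fixed constant (an assumption made
    for this example).
    """
    logit_yes = accumulate([0.1, 0.2, 0.3])
    logit_no = 0.6
    return "yes" if logit_yes > logit_no else "no"


if __name__ == "__main__":
    print(sum_left_to_right([0.1, 0.2, 0.3]))   # 0.6000000000000001
    print(sum_right_to_left([0.1, 0.2, 0.3]))   # 0.6
    print(greedy_token(sum_left_to_right))      # yes
    print(greedy_token(sum_right_to_left))      # no
```

In this toy setting, a single prompt whose top two tokens are nearly tied is enough to tell the two accumulation orders apart from the output text alone, which is the kind of observable difference the abstract describes.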
Problem

Research questions and friction points this paper is trying to address.

fingerprinting
large language models
inference systems
numerical deviations
security implications
Innovation

Methods, ideas, or system contributions that make the work stand out.

fingerprinting
inference system
large language models
numerical deviations
hardware identification