🤖 AI Summary
In outsourced inference for open-source large language models (LLMs), untrusted compute providers may secretly substitute the requested model with a weaker, lower-cost one, which makes trustworthy verification of the returned outputs a critical challenge.
Method: This paper proposes SVIP, a lightweight, secret-based verifiable inference protocol. The compute provider returns the generated text together with processed hidden representations from the LLM. A proxy task trained on these representations turns them into a unique identifier of the requested model, and an integrated secret mechanism further strengthens security. Unlike cryptographic or game-theoretic approaches, the protocol does not rest on strong assumptions.
Contribution/Results: The approach achieves a false positive rate below 3% and a false negative rate below 5%, and verification takes under 10 ms per prompt query. It generalizes across diverse mainstream LLMs (e.g., LLaMA, Mistral, Phi) and remains robust in multiple strong, adaptive attack scenarios. By combining accuracy, computational efficiency, and broad applicability without strong trust assumptions, the protocol offers a practical route to verifiable LLM inference.
📝 Abstract
The ever-increasing size of open-source Large Language Models (LLMs) renders local deployment impractical for individual users. Decentralized computing has emerged as a cost-effective solution, allowing individuals and small companies to serve LLM inference to users with their surplus computational power. However, a computing provider may stealthily substitute the requested LLM with a smaller, less capable model without the users' consent, thereby benefiting from cost savings. We introduce SVIP, a secret-based verifiable LLM inference protocol. Unlike existing solutions based on cryptographic or game-theoretic techniques, our method is computationally efficient and does not rest on strong assumptions. Our protocol requires the computing provider to return both the generated text and processed hidden representations from LLMs. We then train a proxy task on these representations, effectively transforming them into a unique model identifier. With our protocol, users can reliably verify whether the computing provider is acting honestly. A carefully integrated secret mechanism further strengthens its security. We thoroughly analyze our protocol under multiple strong and adaptive adversarial scenarios. Our extensive experiments demonstrate that SVIP is accurate, generalizable, computationally efficient, and resistant to various attacks. Notably, SVIP achieves false negative rates below 5% and false positive rates below 3%, while requiring less than 0.01 seconds per prompt query for verification.
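The abstract does not specify the proxy task, how the hidden representations are processed, or how the secret mechanism works. The following is therefore only a toy sketch under our own assumptions, not SVIP's actual method. It illustrates why a proxy head fitted to one model's hidden states can act as an identifier for that model. The "models" are hypothetical random linear maps (`W_requested`, `W_substitute`) with small output noise. The proxy task is ridge regression onto a label derived from the prompt and a user-held `secret`. The acceptance threshold is calibrated so that roughly 95% of honest queries pass. All names and parameters here are illustrative. The sketch reports an honest-rejection rate and a substitute-acceptance rate, and it deliberately does not claim which of these the paper calls its false negative versus false positive rate.

```python
import numpy as np

# Hypothetical toy setup: each "model" maps a prompt embedding x to a hidden
# state h = x @ W plus small noise (standing in for numerical nondeterminism).
D_IN, D_HID, D_OUT = 16, 64, 8
rng = np.random.default_rng(0)

W_requested = rng.normal(size=(D_IN, D_HID))   # model the user asked for
W_substitute = rng.normal(size=(D_IN, D_HID))  # cheaper model, same output shape
secret = rng.normal(size=(D_IN, D_OUT))        # assumed user-side secret, never sent out


def hidden_states(x, W, noise=0.01):
    """Hidden representations a provider would return alongside the text."""
    return x @ W + noise * rng.normal(size=(x.shape[0], W.shape[1]))


def proxy_target(x):
    """Assumed secret-dependent proxy label, computed by the user from the prompt."""
    return x @ secret


def train_proxy_head(h, y, lam=1e-3):
    """Closed-form ridge regression from hidden states to proxy labels."""
    return np.linalg.solve(h.T @ h + lam * np.eye(h.shape[1]), h.T @ y)


def proxy_error(head, h, y):
    """Per-prompt mean squared error of the proxy task."""
    return np.mean((h @ head - y) ** 2, axis=1)


# 1. Offline: fit the proxy head on hidden states of the requested model.
x_train = rng.normal(size=(2000, D_IN))
head = train_proxy_head(hidden_states(x_train, W_requested), proxy_target(x_train))

# 2. Calibrate the acceptance threshold so about 95% of honest queries pass.
x_cal = rng.normal(size=(500, D_IN))
cal_err = proxy_error(head, hidden_states(x_cal, W_requested), proxy_target(x_cal))
threshold = np.quantile(cal_err, 0.95)

# 3. Verify fresh queries answered honestly vs. by the substitute model.
x_test = rng.normal(size=(500, D_IN))
honest_err = proxy_error(head, hidden_states(x_test, W_requested), proxy_target(x_test))
subst_err = proxy_error(head, hidden_states(x_test, W_substitute), proxy_target(x_test))

honest_reject_rate = float(np.mean(honest_err > threshold))
substitute_accept_rate = float(np.mean(subst_err <= threshold))
print(f"honest queries rejected:      {honest_reject_rate:.3f}")
print(f"substituted queries accepted: {substitute_accept_rate:.3f}")
```

Because the head is fitted to the requested model's representation basis, same-shaped hidden states from a different model produce proxy errors orders of magnitude larger, so a single threshold separates the two cases in this toy. The toy does not model the secret mechanism or the adaptive attacks that the abstract says the paper analyzes.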