Token Inflation: How Dishonest Providers Can Overcharge for Large Language Model Usage

📅 2026-05-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study addresses a critical “trust paradox” in commercial large language model (LLM) services that charge by token: providers may overbill by inflating token counts, yet existing auditing mechanisms are ineffective because the model, the tokenizer, and the execution pipeline are all opaque to users and auditors. We systematically show that when audit evidence is supplied solely by the service provider, manipulation cannot be reliably prevented. Through reverse engineering and controlled experiments, we devise token inflation strategies that evade detection by exploiting tokenizer ambiguities and chain-of-thought obfuscation. Simulated attacks on three recent token auditing frameworks show that, in the most permissive setting, hidden reasoning token counts can be inflated by 1,469% on average without detection; even when users observe the full reasoning trace, tokenization ambiguity alone permits 50.85% hidden over-reporting.
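Tokenizer ambiguity works because many different token sequences decode to the same visible string, so an auditor who sees only the text, and cannot run the provider's hidden tokenizer, cannot tell the canonical encoding from a padded one. The minimal Python sketch below illustrates that idea; it is not the paper's attack. The toy vocabulary `VOCAB` and the helpers `segment` and `naive_audit` are hypothetical, and treating the fewest-token split as the "honest" count is an assumption (real BPE encodings are not always minimal).

```python
from __future__ import annotations

# Hypothetical toy vocabulary; real tokenizer vocabularies are vastly larger.
VOCAB = {"un", "der", "st", "and", "ing", "under", "stand", "understand", "understanding"}


def segment(text: str, prefer_fewest: bool) -> list[str]:
    """Split `text` into VOCAB pieces using the fewest (or the most) tokens possible.

    Dynamic programming over suffixes: best[i] holds the chosen segmentation of text[i:].
    """
    n = len(text)
    best: list[list[str] | None] = [None] * (n + 1)
    best[n] = []  # the empty suffix needs zero tokens
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            piece, rest = text[i:j], best[j]
            if piece not in VOCAB or rest is None:
                continue  # piece is not a token, or the remainder cannot be segmented
            candidate = [piece] + rest
            current = best[i]
            if current is None or (
                len(candidate) < len(current) if prefer_fewest else len(candidate) > len(current)
            ):
                best[i] = candidate
    if best[0] is None:
        raise ValueError(f"{text!r} cannot be segmented with VOCAB")
    return best[0]


def naive_audit(visible_text: str, reported_tokens: list[str]) -> bool:
    """Hypothetical consistency-only auditor: accepts any in-vocabulary token list that
    decodes to the visible text. Even granted the vocabulary, it cannot tell which valid
    segmentation the provider's tokenizer actually produced."""
    return all(t in VOCAB for t in reported_tokens) and "".join(reported_tokens) == visible_text


if __name__ == "__main__":
    text = "understanding"
    honest = segment(text, prefer_fewest=True)   # ['understanding']
    padded = segment(text, prefer_fewest=False)  # ['un', 'der', 'st', 'and', 'ing']
    print(len(honest), len(padded), naive_audit(text, padded))
```

In this toy case the padded encoding reports 5 tokens instead of 1, and the consistency check still accepts it. An auditor that could re-tokenize with the canonical tokenizer would catch the mismatch, which is exactly the capability that hiding the tokenizer removes.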
πŸ“ Abstract
Per-token billing is now the standard pricing model for commercial large language models (LLMs), so the honesty of reported token counts directly affects what users pay. We show that this kind of billing is hard to audit by design: providers hide the model, the tokenizer, and the execution to protect their IP, mitigate jailbreaks, and preserve user privacy, which means an auditor can only inspect proofs the provider supplies. The audit therefore reduces to a consistency check on the provider's own reports. We call this a trust paradox: every audit must trust some artifact, but current frameworks trust exactly the ones a provider has the strongest reason to manipulate. We study three recent token auditing frameworks and show that a provider with ordinary commercial capabilities can systematically inflate billed token counts. In the most permissive setting, hidden reasoning usage can be inflated by 1,469% on average without detection. At current frontier reasoning prices, that turns a $100 honest bill into roughly a $1,569 bill on the same query. Even when the user can see the full reasoning string, tokenization ambiguity alone still allows 50.85% over-reporting below the detection threshold. These results suggest the problem is not in any specific auditor but in any audit whose evidence comes from the audited party. Restoring honest billing will require verification that ties reported token counts to evidence the provider does not control, such as trusted execution attestation, cryptographic proofs of inference, or third-party re-execution.
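The dollar figures follow from per-token pricing being linear in the reported count: a 1,469% over-report multiplies the bill by 1 + 14.69 = 15.69, so $100 becomes $1,569, and at the same rate a 50.85% over-report would turn $100 into $150.85. The Python sketch below checks that arithmetic. The $10-per-million-token price, the 10,000,000-token workload, and the helpers `bill` and `inflated_tokens` are hypothetical, and the sketch assumes every billed token is an inflatable reasoning token charged at one flat rate.

```python
from decimal import Decimal

# Hypothetical flat price; because billing is linear, the inflation ratio does not depend on it.
HYPOTHETICAL_USD_PER_MILLION = Decimal("10")


def bill(tokens: int, usd_per_million: Decimal = HYPOTHETICAL_USD_PER_MILLION) -> Decimal:
    """Per-token billing: the charge is linear in the reported token count."""
    return Decimal(tokens) * usd_per_million / Decimal(1_000_000)


def inflated_tokens(honest_tokens: int, inflation_pct: Decimal) -> int:
    """Token count reported after over-reporting by `inflation_pct` percent."""
    return int(Decimal(honest_tokens) * (1 + inflation_pct / 100))


if __name__ == "__main__":
    honest = 10_000_000  # hypothetical usage chosen so the honest bill is $100
    print(bill(honest))
    print(bill(inflated_tokens(honest, Decimal("1469"))))   # hidden-reasoning setting
    print(bill(inflated_tokens(honest, Decimal("50.85"))))  # visible-reasoning setting
```

Decimal arithmetic is used so the currency amounts come out exact rather than as binary floating-point approximations.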
Problem

Research questions and friction points this paper is trying to address.

token inflation
LLM billing
auditability
trust paradox
tokenization ambiguity
Innovation

Methods, ideas, or system contributions that make the work stand out.

token inflation
trust paradox
LLM billing
auditing vulnerability
cryptographic verification