CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Commercial closed-source LLM APIs bill for hidden reasoning tokens that are unobservable during API invocation; neither their counts nor their semantic content can be verified, opening the door to opaque or even malicious billing inflation. To address this, the paper proposes CoIn, a dual-dimensional verifiable auditing framework for LLM services. First, CoIn constructs a Merkle hash tree over token-embedding fingerprints to enable cryptographic verification of token counts. Second, it combines cosine similarity with semantic clustering to detect semantic inconsistencies and verify the authenticity of hidden tokens. Deployed as a trusted third-party auditor, CoIn identifies token-count inflation in real-world API calls with up to 94.7% accuracy, significantly improving billing transparency and user trust in commercial LLM services.

📝 Abstract
As post-training techniques evolve, large language models (LLMs) are increasingly augmented with structured multi-step reasoning abilities, often optimized through reinforcement learning. These reasoning-enhanced models outperform standard LLMs on complex tasks and now underpin many commercial LLM APIs. However, to protect proprietary behavior and reduce verbosity, providers typically conceal the reasoning traces while returning only the final answer. This opacity introduces a critical transparency gap: users are billed for invisible reasoning tokens, which often account for the majority of the cost, yet have no means to verify their authenticity. This opens the door to token count inflation, where providers may overreport token usage or inject synthetic, low-effort tokens to inflate charges. To address this issue, we propose CoIn, a verification framework that audits both the quantity and semantic validity of hidden tokens. CoIn constructs a verifiable hash tree from token embedding fingerprints to check token counts, and uses embedding-based relevance matching to detect fabricated reasoning content. Experiments demonstrate that CoIn, when deployed as a trusted third-party auditor, can effectively detect token count inflation with a success rate reaching up to 94.7%, demonstrating its strong ability to restore billing transparency in opaque LLM services. The dataset and code are available at https://github.com/CASE-Lab-UMD/LLM-Auditing-CoIn.
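The abstract's first mechanism, a verifiable hash tree over token-embedding fingerprints, can be illustrated with a minimal Merkle-tree sketch. This is an assumption-laden toy, not CoIn's actual construction: the hash function (SHA-256), the rounding-based fingerprint of each embedding, and the odd-node duplication rule are all illustrative choices; the paper's real fingerprint scheme and tree parameters are not given on this page.

```python
import hashlib

def fingerprint(embedding):
    # Illustrative fingerprint: round each coordinate and hash the result.
    # CoIn's actual embedding-fingerprint scheme is not specified here.
    data = ",".join(f"{x:.6f}" for x in embedding).encode()
    return hashlib.sha256(data).hexdigest()

def merkle_root(leaves):
    # Fold leaf hashes pairwise until a single root remains.
    if not leaves:
        return hashlib.sha256(b"").hexdigest()
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [hashlib.sha256((a + b).encode()).hexdigest()
                 for a, b in zip(level[::2], level[1::2])]
    return level[0]

# The provider commits to a root over hidden-token fingerprints; the claimed
# token count equals the leaf count and can be spot-checked via Merkle proofs.
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
leaves = [fingerprint(e) for e in embeddings]
root = merkle_root(leaves)
```

The point of the tree is that an auditor can verify that individual tokens belong to the committed count without the provider revealing the full hidden trace.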
Problem

Research questions and friction points this paper is trying to address.

Hidden reasoning tokens in commercial LLM APIs are invisible to users
Billed but unobservable tokens cannot be verified for authenticity
Providers can inflate reported token counts without detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Verifiable hash tree for token count auditing
Embedding-based relevance matching for content validation
Third-party auditing framework for billing transparency
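The second innovation, embedding-based relevance matching, can be sketched as a cosine-similarity screen over hidden-token embeddings. This is a simplified stand-in under stated assumptions: the threshold value, the comparison against the final answer's embedding, and the `relevance_check` helper are all hypothetical; the paper's actual matching and clustering procedure is not detailed on this page.

```python
import math

def cosine(u, v):
    # Standard cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def relevance_check(hidden_embs, answer_emb, threshold=0.5):
    # Flag hidden-token embeddings whose similarity to the final answer
    # falls below the (illustrative) threshold; a high fraction of flags
    # would suggest fabricated, low-effort filler tokens.
    flags = [cosine(e, answer_emb) < threshold for e in hidden_embs]
    return sum(flags) / len(flags)
```

For example, `relevance_check([[1, 0], [0, 1]], [1, 0])` returns 0.5: the first embedding aligns with the answer while the orthogonal second one is flagged as suspicious.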