Real Money, Fake Models: Deceptive Model Claims in Shadow APIs

📅 2026-03-02
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This study addresses growing concerns about third-party "shadow" APIs that claim to offer unrestricted access to state-of-the-art large language models, yet whose outputs may diverge significantly from those of official APIs, posing risks to research reproducibility and user rights. To systematically evaluate these discrepancies, this work proposes the first multidimensional auditing framework, assessing shadow APIs across functional utility, safety behaviors, and model identity verification. Empirical results reveal substantial deviations: performance inconsistencies reach up to 47.21%, model fingerprint verification fails in 45.83% of cases, and safety behaviors are highly unpredictable. These findings expose significant deceptive practices within shadow APIs and provide critical empirical evidence to inform efforts toward greater transparency and trustworthy usage in the API ecosystem.

๐Ÿ“ Abstract
Access to frontier large language models (LLMs), such as GPT-5 and Gemini-2.5, is often hindered by high pricing, payment barriers, and regional restrictions. These limitations drive the proliferation of $\textit{shadow APIs}$, third-party services that claim to provide access to official model services without regional limitations via indirect access. Despite their widespread use, it remains unclear whether shadow APIs deliver outputs consistent with those of the official APIs, raising concerns about the reliability of downstream applications and the validity of research findings that depend on them. In this paper, we present the first systematic audit comparing official LLM APIs with their corresponding shadow APIs. We first identify 17 shadow APIs that have been utilized in 187 academic papers, with the most popular one reaching 5,966 citations and 58,639 GitHub stars by December 6, 2025. Through multidimensional auditing of three representative shadow APIs across utility, safety, and model verification, we uncover both indirect and direct evidence of deceptive practices in shadow APIs. Specifically, we reveal performance divergence reaching up to $47.21\%$, significant unpredictability in safety behaviors, and identity verification failures in $45.83\%$ of fingerprint tests. These deceptive practices critically undermine the reproducibility and validity of scientific research, harm the interests of shadow API users, and damage the reputation of official model providers.
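The abstract's fingerprint-test failure rate can be illustrated with a minimal sketch: send the same deterministic probe prompts to an official endpoint and a shadow endpoint, then count how often the paired outputs disagree. The function name and the toy response data below are illustrative assumptions, not the paper's actual test suite or probe set.

```python
# Hypothetical sketch of a fingerprint-style identity check between an
# official API and a shadow API. In practice each list would hold
# greedy-decoded (temperature-0) responses to a fixed set of probe prompts.

def fingerprint_mismatch_rate(official, shadow):
    """Fraction of paired responses whose texts disagree after whitespace
    normalization. A high rate suggests the shadow API is not serving the
    claimed model."""
    if len(official) != len(shadow):
        raise ValueError("response lists must be paired one-to-one")
    mismatches = sum(
        1 for o, s in zip(official, shadow) if o.strip() != s.strip()
    )
    return mismatches / len(official)

# Toy data standing in for responses to four probe prompts.
official_answers = ["Paris", "4", "blue", "1969"]
shadow_answers   = ["Paris", "4", "azure", "1968"]

rate = fingerprint_mismatch_rate(official_answers, shadow_answers)
print(f"mismatch rate: {rate:.2%}")  # 2 of 4 probes diverge -> 50.00%
```

Exact-match comparison is the simplest possible fingerprint; the paper's framework additionally audits utility and safety behavior, where divergence is graded rather than binary.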
Problem

Research questions and friction points this paper is trying to address.

shadow APIs
large language models
model verification
API reliability
deceptive claims
Innovation

Methods, ideas, or system contributions that make the work stand out.

shadow APIs
model verification
LLM auditing
API deception
reproducibility