Real Money, Fake Models: Deceptive Model Claims in Shadow APIs

📅 2026-03-02
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This study addresses growing concerns about third-party "shadow" APIs that claim to offer unrestricted access to state-of-the-art large language models, yet whose outputs may diverge significantly from those of official APIs, posing risks to research reproducibility and user rights. To systematically evaluate these discrepancies, this work proposes the first multidimensional auditing framework, assessing shadow APIs across functional utility, safety behaviors, and model identity verification. Empirical results reveal substantial deviations: performance inconsistencies reach up to 47.21%, model fingerprint verification fails in 45.83% of cases, and safety behaviors are highly unpredictable. These findings expose significant deceptive practices within shadow APIs and provide critical empirical evidence to inform efforts toward greater transparency and trustworthy usage in the API ecosystem.

๐Ÿ“ Abstract
Access to frontier large language models (LLMs), such as GPT-5 and Gemini-2.5, is often hindered by high pricing, payment barriers, and regional restrictions. These limitations drive the proliferation of $\textit{shadow APIs}$, third-party services that claim to provide access to official model services without regional limitations via indirect access. Despite their widespread use, it remains unclear whether shadow APIs deliver outputs consistent with those of the official APIs, raising concerns about the reliability of downstream applications and the validity of research findings that depend on them. In this paper, we present the first systematic audit comparing official LLM APIs with their corresponding shadow APIs. We first identify 17 shadow APIs that have been utilized in 187 academic papers, with the most popular one reaching 5,966 citations and 58,639 GitHub stars by December 6, 2025. Through multidimensional auditing of three representative shadow APIs across utility, safety, and model verification, we uncover both indirect and direct evidence of deceptive practices in shadow APIs. Specifically, we reveal performance divergence reaching up to $47.21\%$, significant unpredictability in safety behaviors, and identity verification failures in $45.83\%$ of fingerprint tests. These deceptive practices critically undermine the reproducibility and validity of scientific research, harm the interests of shadow API users, and damage the reputation of official model providers.
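The abstract's fingerprint-test failure rate can be illustrated with a minimal sketch: send the same deterministic probe prompts to an official endpoint and a shadow endpoint, then count how often the paired outputs disagree. The function name and the toy response data below are illustrative assumptions, not the paper's actual test suite or probe set.

```python
# Hypothetical sketch of a fingerprint-style identity check between an
# official API and a shadow API. In practice each list would hold
# greedy-decoded (temperature-0) responses to a fixed set of probe prompts.

def fingerprint_mismatch_rate(official, shadow):
    """Fraction of paired responses whose texts disagree after whitespace
    normalization. A high rate suggests the shadow API is not serving the
    claimed model."""
    if len(official) != len(shadow):
        raise ValueError("response lists must be paired one-to-one")
    mismatches = sum(
        1 for o, s in zip(official, shadow) if o.strip() != s.strip()
    )
    return mismatches / len(official)

# Toy data standing in for responses to four probe prompts.
official_answers = ["Paris", "4", "blue", "1969"]
shadow_answers   = ["Paris", "4", "azure", "1968"]

rate = fingerprint_mismatch_rate(official_answers, shadow_answers)
print(f"mismatch rate: {rate:.2%}")  # 2 of 4 probes diverge -> 50.00%
```

Exact-match comparison is the simplest possible fingerprint; the paper's framework additionally audits utility and safety behavior, where divergence is graded rather than binary.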
Problem

Research questions and friction points this paper is trying to address.

shadow APIs
large language models
model verification
API reliability
deceptive claims
Innovation

Methods, ideas, or system contributions that make the work stand out.

shadow APIs
model verification
LLM auditing
API deception
reproducibility