Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test

📅 2025-06-08
📈 Citations: 0
Influential Citations: 0
📄 PDF
🤖 AI Summary
This paper addresses the lack of transparency in black-box LLM APIs, specifically users' inability to detect whether an API silently substitutes the advertised model with a quantized, maliciously fine-tuned, or architecturally different variant. We propose a lightweight statistical detection framework grounded in ranking consistency. Relying solely on API-generated text outputs, without access to model weights or logits, the method applies a rank transformation to model output distributions and performs hypothesis testing under a behavioral consistency assumption. Coupled with a robust, evasion-resistant query design, it achieves high statistical power under low query budgets (hundreds of calls). The resulting rank-based uniformity test is query-efficient, resilient to adversarial evasion, and generalizable across scenarios, significantly outperforming existing approaches against diverse threats, including quantization-induced degradation, harmful fine-tuning, jailbreak prompts, and full-model substitution.
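To make the idea concrete, below is a minimal sketch of a rank-based uniformity test. It assumes each API response can be scored by a locally deployed copy of the authentic model (e.g., by its log-likelihood of the response), and that the same score is computed for K locally sampled reference responses per prompt; under the null hypothesis that the API serves the authentic model, the rank of the API score among the reference scores is approximately uniform. The Kolmogorov-Smirnov test and the function names here are illustrative choices, not necessarily the paper's exact construction.

```python
import numpy as np
from scipy import stats

def api_rank(api_score: float, reference_scores: np.ndarray) -> float:
    """Rank of the API response's score among K locally generated
    reference scores, mapped into (0, 1). Under the null hypothesis
    (the API serves the authentic model), this rank is uniform."""
    k = len(reference_scores)
    return (np.sum(reference_scores < api_score) + 1) / (k + 1)

def uniformity_test(api_scores, reference_score_sets, alpha=0.05):
    """Rank-based uniformity test over a batch of audit prompts.

    api_scores[i]           -- score of the API's response to prompt i
    reference_score_sets[i] -- scores of K responses sampled locally
                               from the authentic model on prompt i
    Rejecting uniformity flags a substituted or altered model.
    """
    ranks = np.array([
        api_rank(s, np.asarray(refs))
        for s, refs in zip(api_scores, reference_score_sets)
    ])
    # One-sample Kolmogorov-Smirnov test against Uniform(0, 1).
    result = stats.kstest(ranks, "uniform")
    return {
        "ks_stat": result.statistic,
        "p_value": result.pvalue,
        "reject_authentic": result.pvalue < alpha,
    }
```

With a few hundred prompts (matching the query budgets reported in the summary), this costs one API call per prompt plus local reference sampling; comparing ranks rather than raw scores is what makes the test distribution-free under the null.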

📝 Abstract
As API access becomes a primary interface to large language models (LLMs), users often interact with black-box systems that offer little transparency into the deployed model. To reduce costs or maliciously alter model behaviors, API providers may discreetly serve quantized or fine-tuned variants, which can degrade performance and compromise safety. Detecting such substitutions is difficult, as users lack access to model weights and, in most cases, even output logits. To tackle this problem, we propose a rank-based uniformity test that can verify the behavioral equality of a black-box LLM to a locally deployed authentic model. Our method is accurate, query-efficient, and avoids detectable query patterns, making it robust to adversarial providers that reroute or mix responses upon the detection of testing attempts. We evaluate the approach across diverse threat scenarios, including quantization, harmful fine-tuning, jailbreak prompts, and full model substitution, showing that it consistently achieves superior statistical power over prior methods under constrained query budgets.
Problem

Research questions and friction points this paper is trying to address.

Detect undisclosed model variants in black-box LLM APIs
Verify behavioral equality without access to model weights
Identify performance degradation and safety compromises efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rank-based uniformity test for black-box LLM verification
Query-efficient method that avoids detectable query patterns
Robust against adversarial rerouting and response mixing (see the sketch after this list)
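As a rough illustration of the mixing-resistance claim above, the toy simulation below models a provider that routes only a fraction of audit queries to a substituted model. The Beta(2, 5) rank distribution for substituted responses is an invented placeholder, not a measurement from the paper; the point is only that a 50/50 mixture still deviates detectably from Uniform(0, 1) at a budget of a few hundred queries.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_mixed_provider(n_queries=300, substituted_frac=0.5):
    """Toy model of a provider that answers some audit queries with the
    authentic model (uniform ranks) and the rest with a substitute
    (skewed ranks; Beta(2, 5) is an illustrative placeholder)."""
    n_sub = int(n_queries * substituted_frac)
    ranks = np.concatenate([
        rng.uniform(size=n_queries - n_sub),  # authentic portion
        rng.beta(2, 5, size=n_sub),           # substituted portion
    ])
    return stats.kstest(ranks, "uniform").pvalue

print(simulate_mixed_provider())  # tiny p-value: mixing is still detected
```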
👥 Authors
Xiaoyuan Zhu, University of Southern California
Yaowen Ye, University of California, Berkeley
Tianyi Qiu, Peking University
Hanlin Zhu, Ph.D. student, University of California, Berkeley (Machine Learning, LLM Reasoning)
Sijun Tan, University of California, Berkeley (Machine Learning, Reinforcement Learning, AI Security)
Ajraf Mannan, University of Southern California
Jonathan Michala, University of Southern California
Raluca A. Popa, University of California, Berkeley
W. Neiswanger, University of Southern California