Clone What You Can't Steal: Black-Box LLM Replication via Logit Leakage and Distillation

📅 2025-08-31
🤖 AI Summary
When deployed via weakly access-controlled APIs, LLMs face a covert yet severe model cloning threat via logit leakage. This paper proposes a two-stage black-box model inversion method operating under low query budgets: first, high-precision recovery of the output projection matrix from a minimal number of top-k logits using singular value decomposition (SVD); second, geometrically aware knowledge distillation to construct a structurally consistent lightweight student model. To our knowledge, this is the first logit-driven approach enabling efficient and stealthy model cloning. With only tens of thousands of queries, a 6-layer student model replicates 97.6% of the teacher's hidden-state geometry, incurring merely a 7.31% perplexity increase (NLL = 7.58). A 4-layer variant achieves 17.1% inference speedup and 18.1% parameter reduction, completing training in under 24 GPU-hours while inherently evading rate limiting.
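The SVD-based recovery step rests on a simple rank argument: a transformer's logits are hidden states multiplied by the output projection, so any matrix of collected logit vectors has rank at most the hidden size. A minimal sketch, with toy dimensions and synthetic data standing in for real API responses (none of the sizes or names below come from the paper):

```python
import numpy as np

# Hypothetical sketch: LM logits are H @ W_out.T, so a matrix of logit
# vectors gathered over many queries has rank at most the hidden size d.
# SVD over that matrix exposes the column space of W_out.
rng = np.random.default_rng(0)
d, vocab, n_queries = 64, 1000, 500          # toy sizes, not the paper's
W_out = rng.standard_normal((vocab, d))      # unknown output projection
H = rng.standard_normal((n_queries, d))      # hidden states behind the API
L = H @ W_out.T                              # observed logit matrix

U, S, Vt = np.linalg.svd(L, full_matrices=False)
effective_rank = int((S > 1e-8 * S[0]).sum())
print(effective_rank)                        # -> 64, the hidden size d
# The top d rows of Vt span the same subspace as the columns of W_out
# (up to an unknown invertible d x d transform).
```

In the black-box setting only top-k entries of each logit vector are observed, so the full matrix above would have to be assembled across overlapping queries before factorization; this sketch only illustrates the low-rank structure being exploited.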

๐Ÿ“ Abstract
Large Language Models (LLMs) are increasingly deployed in mission-critical systems, facilitating tasks such as satellite operations, command-and-control, military decision support, and cyber defense. Many of these systems are accessed through application programming interfaces (APIs). When such APIs lack robust access controls, they can expose full or top-k logits, creating a significant and often overlooked attack surface. Prior art has mainly focused on reconstructing the output projection layer or distilling surface-level behaviors. However, regenerating a black-box model under tight query constraints remains underexplored. We address that gap by introducing a constrained replication pipeline that transforms partial logit leakage into a functional, deployable clone of the victim model. Our two-stage approach (i) reconstructs the output projection matrix by applying singular value decomposition (SVD) to top-k logits collected from under 10k black-box queries, then (ii) distills the remaining architecture into compact student models with varying transformer depths, trained on an open-source dataset. A 6-layer student recreates 97.6% of the 6-layer teacher model's hidden-state geometry, with only a 7.31% perplexity increase, and a 7.58 Negative Log-Likelihood (NLL). A 4-layer variant achieves 17.1% faster inference and 18.1% parameter reduction with comparable performance. The entire attack completes in under 24 graphics processing unit (GPU) hours and avoids triggering API rate-limit defenses. These results demonstrate how quickly a cost-limited adversary can clone an LLM, underscoring the urgent need for hardened inference APIs and secure on-premise defense deployments.
Problem

Research questions and friction points this paper is trying to address.

Reconstructing black-box LLMs under tight query constraints
Transforming partial logit leakage into functional model clones
Addressing API vulnerabilities exposing logits in mission-critical systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconstructs projection matrix via SVD
Distills architecture into compact student models
Uses partial logit leakage for model cloning
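The distillation stage listed above typically trains the student to match the teacher's temperature-softened logit distribution. A minimal sketch of such a loss, with illustrative names and temperature (the paper's exact objective, which also includes a geometric hidden-state term, is not reproduced here):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)           # soft targets from the teacher
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

teacher = np.array([[2.0, 0.5, -1.0]])
print(kd_loss(teacher, teacher))             # -> 0.0: identical logits
print(kd_loss(np.zeros((1, 3)), teacher) > 0)  # True: mismatch is penalized
```

Minimizing this loss over queried prompts is what lets a shallow student inherit the teacher's output behavior at a fraction of the parameter count.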
Kanchon Gharami
Department of Electrical Engineering and Computer Science, Embry-Riddle Aeronautical University, FL, USA

Hansaka Aluvihare
Embry-Riddle Aeronautical University
Machine Learning, Deep Learning

Shafika Showkat Moni
Assistant Professor
Security and Privacy of VANET, Internet of Vehicles, Internet of Things, Wireless Networks

Berker Peköz
Department of Electrical Engineering and Computer Science, Embry-Riddle Aeronautical University, FL, USA