AI Summary
When deployed behind weakly access-controlled APIs, LLMs face a covert yet severe model-cloning threat via logit leakage. This paper proposes a two-stage black-box model-inversion method that operates under low query budgets: first, high-precision recovery of the output projection matrix from a minimal number of top-k logits using singular value decomposition (SVD); second, geometry-aware knowledge distillation to construct a structurally consistent, lightweight student model. To our knowledge, this is the first logit-driven approach to enable efficient and stealthy model cloning. With only tens of thousands of queries, a 6-layer student model replicates 97.6% of the teacher's hidden-state geometry while incurring only a 7.31% perplexity increase (NLL = 7.58). A 4-layer variant achieves a 17.1% inference speedup and an 18.1% parameter reduction, completing training in under 24 GPU-hours while inherently evading rate limiting.
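The core intuition behind stage one can be sketched with a toy experiment. Every logit vector an API returns is a linear image of a d-dimensional hidden state (logits = W·h with W the V×d output projection), so a stack of observed logit vectors has numerical rank at most d, and its top right-singular vectors span the column space of W. The NumPy sketch below illustrates this under simplifying assumptions not taken from the paper: full logit vectors rather than top-k, synthetic Gaussian data, and toy dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, V, N = 64, 1000, 200  # hidden dim, vocab size, query count (toy scale, illustrative)

W = rng.standard_normal((V, d))  # secret output projection (unknown to the attacker)
H = rng.standard_normal((N, d))  # hidden states induced by the attacker's prompts
L = H @ W.T                      # observed logit vectors, one row per query

# SVD of the stacked logits: the numerical rank exposes the hidden dimension d,
# and the leading right-singular vectors span the column space of W.
U, S, Vt = np.linalg.svd(L, full_matrices=False)
est_rank = int((S > S[0] * 1e-8).sum())  # recovers d

# V×d orthonormal basis; equals W up to an invertible d×d transform.
W_hat = Vt[:est_rank].T

# Sanity check: projecting W onto the recovered subspace loses (almost) nothing.
resid = W - W_hat @ (W_hat.T @ W)
```

This recovers the projection only up to a linear change of basis, which is why the paper's second stage (distillation) is still needed to obtain a usable student model.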
Abstract
Large Language Models (LLMs) are increasingly deployed in mission-critical systems, facilitating tasks such as satellite operations, command-and-control, military decision support, and cyber defense. Many of these systems are accessed through application programming interfaces (APIs). When such APIs lack robust access controls, they can expose full or top-k logits, creating a significant and often overlooked attack surface. Prior art has mainly focused on reconstructing the output projection layer or distilling surface-level behaviors; regenerating a black-box model under tight query constraints remains underexplored. We address that gap by introducing a constrained replication pipeline that transforms partial logit leakage into a functional, deployable clone of the victim model. Our two-stage approach (i) reconstructs the output projection matrix via singular value decomposition (SVD) over top-k logits collected from fewer than 10k black-box queries, then (ii) distills the remaining architecture into compact student models of varying transformer depth, trained on an open-source dataset. A 6-layer student recreates 97.6% of the 6-layer teacher model's hidden-state geometry, with only a 7.31% perplexity increase and a Negative Log-Likelihood (NLL) of 7.58. A 4-layer variant achieves 17.1% faster inference and an 18.1% parameter reduction with comparable performance. The entire attack completes in under 24 graphics processing unit (GPU) hours and avoids triggering API rate-limit defenses. These results demonstrate how quickly a cost-limited adversary can clone an LLM, underscoring the urgent need for hardened inference APIs and secure on-premise defense deployments.
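Stage (ii) pairs standard soft-label distillation with a term that rewards matching the teacher's hidden-state geometry. The sketch below is an illustrative loss in that spirit, not the paper's exact objective: the temperature `T`, weight `alpha`, and the use of cosine similarity as the geometry term are all assumptions.

```python
import numpy as np

def softmax(x, T=1.0):
    """Numerically stable tempered softmax along the last axis."""
    z = x / T - np.max(x / T, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, student_h, teacher_h,
                 T=2.0, alpha=0.5):
    """Illustrative geometry-aware distillation loss (hypothetical form).

    Combines tempered KL on the output distributions with a cosine-based
    penalty that pulls student hidden states toward the teacher's.
    """
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    # Soft-target KL divergence, scaled by T^2 as in classic distillation.
    kl = np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)) * T * T
    # Geometry term: 1 - mean cosine similarity between hidden states.
    cos = np.sum(student_h * teacher_h, axis=-1) / (
        np.linalg.norm(student_h, axis=-1) * np.linalg.norm(teacher_h, axis=-1) + 1e-12)
    geo = 1.0 - cos.mean()
    return alpha * kl + (1.0 - alpha) * geo
```

As a sanity check, a student that exactly reproduces the teacher's logits and hidden states drives this loss to (numerically) zero, while mismatched outputs yield a strictly positive value.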