Identified-Set Geometry of Distributional Model Extraction under Top-$K$ Censored API Access

📅 2026-05-11

🤖 AI Summary
Large language model APIs often return only the top-K logits, masking the rest of the vocabulary and preventing full reconstruction of the output distribution. This work studies the theoretical limits of per-token distribution recovery under this constraint. By characterizing the identified set of teacher distributions compatible with the observed logits, it derives an exact expression for the set's total-variation diameter and computable lower and upper bounds for recovery under KL divergence. The analysis shows that capability extraction and distributional fidelity are decoupled. In experiments with a Qwen3 mathematical-reasoning teacher, top-K distillation recovers only 12% of the teacher's private capability, full-logit distillation recovers 56%, and generation-based extraction recovers 96%. Top-K masking therefore impedes distribution reconstruction but does not prevent effective capability transfer.
๐Ÿ“ Abstract
Modern LLM APIs often reveal only top-$K$ logit scores and censor the remaining vocabulary. We study the per-position distribution-recovery limits of this access model. For a censoring threshold $\tau$, the compatible teacher distributions form an identified set whose total-variation diameter is exactly $U_K=\frac{(V-K)e^{\tau}}{Z_A+(V-K)e^{\tau}}$, where $V$ is the vocabulary size and $Z_A$ is the observed partition function. For KL recovery, we give a computable binary-endpoint lower bound and an asymptotically matching small-ambiguity upper bound, with an extension to reference-aware attackers. Experiments on a Qwen3 math-reasoning teacher reveal a layered extraction hierarchy: on-task top-$K$ distillation recovers 12% of private capability, full-logit distillation recovers 56% despite 99% KL closure, and generation-based extraction recovers 96%. Top-$K$ censoring therefore limits per-position distribution recovery but does not by itself prevent capability extraction, separating fidelity from transfer in prompt-only logit distillation.
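The TV-diameter formula can be illustrated with a small numerical sketch for a single position. It rests on assumptions that the abstract does not state: the API returns raw logits, $Z_A$ is the sum of their exponentials, and every censored logit lies in $(-\infty, \tau]$. Under those assumptions, two extreme completions are exactly $U_K$ apart. The first is the limiting case that puts no mass on censored tokens, and the second places every censored logit at $\tau$. No two compatible completions can be farther apart, because any pair overlaps by at least $Z_A/(Z_A+(V-K)e^{\tau})$ on the observed tokens. The helper names (`tv_diameter`, `completion`, `tv_distance`) and the toy logits are hypothetical, and this is not the paper's code.

```python
import math
import random


def tv_distance(p, q):
    """Total-variation distance 0.5 * sum |p_i - q_i| between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def tv_diameter(top_logits, tau, vocab_size):
    """Closed form U_K = (V-K) e^tau / (Z_A + (V-K) e^tau) (assumes raw logits)."""
    k = len(top_logits)
    z_a = sum(math.exp(z) for z in top_logits)  # observed partition function Z_A
    tail_max = (vocab_size - k) * math.exp(tau)  # largest unnormalized censored mass
    return tail_max / (z_a + tail_max)


def completion(top_logits, censored_logits):
    """Full softmax for one hypothetical filling of the censored logits."""
    logits = list(top_logits) + list(censored_logits)
    m = max(logits)  # shift for numerical stability; -inf entries get probability 0
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Toy values (hypothetical); tau is at most the smallest observed logit.
    top, tau, vocab = [2.0, 1.5, 0.3], 0.0, 10
    n_tail = vocab - len(top)
    empty = completion(top, [float("-inf")] * n_tail)  # censored tokens carry no mass
    full = completion(top, [tau] * n_tail)  # every censored logit sits at tau
    print(tv_diameter(top, tau, vocab), tv_distance(empty, full))

    # Spot-check: random compatible completions never exceed the diameter.
    rng = random.Random(0)
    worst = 0.0
    for _ in range(1000):
        p = completion(top, [tau - rng.expovariate(1.0) for _ in range(n_tail)])
        q = completion(top, [tau - rng.expovariate(1.0) for _ in range(n_tail)])
        worst = max(worst, tv_distance(p, q))
    print(worst <= tv_diameter(top, tau, vocab) + 1e-12)
```

Running the script prints the closed form next to the distance between the two extreme completions. The two values agree up to floating-point error, and the random spot-check stays within the bound. The check only samples the identified set, so it is a sanity test rather than a proof.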
Problem

Research questions and friction points this paper is trying to address.

top-K censoring
distribution recovery
identified set
LLM APIs
logit extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

top-K censoring
distributional model extraction
identified set
logit distillation
capability extraction