🤖 AI Summary
This work addresses the challenge of detecting changes in large language model (LLM) APIs under strict black-box settings, where existing methods are either prohibitively expensive or rely on internal model information. The authors propose Black-Box Boundary Input Tracking (B3IT), a novel approach that constructs “boundary inputs”—inputs for which the output distribution exhibits multiple high-probability initial tokens—enabling efficient change detection using only observed output tokens. Leveraging theoretical insights from Jacobian and Fisher information analyses in the low-temperature limit, B3IT achieves performance comparable to state-of-the-art gray-box methods, despite operating in a purely black-box regime. Empirical results demonstrate that boundary inputs are readily obtainable on non-reasoning API endpoints, and B3IT matches the detection accuracy of the best gray-box approaches while reducing detection costs by a factor of 30.
📝 Abstract
Remote change detection in LLMs is a difficult problem. Existing methods are either too expensive for deployment at scale, or require initial white-box access to model weights or grey-box access to log probabilities. We aim to achieve both low cost and strict black-box operation, observing only output tokens. Our approach hinges on specific inputs we call Border Inputs, for which there exists more than one output top token. From a statistical perspective, optimal change detection depends on the model's Jacobian and the Fisher information of the output distribution. Analyzing these quantities in low-temperature regimes shows that border inputs enable powerful change detection tests. Building on this insight, we propose the Black-Box Border Input Tracking (B3IT) scheme. Extensive in-vivo and in-vitro experiments show that border inputs are easily found for non-reasoning tested endpoints, and achieve performance on par with the best available grey-box approaches. B3IT reduces costs by $30\times$ compared to existing methods, while operating in a strict black-box setting.