🤖 AI Summary
Large language models are sensitive to input perturbations, yet conventional metrics such as perplexity often fail to capture local prediction instabilities. To address this limitation, this work proposes the Token Constraint Bound (δ_TCB), a novel metric that quantifies the maximum perturbation a model's internal state can tolerate while preserving its dominant next-token prediction, derived from the geometric structure of the output embedding space. This is the first approach to characterize prediction robustness through such geometric analysis. Experimental results demonstrate that δ_TCB uncovers prediction vulnerabilities in prompt engineering scenarios that traditional evaluation metrics overlook, offering a new perspective for assessing and improving model robustness.
📝 Abstract
Large Language Models (LLMs) exhibit impressive capabilities yet suffer from sensitivity to slight input context variations, hampering reliability. Conventional metrics like accuracy and perplexity fail to assess local prediction robustness, as normalized output probabilities can obscure the underlying resilience of an LLM's internal state to perturbations. We introduce the Token Constraint Bound ($\delta_{\mathrm{TCB}}$), a novel metric that quantifies the maximum internal state perturbation an LLM can withstand before its dominant next-token prediction significantly changes. Intrinsically linked to output embedding space geometry, $\delta_{\mathrm{TCB}}$ provides insights into the stability of the model's internal predictive commitment. Our experiments show $\delta_{\mathrm{TCB}}$ correlates with effective prompt engineering and uncovers critical prediction instabilities missed by perplexity during in-context learning and text generation. $\delta_{\mathrm{TCB}}$ offers a principled, complementary approach to analyze and potentially improve the contextual stability of LLM predictions.
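Since the abstract ties $\delta_{\mathrm{TCB}}$ to the geometry of the output embedding space, one natural reading is a margin computation: the distance from the hidden state to the nearest decision boundary where a competing token overtakes the dominant one. The sketch below implements that interpretation; it is an illustrative assumption, not the paper's exact definition (the function name, the L2 norm, and the toy dimensions are all choices made here for clarity).

```python
import numpy as np

def token_constraint_bound(h, W):
    """Margin-style sketch of a Token Constraint Bound (delta_TCB).

    h : (d,) hidden state fed to the output embedding (unembedding) matrix.
    W : (V, d) output embedding matrix; next-token logits are z = W @ h.

    Returns the smallest L2 perturbation of h that could change the
    argmax next-token prediction, i.e. the distance from h to the
    nearest hyperplane z_i = z_j in hidden-state space.
    """
    z = W @ h
    i = int(np.argmax(z))              # dominant next token
    diffs = W[i] - W                   # rows w_i - w_j, shape (V, d)
    margins = z[i] - z                 # logit gaps z_i - z_j, all >= 0
    norms = np.linalg.norm(diffs, axis=1)
    mask = np.arange(len(z)) != i      # exclude the dominant token itself
    # Distance from h to the boundary (w_i - w_j) . h = 0 for each rival j;
    # the minimum over j is the bound.
    return float(np.min(margins[mask] / norms[mask]))

# Toy usage with random data standing in for a real model's unembedding.
rng = np.random.default_rng(0)
W = rng.normal(size=(100, 16))         # vocabulary of 100 tokens, d = 16
h = rng.normal(size=16)
delta = token_constraint_bound(h, W)
```

Under this formulation, any perturbation of `h` with norm below `delta` provably preserves the argmax, which matches the abstract's notion of the maximum internal-state perturbation the prediction can withstand.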