Visualizing token importance for black-box language models

📅 2025-12-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Auditing token-level input dependencies of black-box large language models (LLMs) remains challenging in high-stakes domains (e.g., law, healthcare), where gradient-based or architecture-dependent attribution methods are inapplicable. Method: the paper proposes Distribution-Based Sensitivity Analysis (DBSA), a gradient-free, architecture-agnostic token-level attribution method that makes no distributional assumptions about the model. DBSA combines repeated sampling with output-distribution divergence metrics, together with statistical significance testing and interactive visualization, to attribute the behavior of stochastic LLMs. Results: in illustrative examples on black-box LLM APIs, DBSA identifies critical and fragile tokens, surfaces dependency patterns overlooked by existing interpretability methods, and supports model auditing while requiring no access beyond API calls and only modest computational overhead.

📝 Abstract
We consider the problem of auditing black-box large language models (LLMs) to ensure they behave reliably when deployed in production settings, particularly in high-stakes domains such as legal, medical, and regulatory compliance. Existing approaches for LLM auditing often focus on isolated aspects of model behavior, such as detecting specific biases or evaluating fairness. We are interested in a more general question -- can we understand how the outputs of black-box LLMs depend on each input token? There is a critical need for such tools in real-world applications that rely on inaccessible API endpoints to language models. However, this is a highly non-trivial problem, as LLMs are stochastic functions (i.e., two outputs from the same prompt may differ by chance), while computing prompt-level gradients to approximate input sensitivity is infeasible. To address this, we propose Distribution-Based Sensitivity Analysis (DBSA), a lightweight model-agnostic procedure to evaluate the sensitivity of a language model's output to each input token, without making any distributional assumptions about the LLM. DBSA is developed as a practical tool for practitioners, enabling quick, plug-and-play visual exploration of an LLM's reliance on specific input tokens. Through illustrative examples, we demonstrate how DBSA can enable users to inspect LLM inputs and find sensitivities that may be overlooked by existing LLM interpretability methods.
Problem

Research questions and friction points this paper is trying to address.

Audit black-box LLMs for reliable behavior in high-stakes domains
Understand how LLM outputs depend on each input token
Provide a practical tool for token-level sensitivity analysis without gradients
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distribution-Based Sensitivity Analysis for black-box models
Model-agnostic procedure without distributional assumptions
Plug-and-play visual exploration of token importance
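The core loop the abstract and summary describe (perturb a token, resample outputs from the stochastic model, compare the resulting output distributions) can be sketched roughly as below. The leave-one-token-out perturbation, the Jensen-Shannon divergence, and the `toy_model` stand-in for a real black-box API are illustrative assumptions, not the paper's exact design.

```python
import math
import random
from collections import Counter

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (dicts)."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    def kl(a):
        return sum(a.get(k, 0.0) * math.log2(a.get(k, 0.0) / m[k])
                   for k in keys if a.get(k, 0.0) > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def output_distribution(model, prompt, n_samples):
    """Empirical distribution over repeated samples from a stochastic model."""
    counts = Counter(model(prompt) for _ in range(n_samples))
    total = sum(counts.values())
    return {out: c / total for out, c in counts.items()}

def token_sensitivity(model, tokens, n_samples=50):
    """Score each token by how much dropping it shifts the output distribution."""
    base = output_distribution(model, " ".join(tokens), n_samples)
    scores = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + tokens[i + 1:]  # leave-one-token-out
        dist = output_distribution(model, " ".join(perturbed), n_samples)
        scores.append(js_divergence(base, dist))
    return scores

def toy_model(prompt):
    """Hypothetical stand-in for a black-box LLM API: deterministic when a
    pivotal token is present, random otherwise."""
    if "pivotal" in prompt:
        return "A"
    return random.choice(["A", "B"])

random.seed(0)
scores = token_sensitivity(toy_model, ["the", "pivotal", "clause", "governs"],
                           n_samples=40)
# Only removing "pivotal" changes the output distribution, so only its
# score is nonzero.
```

In a real setting, `model` would wrap an API call with temperature sampling, and the per-token scores would feed the kind of interactive visualization the paper describes; a permutation or resampling test could flag which divergences are statistically significant rather than sampling noise.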