🤖 AI Summary
Model-agnostic interpretability methods for black-box large language models (LLMs) incur prohibitive costs due to frequent API calls.
Method: This paper proposes a budget-aware surrogate-model-driven explanation framework that requires no access to the target LLM’s internal parameters; instead, it leverages low-cost surrogate models to generate high-fidelity explanations.
Contribution/Results: We empirically demonstrate, for the first time, that surrogate models can faithfully substitute for the original LLMs when generating explanations, and we systematically validate that these explanations generalize to downstream tasks such as reasoning diagnostics. Experiments across multiple mainstream LLMs show that our approach reduces API-call costs by 60–85% while preserving explanation fidelity and retaining ≥90% of the original performance on downstream tasks. This work establishes a new paradigm for budget-conscious, model-agnostic LLM interpretability.
📝 Abstract
As large language models (LLMs) become increasingly prevalent across applications, interpreting their predictions has become a critical challenge. Because LLMs vary in architecture and some are closed-source, model-agnostic techniques show great promise, as they require no access to a model's internal parameters. However, existing model-agnostic techniques must invoke the LLM many times to obtain enough samples for generating faithful explanations, which leads to high economic costs. In this paper, through a series of empirical studies, we show that it is practical to generate faithful explanations for large-scale LLMs by sampling from budget-friendly models instead. Moreover, we show that such proxy explanations also perform well on downstream tasks. Our analysis suggests a new paradigm for model-agnostic LLM explanation methods, one that incorporates information from budget-friendly models.
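To make the cost argument concrete, here is a minimal sketch (not the paper's actual code) of a standard model-agnostic technique, occlusion-based attribution, where every sample is scored by a cheap local surrogate instead of the expensive target LLM. The `surrogate_score` function is a hypothetical stand-in for any budget-friendly model; in the setting the paper studies, each call to the scoring function would otherwise be a paid API call to the target LLM.

```python
import math

def surrogate_score(tokens):
    # Hypothetical surrogate: a toy sentiment scorer standing in for a
    # small, cheaply queried local model. Returns a probability-like
    # score for the label of interest.
    positive = {"good", "great", "excellent"}
    negative = {"bad", "poor", "terrible"}
    raw = sum((t in positive) - (t in negative) for t in tokens)
    return 1.0 / (1.0 + math.exp(-raw))  # squash to (0, 1)

def occlusion_attributions(text, score_fn):
    # Attribute each token by the score drop observed when it is
    # removed. The number of score_fn calls grows with input length,
    # which is exactly why routing them to a surrogate saves cost.
    tokens = text.split()
    base = score_fn(tokens)
    return {
        tok: base - score_fn(tokens[:i] + tokens[i + 1:])
        for i, tok in enumerate(tokens)
    }

attrs = occlusion_attributions("the movie was great", surrogate_score)
```

Under this toy surrogate, removing "great" lowers the score while removing filler words does not, so the attribution correctly singles out the influential token. The paper's empirical question is whether such surrogate-scored explanations remain faithful to the original LLM's behavior.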