🤖 AI Summary
Generative AI models suffer from limited output interpretability due to their black-box nature, posing trust and compliance risks, particularly in art and copyright-sensitive domains. To address this, we propose a search-driven data influence attribution method that traces generated outputs back to the training data they depend on, covering both raw samples and latent-space embeddings, and thereby enables output-oriented interpretability analysis. Unlike conventional gradient- or perturbation-based approaches, our method anchors attribution at the generation outcome and unifies influence assessment across original data and latent representations. It couples efficient search optimization with local retraining for rigorous validation, enabling precise identification of critical training subsets. Experiments demonstrate strong cross-model generalization and show that the method substantially improves the feasibility and reliability of expert-guided interpretability evaluation.
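To make the search step concrete, the sketch below ranks training samples by latent-space similarity to a generated output and returns the top candidates as a hypothesized influential subset. It is a minimal illustration under stated assumptions, not the paper's exact procedure: the `embed()` encoder (here a fixed random projection), the cosine-similarity ranking, and the pool size `k` are all placeholders for the model-specific components used in practice.

```python
import numpy as np

def embed(x: np.ndarray) -> np.ndarray:
    """Stand-in encoder: project a flattened sample into a 64-d latent space.
    A real pipeline would use the generative model's own encoder
    (hypothetical placeholder, not the paper's embedding)."""
    rng = np.random.default_rng(0)            # fixed projection for repeatability
    proj = rng.standard_normal((x.size, 64))
    return x.flatten() @ proj

def top_k_candidates(output: np.ndarray,
                     training_set: list[np.ndarray],
                     k: int = 10) -> list[int]:
    """Rank training samples by cosine similarity to the generated output in
    latent space and return the indices of the k most similar ones, i.e. the
    candidate influential subset handed to the retraining check."""
    z_out = embed(output)
    scores = []
    for x in training_set:
        z = embed(x)
        cos = float(z_out @ z /
                    (np.linalg.norm(z_out) * np.linalg.norm(z) + 1e-12))
        scores.append(cos)
    return [int(i) for i in np.argsort(scores)[::-1][:k]]
```

The same ranking can be run on raw samples instead of embeddings, which is how the method covers both views of the training data.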
📝 Abstract
Generative AI models offer powerful capabilities but often lack transparency, making it difficult to interpret their outputs. This is especially critical in cases involving artistic or copyrighted content. This work introduces a search-inspired approach that improves the interpretability of these models by analysing the influence of training data on their outputs. Our method provides observational interpretability by focusing on a model's output rather than on its internal state, and it considers both raw data and latent-space embeddings when searching for the influence of data items on generated content. We evaluate the method by retraining models locally and by demonstrating its ability to uncover influential subsets of the training data. This lays the groundwork for future extensions, including user-based evaluations with domain experts, which are expected to improve observational interpretability further.
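The retraining-based validation can be summarized in a few lines: retrain with and without the candidate subset and measure how far the output for a fixed prompt moves. The sketch below is a toy illustration under stated assumptions; `train()`, the `MeanModel` stand-in generator, and the Euclidean output distance are all hypothetical substitutes for the local retraining and model-appropriate comparison used in the actual evaluation.

```python
import numpy as np

class MeanModel:
    """Toy 'generative model': memorizes the training mean and returns it,
    ignoring the prompt (hypothetical stand-in for a real generator)."""
    def __init__(self, data: list[np.ndarray]):
        self.mean = np.mean(np.stack(data), axis=0)
    def generate(self, prompt) -> np.ndarray:
        return self.mean

def train(data: list[np.ndarray]) -> MeanModel:
    """Placeholder for local retraining of the generative model."""
    return MeanModel(data)

def influence_of_subset(training_set: list[np.ndarray],
                        subset_idx: list[int],
                        prompt) -> float:
    """Retrain with and without the candidate subset; a large output shift
    supports the hypothesis that the subset was influential."""
    full = train(training_set)
    reduced = [x for i, x in enumerate(training_set) if i not in set(subset_idx)]
    ablated = train(reduced)
    return float(np.linalg.norm(full.generate(prompt) - ablated.generate(prompt)))

# Usage: removing sample 0 shifts the output, so the score is nonzero.
data = [np.ones(4), np.zeros(4), np.full(4, 0.5)]
print(influence_of_subset(data, subset_idx=[0], prompt=None))  # 0.5
```

In the full method this check is what separates genuinely influential subsets from samples that are merely similar to the output.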