🤖 AI Summary
Problem: Quantitative XAI evaluation metrics (e.g., fidelity, stability) are misaligned with user-centered qualitative requirements (e.g., comprehensibility, satisfaction), and there is no data-driven guidance for jointly selecting appropriate AI models and XAI methods. Method: We propose the first hybrid evaluation framework integrating quantitative benchmarks with LLM-generated virtual user personas. It introduces multi-dimensional GPT-based personas for subjective interpretability assessment; designs a content-aware dataset–model–XAI triadic matching and effectiveness-estimation mechanism; and unifies content-based recommendation, XAI metric computation, and semantic satisfaction analysis. Contribution/Results: Evaluated across multiple benchmarks, our framework improves recommendation accuracy by 32%, achieves strong agreement between virtual-persona assessments and real-user studies (Spearman ρ = 0.89), and enables end-to-end automated selection of XAI solutions alongside pre-deployment prediction of explanation quality.
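For concreteness, here is a minimal sketch of how the reported persona-human agreement could be checked, assuming satisfaction ratings are aggregated per explanation from both GPT-based personas and real users; the rating arrays are illustrative placeholders, not data from the paper:

```python
# Minimal sketch: quantifying persona-vs-human agreement with Spearman's
# rank correlation, the statistic behind the reported rho = 0.89.
from scipy.stats import spearmanr

# Hypothetical mean satisfaction ratings (1-5 Likert scale), one score per
# evaluated explanation; values are illustrative only.
persona_ratings = [4.2, 3.1, 4.8, 2.5, 3.9, 4.4]  # aggregated GPT-persona scores
human_ratings = [4.0, 3.3, 4.6, 2.2, 4.1, 4.5]    # aggregated real-user scores

rho, p_value = spearmanr(persona_ratings, human_ratings)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```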
📄 Abstract
In today's data-driven era, computational systems generate vast amounts of data that drive the digital transformation of industries, with Artificial Intelligence (AI) playing a key role. Demand for eXplainable AI (XAI) has grown accordingly, aiming to enhance the interpretability, transparency, and trustworthiness of AI models. However, evaluating XAI methods remains challenging: existing evaluation frameworks typically focus on quantitative properties such as fidelity, consistency, and stability while neglecting qualitative characteristics such as satisfaction and interpretability. In addition, practitioners lack guidance in selecting appropriate datasets, AI models, and XAI methods, a major hurdle in human-AI collaboration. To address these gaps, we propose a framework that integrates quantitative benchmarking with qualitative user assessments through virtual personas based on the "Anthology" of Large Language Model (LLM) backstories. Our framework also incorporates a content-based recommender system that leverages dataset-specific characteristics to match new input data against a repository of benchmarked datasets. This matching yields an estimated XAI score and tailored recommendations for both the optimal AI model and the XAI method for a given scenario.
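To illustrate the matching step, here is a minimal sketch under two assumptions not spelled out in the abstract: each benchmarked dataset is summarized by a meta-feature vector, and the repository records the best-performing (model, XAI method) pair together with its benchmarked XAI score. All names, features, and values are hypothetical.

```python
# Minimal sketch of content-based dataset matching: find the nearest
# benchmarked dataset by cosine similarity over meta-features, then reuse
# its best (model, XAI method) pair and XAI score as the estimate.
import numpy as np

# Hypothetical repository of benchmarked datasets.
repository = {
    "credit_scoring": {
        "meta": np.array([0.8, 0.1, 0.3]),  # e.g., class balance, sparsity, dimensionality
        "best": ("XGBoost", "SHAP", 0.91),  # (model, XAI method, XAI score)
    },
    "medical_imaging": {
        "meta": np.array([0.4, 0.7, 0.9]),
        "best": ("ResNet-50", "Grad-CAM", 0.84),
    },
}

def recommend(new_meta: np.ndarray):
    """Return the nearest benchmark's name, recommended model/XAI method,
    and estimated XAI score for a new dataset's meta-feature vector."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    name, entry = max(repository.items(),
                      key=lambda kv: cosine(new_meta, kv[1]["meta"]))
    model, xai_method, score = entry["best"]
    return name, model, xai_method, score

matched, model, xai, est_score = recommend(np.array([0.7, 0.2, 0.4]))
print(f"Nearest benchmark: {matched} -> model={model}, XAI={xai}, "
      f"estimated XAI score={est_score:.2f}")
```

Nearest-neighbor matching over meta-features is one plausible realization of the content-based step; the paper's actual similarity measure and feature set may differ.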