Enhancing the Comprehensibility of Text Explanations via Unsupervised Concept Discovery

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing text interpretability methods are constrained by reliance on manual concept annotation or yield implicit, human-incomprehensible concepts, hindering unsupervised and trustworthy concept discovery. To address this, we propose the first unsupervised, text-oriented explainable AI framework. Our method introduces an object-centric neural architecture to automatically disentangle semantic concepts from raw text, and leverages large language models (LLMs) as *interpretability discriminators*—assessing concept clarity and human readability—within a feedback-driven reinforcement fine-tuning loop that dynamically refines concept quality. Evaluated across diverse multi-task benchmarks, our approach significantly outperforms existing state-of-the-art methods. Both human evaluations and automated metrics confirm that the discovered concepts exhibit superior interpretability, consistency, and user trustworthiness—thereby overcoming the dual bottlenecks of *uncontrollability* and *untrustworthiness* in concept-based interpretability.
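
The summary describes a closed loop: extract candidate concepts, have an LLM judge how readable each one is, then steer the next round of fine-tuning away from concepts the judge found opaque. Below is a minimal sketch of how such a loop could be wired up, assuming a toy `ConceptExtractor` and a stubbed `score_comprehensibility` judge; the class names, the penalty form, and the 0.1 loss weight are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a feedback-driven refinement loop; not the paper's code.
import torch
import torch.nn as nn

class ConceptExtractor(nn.Module):
    """Toy stand-in for the object-centric concept extractor."""
    def __init__(self, hidden_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        self.concepts = nn.Parameter(torch.randn(n_concepts, hidden_dim))
        self.classifier = nn.Linear(n_concepts, n_classes)

    def forward(self, token_embeddings: torch.Tensor):
        # Attention of each concept over tokens -> per-concept activation.
        attn = torch.softmax(token_embeddings @ self.concepts.T, dim=1)  # (B, T, K)
        activations = attn.max(dim=1).values                             # (B, K)
        return self.classifier(activations), attn

def score_comprehensibility(concept_id: int) -> float:
    """Stub for the LLM 'interpretability discriminator' (see the scoring
    sketch later on this page); returns a clarity score in [0, 1]."""
    return 0.5  # placeholder

model = ConceptExtractor(hidden_dim=768, n_concepts=8, n_classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
task_loss_fn = nn.CrossEntropyLoss()

def fine_tune_step(token_embeddings: torch.Tensor, labels: torch.Tensor) -> float:
    logits, attn = model(token_embeddings)
    task_loss = task_loss_fn(logits, labels)
    # Penalize reliance on concepts the LLM judged hard to understand.
    weights = torch.tensor([1.0 - score_comprehensibility(k)
                            for k in range(model.concepts.shape[0])])
    concept_usage = attn.max(dim=1).values.mean(dim=0)  # (K,)
    penalty = (weights * concept_usage).sum()
    loss = task_loss + 0.1 * penalty  # weighting is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```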

📝 Abstract
Concept-based explainable approaches have emerged as a promising method in explainable AI because they can interpret models in a way that aligns with human reasoning. However, their adoption in the text domain remains limited. Most existing methods rely on predefined concept annotations and cannot discover unseen concepts, while methods that extract concepts without supervision often produce explanations that are not intuitively comprehensible to humans, potentially diminishing user trust; both fall short of discovering comprehensible concepts automatically. To address this issue, we propose **ECO-Concept**, an intrinsically interpretable framework that discovers comprehensible concepts with no concept annotations. ECO-Concept first utilizes an object-centric architecture to extract semantic concepts automatically. The comprehensibility of the extracted concepts is then evaluated by large language models, and the evaluation results guide subsequent model fine-tuning toward more understandable explanations. Experiments show that our method achieves superior performance across diverse tasks, and further concept evaluations validate that the concepts learned by ECO-Concept surpass current counterparts in comprehensibility.
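
The "object-centric architecture" in the abstract suggests a slot-attention-style module, in which a fixed set of learned slots compete for tokens and each slot comes to represent one semantic concept. The sketch below is one plausible reading under that assumption; the module name, hyperparameters, and deterministic slot initialization are illustrative choices, not taken from the paper.

```python
# Minimal slot-attention-style concept extractor over token embeddings.
import torch
import torch.nn as nn

class TextSlotAttention(nn.Module):
    def __init__(self, dim: int, n_slots: int, iters: int = 3):
        super().__init__()
        self.n_slots, self.iters = n_slots, iters
        # Deterministic learned init (standard slot attention samples instead).
        self.slots_init = nn.Parameter(torch.randn(n_slots, dim) * 0.02)
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.gru = nn.GRUCell(dim, dim)
        self.norm = nn.LayerNorm(dim)
        self.scale = dim ** -0.5

    def forward(self, tokens: torch.Tensor):
        # tokens: (B, T, dim) contextual token embeddings.
        B, dim = tokens.shape[0], tokens.shape[-1]
        slots = self.slots_init.unsqueeze(0).expand(B, -1, -1)
        k, v = self.to_k(tokens), self.to_v(tokens)
        for _ in range(self.iters):
            q = self.to_q(self.norm(slots))
            # Slots compete for tokens: softmax over the slot axis.
            attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=1)
            attn = attn / attn.sum(dim=2, keepdim=True).clamp(min=1e-8)
            updates = attn @ v  # (B, K, dim)
            slots = self.gru(updates.reshape(-1, dim),
                             slots.reshape(-1, dim)).view(B, self.n_slots, dim)
        return slots, attn  # slots = candidate concepts; attn = token evidence

# Usage: feed encoder outputs, e.g. from a BERT-style model:
# slots, attn = TextSlotAttention(dim=768, n_slots=8)(encoder_hidden_states)
```
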
Problem

Research questions and friction points this paper is trying to address.

Unsupervised discovery of comprehensible text concepts
Overcoming reliance on predefined concept annotations
Improving human-understandable explanations in explainable AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised concept discovery without annotations
Object-centric architecture for semantic extraction
LLM-guided comprehensibility evaluation and tuning (see the scoring sketch below)
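
One way to realize the LLM-as-judge step is to show the model the top-activating snippets for a discovered concept and ask for a clarity rating. The prompt wording, the choice of the OpenAI chat API, and the `gpt-4o-mini` model name are assumptions for illustration; the paper does not prescribe a provider or prompt.

```python
# Hypothetical LLM comprehensibility scorer; prompt and provider are assumed.
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rate_concept(snippets: list[str], model: str = "gpt-4o-mini") -> float:
    """Return a 0-1 comprehensibility score for one discovered concept."""
    prompt = (
        "The following text snippets all strongly activate one latent "
        "concept in a classifier:\n\n"
        + "\n".join(f"- {s}" for s in snippets)
        + "\n\nDo these snippets share a single, clearly nameable theme? "
        "Answer with an integer rating from 1 (incoherent) to 5 "
        "(crystal clear), formatted as 'Rating: N'."
    )
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    match = re.search(r"Rating:\s*([1-5])", reply or "")
    return (int(match.group(1)) - 1) / 4 if match else 0.0
```
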
👥 Authors
Yifan Sun
Media Synthesis and Forensics Lab, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Danding Wang
Institute of Computing Technology, Chinese Academy of Sciences
Explainable AI, Media Forensics, Human-Computer Interaction
Qiang Sheng
Chinese Academy of Sciences
fake news detection, fact checking, LLM safety
Juan Cao
Institute of Computing Technology, Chinese Academy of Sciences
Jintao Li
Media Synthesis and Forensics Lab, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences