Can LLMs Faithfully Explain Themselves in Low-Resource Languages? A Case Study on Emotion Detection in Persian

📅 2025-11-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates the faithfulness of self-explanations generated by large language models (LLMs) for emotion classification in low-resource languages—specifically Persian—where ground-truth reasoning processes are scarce and difficult to model. Method: Two prompting strategies—“predict-then-explain” and “explain-then-predict”—are compared, and a token-level log-probability-based confidence score is introduced to quantify explanation reliability. Faithfulness is evaluated via human annotation of the alignment between model-generated explanations and human reasoning. Contribution/Results: Despite high classification accuracy, the LLMs show markedly low agreement with human annotations; inter-model explanation similarity substantially exceeds model–human alignment, revealing systematic faithfulness deficits in self-explanation for low-resource settings. To our knowledge, this is the first work to empirically quantify the unreliability of LLM self-explanations in Persian, establishing a methodological benchmark and an evidence-based caution for explainable AI in low-resource languages.
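The paper itself provides no code; the following is a minimal sketch of how a token-level log-probability confidence score could be aggregated over a generated explanation. The function name, the geometric-mean aggregation, and the example values are assumptions for illustration, not the authors' implementation.

```python
import math

def explanation_confidence(token_logprobs):
    """Aggregate per-token log-probabilities of a generated explanation
    into a single confidence score.

    token_logprobs: list of log-probabilities, one per generated token
    (e.g., as exposed by an LLM API that returns token-level logprobs).
    Returns the geometric-mean token probability in [0, 1]; higher values
    indicate the model was more confident while generating the explanation.
    """
    if not token_logprobs:
        return 0.0
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

# Example with hypothetical logprobs for a short explanation span
print(explanation_confidence([-0.11, -0.37, -0.05, -0.92]))  # ≈ 0.70
```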

📝 Abstract
Large language models (LLMs) are increasingly used to generate self-explanations alongside their predictions, a practice that raises concerns about the faithfulness of these explanations, especially in low-resource languages. This study evaluates the faithfulness of LLM-generated explanations in the context of emotion classification in Persian, a low-resource language, by comparing the influential words identified by the model against those identified by human annotators. We assess faithfulness using confidence scores derived from token-level log-probabilities. Two prompting strategies, differing in the order of explanation and prediction (Predict-then-Explain and Explain-then-Predict), are tested for their impact on explanation faithfulness. Our results reveal that while LLMs achieve strong classification performance, their generated explanations often diverge from faithful reasoning, showing greater agreement with each other than with human judgments. These results highlight the limitations of current explanation methods and metrics, emphasizing the need for more robust approaches to ensure LLM reliability in multilingual and low-resource contexts.
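To make the two prompting orders concrete, a sketch of how Predict-then-Explain and Explain-then-Predict templates might be structured is shown below. The wording, emotion label set, and helper function are illustrative assumptions, not the prompts used in the paper.

```python
# Hypothetical prompt templates illustrating the two orderings studied;
# wording, labels, and names are assumptions for illustration only.

PREDICT_THEN_EXPLAIN = (
    "Text (Persian): {text}\n"
    "1) Predict the emotion label (e.g., anger, joy, sadness, fear, surprise).\n"
    "2) Then list the words in the text that most influenced your prediction."
)

EXPLAIN_THEN_PREDICT = (
    "Text (Persian): {text}\n"
    "1) First list the words in the text you consider most influential for its emotion.\n"
    "2) Then predict the emotion label (e.g., anger, joy, sadness, fear, surprise)."
)

def build_prompt(text: str, explain_first: bool = False) -> str:
    """Fill the chosen template with the input text."""
    template = EXPLAIN_THEN_PREDICT if explain_first else PREDICT_THEN_EXPLAIN
    return template.format(text=text)
```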
Problem

Research questions and friction points this paper is trying to address.

Evaluating faithfulness of LLM-generated explanations in Persian emotion detection
Comparing model-identified influential words against human annotations
Assessing impact of explanation-prediction order on faithfulness metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates faithfulness using token-level confidence scores
Compares model explanations with human annotations in Persian
Tests two prompting strategies for explanation generation
Mobina Mehrazar
School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
Mohammad Amin Yousefi
School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
Parisa Abolfath Beygi
Computer Science Department, University of British Columbia, Vancouver, Canada
Behnam Bahrak
Tehran Institute for Advanced Studies