Legal experts disagree with rationale extraction techniques for explaining ECtHR case outcome classification

📅 2026-01-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the gap between the explanations generated by state-of-the-art explainable AI methods for legal language models and the judgments of legal experts, a discrepancy that undermines transparency and trustworthiness in legal applications. The authors propose a model-agnostic interpretability evaluation framework that extracts concise, human-understandable rationales from input texts and, for the first time, incorporates systematic manual assessment by legal experts of rationales produced by models classifying cases from the European Court of Human Rights (ECtHR). Integrating faithfulness metrics (normalized sufficiency and comprehensiveness) with expert-based reasonableness scores, and exploring LLM-as-a-Judge as an automated proxy, the work shows that despite strong classification performance and promising quantitative benchmarks, current interpretability methods yield rationales fundamentally misaligned with expert reasoning, highlighting their inadequacy for real-world legal practice.

📝 Abstract
Interpretability is critical for applications of large language models in the legal domain, which requires trust and transparency. While some studies develop task-specific approaches, others use the classification model's parameters to explain the decisions. However, which technique best explains legal outcome prediction remains an open question. To address this challenge, we propose a comparative analysis framework for model-agnostic interpretability techniques. Among these, we employ two rationale extraction methods, which justify outcomes with human-interpretable and concise text fragments (i.e., rationales) from the given input text. We conduct the comparison by evaluating faithfulness (via normalized sufficiency and comprehensiveness metrics) along with plausibility (by asking legal experts to evaluate the extracted rationales). We further assess the feasibility of LLM-as-a-Judge using the legal expert evaluation results. We show that the model's "reasons" for predicting a violation differ substantially from those of legal experts, despite highly promising quantitative analysis results and reasonable downstream classification performance. The source code of our experiments is publicly available at https://github.com/trusthlt/IntEval.
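The sufficiency and comprehensiveness metrics mentioned in the abstract follow the general ERASER-style formulation: compare the model's predicted probability on the full input against its probability on the rationale alone (sufficiency) and on the input with the rationale removed (comprehensiveness). The sketch below illustrates this idea only; `predict_proba` is a hypothetical model interface, and the normalization scheme is one common variant (rescaling against a null-rationale baseline), not necessarily the exact definitions used in the paper or its released code.

```python
def sufficiency(predict_proba, full_text, rationale, label):
    """Probability drop when the model sees only the rationale.

    Smaller is better: a good rationale alone should nearly
    reproduce the prediction made on the full input.
    """
    return predict_proba(full_text)[label] - predict_proba(rationale)[label]


def comprehensiveness(predict_proba, full_text, remainder, label):
    """Probability drop when the rationale is removed from the input.

    Larger is better: if the rationale truly drove the prediction,
    deleting it should substantially lower the predicted probability.
    """
    return predict_proba(full_text)[label] - predict_proba(remainder)[label]


def normalize(score, null_baseline):
    """Rescale a raw score against a null-rationale baseline into [0, 1].

    One common normalization scheme; an assumption here, not taken
    from the paper.
    """
    if null_baseline >= 1.0:
        return 0.0
    return max(0.0, min(1.0, (score - null_baseline) / (1.0 - null_baseline)))
```

For example, if the model assigns probability 0.9 to "violation" on the full case text, 0.8 on the extracted rationale alone, and 0.3 once the rationale is deleted, sufficiency is 0.1 (low drop, good) and comprehensiveness is 0.6 (high drop, good).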
Problem

Research questions and friction points this paper is trying to address.

interpretability
rationale extraction
legal AI
ECtHR
model explanation
Innovation

Methods, ideas, or system contributions that make the work stand out.

rationale extraction
model-agnostic interpretability
legal NLP
faithfulness evaluation
LLM-as-a-Judge
Mahammad Namazov
Trustworthy Human Language Technologies, Research Center Trustworthy Data Science and Security of the University Alliance Ruhr, Ruhr University Bochum
Tomáš Koref
Center for Critical Computational Studies, Goethe University Frankfurt
Ivan Habernal
Ruhr University Bochum
natural language processing, privacy-preserving NLP, legal NLP, argumentation mining