🤖 AI Summary
Existing LLM evaluation methods lack fine-grained error attribution, which hinders interpretable model diagnosis and iterative improvement. Method: We propose MisAttributionLLM, the first general judge model that supports holistic scoring, error cause classification, and natural language feedback generation. We design a comprehensive error attribution framework covering six primary causes and fifteen fine-grained sub-causes, and release AttriData, the first large-scale, human-annotated dataset dedicated to error attribution. Contribution/Results: Fine-tuned on AttriData, MisAttributionLLM enables multi-dimensional analysis of LLM output errors. Experiments demonstrate significant improvements over baselines in scoring consistency, error cause classification accuracy, and feedback quality, with strong robustness across diverse tasks and models. This work pioneers systematic error attribution in LLM evaluation, establishing an interpretable and actionable foundation for model debugging and optimization.
📝 Abstract
With the widespread application of Large Language Models (LLMs) across a variety of tasks, mainstream LLM platforms generate massive numbers of user-model interactions daily. To efficiently analyze model performance and diagnose failures in model answers, an automated framework that systematically categorizes and attributes errors is essential; however, existing evaluation models lack this error attribution capability. In this work, we establish a comprehensive Misattribution Framework with 6 primary and 15 secondary categories to facilitate in-depth analysis. Based on this framework, we present AttriData, a dataset designed specifically for error attribution that pairs each misattribution with a corresponding score and feedback. We also propose MisAttributionLLM, a model fine-tuned on AttriData, which is the first general-purpose judge model capable of generating a score, misattribution, and feedback simultaneously. Extensive experiments and analyses confirm the effectiveness and robustness of the proposed method.
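To make the three-part output concrete, the sketch below shows one plausible way a judge model's response could be structured and parsed into a score, a misattribution category, and feedback. This is purely illustrative: the paper's actual output format is not specified here, and the field names and the example category label ("Factual Error") are hypothetical placeholders, not the framework's real taxonomy.

```python
from dataclasses import dataclass

# Hypothetical sketch: field layout and category names are assumptions,
# not the paper's actual output schema.

@dataclass
class Judgment:
    score: int            # holistic quality score, e.g. on a 1-5 scale
    misattribution: str   # one of the framework's error categories
    feedback: str         # natural-language explanation of the error

def parse_judgment(text: str) -> Judgment:
    """Parse a response of the assumed form:
    'Score: <n>\nMisattribution: <category>\nFeedback: <text>'."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.strip()
    return Judgment(
        score=int(fields["score"]),
        misattribution=fields["misattribution"],
        feedback=fields["feedback"],
    )

example = (
    "Score: 2\n"
    "Misattribution: Factual Error\n"
    "Feedback: The answer cites an incorrect date for the event."
)
print(parse_judgment(example))
```

A structured output like this is what makes the evaluation actionable: scores can be aggregated for consistency checks, categories can be tallied to locate systematic failure modes, and feedback remains human-readable.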