Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs

πŸ“… 2025-05-23
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work identifies and formalizes β€œself-consistent errors” in large language models (LLMs): a critical yet overlooked class of errors in which a model repeatedly generates the same incorrect answer across multiple stochastic samples. Such errors evade mainstream detection methods, and their frequency does not diminish with increasing model scale, remaining stable or even growing. To address this, the authors propose a cross-model probing method that fuses hidden-state evidence from an external verifier LLM to improve detection robustness. Experiments across three representative LLM families demonstrate that the approach significantly improves detection of self-consistent errors, on which all four types of existing detection methods struggle.

πŸ“ Abstract
As large language models (LLMs) often generate plausible but incorrect content, error detection has become increasingly critical to ensure truthfulness. However, existing detection methods often overlook a critical problem we term self-consistent errors, where LLMs repeatedly generate the same incorrect response across multiple stochastic samples. This work formally defines self-consistent errors and evaluates mainstream detection methods on them. Our investigation reveals two key findings: (1) Unlike inconsistent errors, whose frequency diminishes significantly as LLM scale increases, the frequency of self-consistent errors remains stable or even increases. (2) All four types of detection methods significantly struggle to detect self-consistent errors. These findings reveal critical limitations in current detection methods and underscore the need for improved ones. Motivated by the observation that self-consistent errors often differ across LLMs, we propose a simple but effective cross-model probe method that fuses hidden-state evidence from an external verifier LLM. Our method significantly enhances detection performance on self-consistent errors across three LLM families.
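
To make the definition concrete, here is a minimal sketch (not the authors' code) of how a self-consistent error can be operationalized: draw several stochastic samples for the same prompt and check whether one wrong answer dominates. The sampling callable, the answer normalization, and the 0.9 agreement threshold are all illustrative assumptions, not values from the paper.

```python
from collections import Counter
from typing import Callable

def classify_error(prompt: str, gold: str, sample: Callable[[str], str],
                   n_samples: int = 10, threshold: float = 0.9) -> str:
    """Return 'correct', 'self_consistent_error', or 'inconsistent_error'.

    `sample` is any function that returns one stochastic (temperature > 0)
    completion for the prompt; it is passed in so the sketch stays
    model-agnostic.
    """
    normalize = lambda s: s.strip().lower()
    answers = [normalize(sample(prompt)) for _ in range(n_samples)]
    top_answer, top_count = Counter(answers).most_common(1)[0]
    if top_answer == normalize(gold):
        return "correct"
    # The same wrong answer dominates across samples: self-consistent error.
    if top_count / n_samples >= threshold:
        return "self_consistent_error"
    return "inconsistent_error"
```

Under this framing, consistency-based detectors are blind by construction: they flag disagreement between samples, and a self-consistent error produces none.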
Problem

Research questions and friction points this paper is trying to address.

LLMs repeatedly generate the same plausible but incorrect content across samples
Current detection methods fail to identify self-consistent errors
Self-consistent errors persist or increase with larger LLM scales
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formally defines self-consistent errors in LLMs
Proposes a cross-model probe method
Fuses hidden-state evidence from an external verifier LLM (see the sketch below)
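
The abstract does not spell out the probe's architecture, so the following is only a hedged sketch of the general recipe it describes: pool hidden states for the same question–answer pair from the target LLM and an external verifier LLM, fuse them, and train a lightweight classifier to predict correctness. The specific model names, layer choice, mean pooling, fusion by concatenation, and the `train_probe` helper are all illustrative assumptions, not the paper's exact design.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

# Hypothetical model choices; any two LLMs from different families would do.
TARGET_ID, VERIFIER_ID = "meta-llama/Llama-3.1-8B", "Qwen/Qwen2.5-7B"
target = AutoModelForCausalLM.from_pretrained(TARGET_ID).eval()
verifier = AutoModelForCausalLM.from_pretrained(VERIFIER_ID).eval()
tok_t = AutoTokenizer.from_pretrained(TARGET_ID)
tok_v = AutoTokenizer.from_pretrained(VERIFIER_ID)

def pooled_hidden_state(model, tokenizer, text: str, layer: int = -1):
    """Mean-pool one hidden layer over tokens into a fixed-size feature vector."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[layer].mean(dim=1).squeeze(0)  # (hidden_dim,)

def fused_features(question: str, answer: str) -> torch.Tensor:
    """Concatenate target and verifier representations of the same Q/A pair."""
    text = f"Q: {question}\nA: {answer}"
    h_target = pooled_hidden_state(target, tok_t, text)
    h_verifier = pooled_hidden_state(verifier, tok_v, text)
    return torch.cat([h_target, h_verifier])

def train_probe(train_set):
    """Fit a linear probe on labeled (question, answer, is_correct) triples."""
    X = torch.stack([fused_features(q, a) for q, a, _ in train_set]).numpy()
    y = [int(correct) for _, _, correct in train_set]
    return LogisticRegression(max_iter=1000).fit(X, y)
```

The intuition, per the abstract, is that self-consistent errors often differ across LLMs, so a verifier from another family contributes hidden-state evidence that the target model's own representations lack.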
Authors
Hexiang Tan
PhD student at Institute of Computing Technology, Chinese Academy of Sciences
Trustworthiness of LLMs
Fei Sun
State Key Laboratory of AI Safety, Institute of Computing Technology, CAS
Sha Liu
University of Chinese Academy of Sciences
Du Su
Assistant Researcher, CAS Key Laboratory of AI Safety
AI safety
Qi Cao
State Key Laboratory of AI Safety, Institute of Computing Technology, CAS
Xin Chen
Meituan
Jingang Wang
Meituan
Information Retrieval, Natural Language Processing, Machine Translation
Xunliang Cai
Meituan
Yuanzhuo Wang
State Key Laboratory of AI Safety, Institute of Computing Technology, CAS
Huawei Shen
State Key Laboratory of AI Safety, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences
Xueqi Cheng
Ph.D. student, Florida State University
Data mining, LLM, GNN, Computational social science