🤖 AI Summary
Users often over-rely on large language models (LLMs) for simple tasks (e.g., arithmetic) because strong performance on complex ones (e.g., poetry generation) leads them to misjudge LLM reliability. Existing methods, which cluster instance embeddings to identify LLM failure modes and teach them to users, have shown limited effectiveness.
Method: We conduct the first empirical validation of whether systematic LLM failure patterns can be grouped and taught. We introduce a new measure of instructional efficacy, a user's accuracy in *anticipating* LLM errors, in place of the traditional human-AI team accuracy metric. Using meta-label grouping, embedding clustering, prompt engineering, and controlled user studies, we evaluate current automated failure-discovery and instruction approaches.
Contribution/Results: We find that state-of-the-art automatic failure discovery lacks stability. Critically, teaching evaluated under our new paradigm significantly improves users' error-anticipation accuracy (p < 0.01), providing both theoretical grounding and practical pathways toward reliable human-LLM collaboration.
📝 Abstract
People use large language models (LLMs) when they should not. This is partly because they see LLMs compose poems and answer intricate questions, so they understandably, but incorrectly, assume LLMs won't stumble on basic tasks like simple arithmetic. Prior work has tried to address this by clustering instance embeddings into regions where an LLM is likely to fail and automatically describing the patterns in these regions. The discovered failure patterns are then taught to users to mitigate their overreliance. Yet this approach has not fully succeeded. In this analysis paper, we aim to understand why.
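The embedding-clustering pipeline from prior work can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the 2-D synthetic "embeddings", the correctness labels, the minimal k-means, and the flagging rule (error rate more than twice the overall rate) stand in for whatever encoder, clusterer, and thresholds an actual system would use.

```python
import numpy as np

# Synthetic stand-in data (assumption): 2-D "embeddings" with one region
# where a hypothetical LLM succeeds and one where it fails.
rng = np.random.default_rng(0)
embeddings = np.vstack([
    rng.normal(0.0, 0.5, size=(80, 2)),  # instances the LLM answers correctly
    rng.normal(3.0, 0.5, size=(20, 2)),  # instances the LLM gets wrong
])
correct = np.array([1] * 80 + [0] * 20)

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's k-means; returns a cluster label per row of X."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

labels = kmeans(embeddings, k=2)

# Flag clusters whose error rate is far above the overall error rate;
# these are the candidate failure regions a system would then describe
# in natural language and teach to users.
overall_err = 1 - correct.mean()
error_rates = [1 - correct[labels == j].mean() for j in range(2)]
flagged = [j for j, e in enumerate(error_rates) if e > 2 * overall_err]
for j in flagged:
    print(f"cluster {j}: size={(labels == j).sum()}, error rate={error_rates[j]:.2f}")
```

On this toy data the high-error region separates cleanly; the open question the paper examines is whether such regions are stable and describable on real instances.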
We first examine whether the negative result stems from the absence of failure patterns. We group instances in two datasets by their meta-labels and evaluate an LLM's predictions on these groups. We then define criteria to flag groups that are sizable and on which the LLM is error-prone, and find meta-label groups that meet these criteria. Their meta-labels are LLM failure patterns that could be taught to users, so such patterns do exist. We next test whether prompting and embedding-based approaches can surface these known failures. Without this, users cannot be taught about them to reduce their overreliance. We find mixed results across methods, which could explain the negative result. Finally, we revisit the metric used to measure teaching effectiveness. We propose to assess a user's ability to use the given failure patterns to anticipate when an LLM is error-prone. A user study shows a positive effect from teaching under this metric, unlike under human-AI team accuracy. Our findings show that teaching failure patterns could be a viable approach to mitigating overreliance, but success depends on better automated failure-discovery methods and on metrics like ours.
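The meta-label grouping and flagging step described above can be sketched in a few lines. The records, meta-label names, and both thresholds (minimum group size, maximum accuracy) are illustrative assumptions, not the paper's actual criteria.

```python
from collections import defaultdict

# Hypothetical records (assumption): (meta_label, llm_was_correct) per instance.
records = [
    ("negation", False), ("negation", False), ("negation", True),
    ("negation", False), ("negation", False),
    ("temporal", True), ("temporal", True), ("temporal", False),
    ("arithmetic", True), ("arithmetic", True), ("arithmetic", True),
]

MIN_SIZE = 5        # group must be sizable (threshold is an assumption)
MAX_ACCURACY = 0.5  # LLM must be error-prone on the group (assumption)

# Group instances by meta-label.
groups = defaultdict(list)
for meta_label, ok in records:
    groups[meta_label].append(ok)

# Keep groups that satisfy both criteria; their meta-labels are the
# failure patterns that could be taught to users.
failure_patterns = {
    label: sum(oks) / len(oks)
    for label, oks in groups.items()
    if len(oks) >= MIN_SIZE and sum(oks) / len(oks) <= MAX_ACCURACY
}
print(failure_patterns)
```

Here "temporal" is excluded for being too small and "arithmetic" for high accuracy, leaving "negation" as the one teachable failure pattern in this toy setup.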