Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs

📅 2026-04-21

📈 Citations: 0

✨ Influential: 0

career value

183K/year

🤖 AI Summary

This study addresses the unclear mechanisms by which large language models (LLMs) respond to user repair behaviors in multi-turn dialogues, a gap that undermines interactional reliability. Drawing on conversation-analytic theories of repair, the authors introduce a systematic framework into human–AI dialogue research and design controlled multi-turn experiments featuring solvable and unsolvable mathematical problems. Combining qualitative and quantitative methods, they evaluate mainstream LLMs’ sensitivity and consistency in handling repairs. The findings reveal substantial inter-model differences: some models resist valid user corrections, while others are easily misled; furthermore, their multi-turn behavioral patterns prove idiosyncratic and difficult to predict. These results highlight critical limitations in current LLMs’ capacity for dynamic dialogue understanding and adaptive response generation.

Technology Category

Application Category

📝 Abstract

Repair, an important resource for resolving trouble in human-human conversation, remains underexplored in human-LLM interaction. In this study, we investigate how LLMs engage in the interactive process of repair in multi-turn dialogues around solvable and unsolvable math questions. We examine whether models initiate repair themselves and how they respond to user-initiated repair. Our results show strong differences across models: reactions range from being almost completely resistant to (appropriate) repair attempts to being highly susceptible and easily manipulated. We further demonstrate that once conversations extend beyond a single turn, model behavior becomes more distinctive and less predictable across systems. Overall, our findings indicate that each tested LLM exhibits its own characteristic form of unreliability in the context of repair.

Problem

Research questions and friction points this paper is trying to address.

repair

multi-turn dialogue

large language models

unreliability

human-LLM interaction

Innovation

Methods, ideas, or system contributions that make the work stand out.

repair

multi-turn dialogue

LLM reliability