Every(bot) Makes Mistakes: Coding Big Five Personalities, Context, and Tone into an LLM Chatbot Recovery Code Framework

📅 2026-05-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a limitation of large language model (LLM) chatbots: poor error recovery, which can disrupt interaction flow and undermine user trust. The authors propose a recovery code that maps four common chatbot task contexts to four Big Five personality traits (conscientiousness, agreeableness, openness, and extraversion), conversational tones, and three-stage recovery instructions; they report that no prior work combines these variables in one structured framework. They also design an evaluation rubric with three dimensions (recovery quality, tone alignment, and appropriateness) and nine sub-dimensions. In an exploratory study with no human participants, four Claude Sonnet 4.6 agents given the recovery code were compared with four baseline agents, and eight LLM evaluator agents scored the responses. Average rubric scores rose by 27.8 percentage points, from 48.9% to 76.7%, with appropriateness reaching 83.3%. Notable gains were reported in personality appropriateness (75% versus 50%) and providing explanation (60% versus 20%).
📝 Abstract
Despite careful design involving classifiers, parameters, and safeguarding, errors during human/AI interaction are not rare. Poor error recovery can disrupt interaction flow, damage user trust, and decrease user engagement. Whilst existing work has explored LLM recovery, tone, context, and personality as separate design dimensions, none has combined these variables into a structured guidance framework. This paper presents a recovery code that maps four common LLM chatbot task contexts to associated personality traits (four of the Big Five: Conscientiousness, Agreeableness, Openness, and Extraversion), tones, and three-stage recovery instructions. A recovery evaluation rubric was also designed, comprising three dimensions (Recovery quality, Tone alignment, and Appropriateness) and nine sub-dimensions. The methodology is exploratory, with no human participants. A between-subjects design was employed across two conditions: in Condition A (baseline, uncoded), four separate Claude Sonnet 4.6 agents received no recovery code training; in Condition B (coded), four separate Claude Sonnet 4.6 agents were trained on the recovery code. Identical 'user' prompts and error scenarios were used across both conditions. Eight LLM evaluator agents assessed the recovery responses using the evaluation rubric, producing scores out of 5 for each sub-dimension. Coded recovery responses averaged 76.7% compared with 48.9% for baseline responses, an increase of 27.8 percentage points. Condition B performed strongest in the Appropriateness dimension (83.3%), with notable improvement in personality appropriateness (75% versus 50%) and providing explanation (60% versus 20%). These findings suggest that structured personality, context, and tone-informed recovery codes can be successfully learnt and applied by LLM chatbots to improve error recovery quality across varying contextual tasks.
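The arithmetic behind the reported figures can be checked with a short sketch. It uses only the two averages stated in the abstract (48.9% and 76.7%) and shows that 27.8 is the difference in percentage points, while the relative increase over the baseline is larger. The helper names (`to_percent`, `point_gain`, `relative_gain`) are hypothetical, and converting a score out of 5 to a percentage by simple division is an assumption about how the rubric scores were normalised, not a description of the authors' procedure.

```python
def to_percent(score: float, max_score: float = 5.0) -> float:
    """Convert a rubric score (assumed to be out of 5) to a percentage."""
    return 100.0 * score / max_score


def point_gain(baseline_pct: float, coded_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return coded_pct - baseline_pct


def relative_gain(baseline_pct: float, coded_pct: float) -> float:
    """Increase expressed as a percentage of the baseline value."""
    return 100.0 * (coded_pct - baseline_pct) / baseline_pct


# Averages reported in the abstract.
baseline, coded = 48.9, 76.7

print(round(point_gain(baseline, coded), 1))     # 27.8 percentage points
print(round(relative_gain(baseline, coded), 1))  # 56.9 percent relative to baseline

# Under the assumed normalisation, scores of 3/5 and 1/5 map to 60% and 20%.
print(to_percent(3), to_percent(1))              # 60.0 20.0
```

Under that assumed normalisation, the reported 60% versus 20% for providing explanation would be consistent with mean scores of 3 and 1 out of 5, although the abstract does not state the underlying scores.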
Problem

Research questions and friction points this paper is trying to address.

error recovery
large language models
personality
context
tone
Innovation

Methods, ideas, or system contributions that make the work stand out.

recovery code
Big Five personality
LLM error recovery
context-aware response
tone alignment