Can "AI" Be a Doctor? A Study of Empathy, Readability, and Alignment in Clinical LLMs

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines how well large language models (LLMs) maintain semantic fidelity, readability, and empathetic tone in clinical communication, and where they fall short of physician-authored responses. The authors compare general-purpose and healthcare-specific LLMs using semantic similarity analysis, Flesch-Kincaid Grade Level (FKGL) assessment, sentiment polarity quantification, and dual-perspective evaluation by clinicians and patients. Empathy-oriented prompting lowers linguistic complexity (by up to 6.87 FKGL points for GPT-5) and reduces extreme negativity, but does not significantly improve semantic fidelity. Collaborative rewriting gives the strongest overall alignment, reaching the highest semantic similarity to physician answers (mean up to 0.93) while improving readability. Patients consistently preferred the rewritten versions for clarity and emotional tone, whereas no model surpassed physicians on epistemic criteria. The findings support LLMs as collaborative tools that augment, rather than replace, physicians.

📝 Abstract

Large Language Models (LLMs) are increasingly deployed in healthcare, yet their communicative alignment with clinical standards remains insufficiently quantified. We conduct a multidimensional evaluation of general-purpose and domain-specialized LLMs across structured medical explanations and real-world physician-patient interactions, analyzing semantic fidelity, readability, and affective resonance. Baseline models amplify affective polarity relative to physicians (Very Negative: 43.14-45.10% vs. 37.25%) and, in larger architectures such as GPT-5 and Claude, produce substantially higher linguistic complexity (FKGL up to 16.91-17.60 vs. 11.47-12.50 in physician-authored responses). Empathy-oriented prompting reduces extreme negativity and lowers grade-level complexity (up to -6.87 FKGL points for GPT-5) but does not significantly increase semantic fidelity. Collaborative rewriting yields the strongest overall alignment. Rephrase configurations achieve the highest semantic similarity to physician answers (up to mean = 0.93) while consistently improving readability and reducing affective extremity. Dual stakeholder evaluation shows that no model surpasses physicians on epistemic criteria, whereas patients consistently prefer rewritten variants for clarity and emotional tone. These findings suggest that LLMs function most effectively as collaborative communication enhancers rather than replacements for clinical expertise.
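For readers unfamiliar with the readability metric cited above, the standard Flesch-Kincaid Grade Level formula is 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below is illustrative only: the abstract does not say which tooling the authors used, and the function names `count_syllables` and `fkgl` and the vowel-group syllable heuristic are assumptions made here, so the scores it produces will not exactly match those reported in the paper.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not the paper's method):
    count runs of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level using the standard coefficients."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are alphabetic runs, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    plain = "Your heart is not getting enough blood. We need to treat it now."
    dense = "Myocardial infarction necessitates immediate pharmacological intervention."
    print(f"plain: {fkgl(plain):.2f}")
    print(f"dense: {fkgl(dense):.2f}")
```

Short sentences built from short words score lower, which is the direction of the FKGL reductions reported for empathy-oriented prompting and rewriting.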

Problem

Research questions and friction points this paper is trying to address.

clinical LLMs

empathy

readability

communicative alignment

semantic fidelity

Innovation

Methods, ideas, or system contributions that make the work stand out.

collaborative rewriting

clinical alignment

empathy-oriented prompting