Chatbot To Help Patients Understand Their Health

📅 2025-09-06
🤖 AI Summary
This study addresses low patient health literacy, which keeps patients from taking an active part in their care, by proposing a "learning as conversation" framework and developing NoteAid-Chatbot, a lightweight conversational AI system. Built on the LLaMA 3.2 3B model, it is trained in two stages: supervised fine-tuning on synthetic conversations generated with medical conversation strategies, followed by reinforcement learning with Proximal Policy Optimization (PPO), where the reward comes from patient understanding assessments in simulated hospital discharge scenarios. No human-labeled data is used. The key finding is that clarity, relevance, and structured dialogue emerge across multi-turn interactions without explicit supervision for these attributes. Human-aligned assessments and a Turing test indicate that NoteAid-Chatbot surpasses non-expert humans, supporting the feasibility of low-cost, PPO-based alignment of small language models for open-ended health education dialogue.

📝 Abstract
Patients must possess the knowledge necessary to actively participate in their care. We present NoteAid-Chatbot, a conversational AI that promotes patient understanding via a novel 'learning as conversation' framework, built on a multi-agent large language model (LLM) and reinforcement learning (RL) setup without human-labeled data. NoteAid-Chatbot was built on a lightweight LLaMA 3.2 3B model trained in two stages: initial supervised fine-tuning on conversational data synthetically generated using medical conversation strategies, followed by RL with rewards derived from patient understanding assessments in simulated hospital discharge scenarios. Our evaluation, which includes comprehensive human-aligned assessments and case studies, demonstrates that NoteAid-Chatbot exhibits key emergent behaviors critical for patient education, such as clarity, relevance, and structured dialogue, even though it received no explicit supervision for these attributes. Our results show that even simple Proximal Policy Optimization (PPO)-based reward modeling can successfully train lightweight, domain-specific chatbots to handle multi-turn interactions, incorporate diverse educational strategies, and meet nuanced communication objectives. Our Turing test demonstrates that NoteAid-Chatbot surpasses non-expert humans. Although our current focus is on healthcare, the framework we present illustrates the feasibility and promise of applying low-cost, PPO-based RL to realistic, open-ended conversational domains, broadening the applicability of RL-based alignment methods.
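The abstract says the RL reward is derived from patient understanding assessments in simulated discharge scenarios, but this page does not spell out how that score is computed. The sketch below is one plausible reading, not the authors' implementation: after a conversation, a simulated patient answers a short quiz about the discharge note, and the reward is the fraction answered correctly. The names `QuizItem`, `comprehension_reward`, and `stub_patient` are hypothetical, and exact string matching stands in for whatever grading the real system uses.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class QuizItem:
    """One comprehension question with its reference answer (hypothetical type)."""
    question: str
    answer: str


def comprehension_reward(
    quiz: Sequence[QuizItem],
    answer_fn: Callable[[str], str],
) -> float:
    """Return the fraction of quiz items answered correctly, in [0, 1].

    `answer_fn` plays the role of the simulated patient after the dialogue.
    Matching is a case-insensitive exact comparison, which is an assumption
    made only to keep the sketch self-contained.
    """
    if not quiz:
        return 0.0
    correct = sum(
        1
        for item in quiz
        if answer_fn(item.question).strip().lower() == item.answer.strip().lower()
    )
    return correct / len(quiz)


# Illustrative data: the simulated patient learned one of two facts.
quiz = [
    QuizItem("When is the follow-up visit?", "in two weeks"),
    QuizItem("Which medication was prescribed?", "metformin"),
]
learned = {"When is the follow-up visit?": "In two weeks"}


def stub_patient(question: str) -> str:
    """Stand-in for an LLM-simulated patient; answers only what it 'learned'."""
    return learned.get(question, "I don't know")


reward = comprehension_reward(quiz, stub_patient)
```

In a PPO loop, a scalar like `reward` would be assigned to the finished conversation and used to update the chatbot policy; a higher score means the simulated patient retained more of the discharge information.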
Problem

Research questions and friction points this paper is trying to address.

Enhancing patient health literacy through AI conversations
Developing medical chatbots without human-labeled training data
Training lightweight LLMs for multi-turn healthcare dialogues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent LLM with reinforcement learning
Synthetic medical conversation training data
PPO-based reward modeling without human labels
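
For readers unfamiliar with PPO, the standard clipped surrogate objective is reproduced below. This page does not state which PPO variant or hyperparameters the authors used, so this is the textbook form rather than their exact setup. Here $r_t(\theta)$ is the probability ratio between the updated and previous policy, $\hat{A}_t$ is an advantage estimate, and $\epsilon$ is the clip range; reading $a_t$ as a generated token and $s_t$ as the dialogue so far is an assumption about how the objective would apply to a chatbot.

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      \min\!\left(
        r_t(\theta)\,\hat{A}_t,\;
        \operatorname{clip}\!\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t
      \right)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

The clipping keeps each update close to the previous policy, which is one reason PPO is a common choice when the reward signal is a single noisy score per conversation.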
Authors

Won Seok Jang
PhD student, University of Massachusetts Lowell
Natural Language Processing, Healthcare

Hieu Tran
University of Maryland, College Park
Natural Language Processing, Large Language Models

Manav Mistry
Center for Healthcare Organization and Implementation Research, VA Bedford Health Care; Miner School of Computer and Information Sciences, University of Massachusetts Lowell

SaiKiran Gandluri
Center for Healthcare Organization and Implementation Research, VA Bedford Health Care; Miner School of Computer and Information Sciences, University of Massachusetts Lowell

Yifan Zhang
Center for Healthcare Organization and Implementation Research, VA Bedford Health Care; Miner School of Computer and Information Sciences, University of Massachusetts Lowell

Sharmin Sultana
Center for Healthcare Organization and Implementation Research, VA Bedford Health Care; Miner School of Computer and Information Sciences, University of Massachusetts Lowell

Sunjae Kwon
UMass Amherst
Machine Learning, Natural Language Processing, Lexical Semantics, Public Health, AI in Healthcare

Yuan Zhang
Miner School of Computer and Information Sciences, University of Massachusetts Lowell

Zonghai Yao
UMass Amherst
Medical-LLM, Multi-agent AI Hospital, Clinical Reasoning, Synthetic Data, Patient Education

Hong Yu
Center for Healthcare Organization and Implementation Research, VA Bedford Health Care; Miner School of Computer and Information Sciences, University of Massachusetts Lowell; Manning College of Information and Computer Sciences, University of Massachusetts Amherst; Department of Medicine, University of Massachusetts Medical School