COACH meets QUORUM: A Framework and Pipeline for Aligning User, Expert and Developer Perspectives in LLM-generated Health Counselling

Date: 2026-03-09
AI Summary
This study addresses the challenge of designing health counselling systems for people with chronic disease. Such systems must balance personalization, medical expertise, and development feasibility while reconciling the differing evaluation criteria of diverse stakeholders. The authors propose QUORUM, a multi-perspective evaluation framework, and COACH, a large language model-driven pipeline that combines user data on sleep, mood, and activities with medical knowledge to generate personalized lifestyle counselling for cancer patients and survivors. The authors evaluate the pipeline with users, medical experts, and developers, which yields a unified multi-stakeholder assessment framework. The three groups agree that the counselling is relevant, of good quality, and reliable. They disagree on its tone, its sensitivity to errors in pattern extraction, and the risk of hallucination. The approach is applied in a real-world case study on the Healthy Chronos diary app, offering a generalizable pathway toward trustworthy, patient-centered health NLP systems.

πŸ“ Abstract
Systems that collect data on sleep, mood, and activities can provide valuable lifestyle counselling to populations affected by chronic disease and its consequences. Such systems are, however, challenging to develop; besides reliably extracting patterns from user-specific data, systems should also contextualise these patterns with validated medical knowledge to ensure the quality of counselling, and generate counselling that is relevant to a real user. We present QUORUM, a new evaluation framework that unifies these developer-, expert-, and user-centric perspectives, and show with a real case study that it meaningfully tracks convergence and divergence in stakeholder perspectives. We also present COACH, a Large Language Model-driven pipeline to generate personalised lifestyle counselling for our Healthy Chronos use case, a diary app for cancer patients and survivors. Applying our framework shows that overall, users, medical experts, and developers converge on the opinion that the generated counselling is relevant, of good quality, and reliable. However, stakeholders also diverge on the tone of the counselling, sensitivity to errors in pattern-extraction, and potential hallucinations. These findings highlight the importance of multi-stakeholder evaluation for consumer health language technologies and illustrate how a unified evaluation framework can support trustworthy, patient-centered NLP systems in real-world settings.
Problem

LLM-generated health counselling
multi-stakeholder alignment
evaluation framework
patient-centered NLP
trustworthy AI
Innovation

multi-stakeholder evaluation
large language models
personalized health counselling
trustworthy NLP
health informatics
Yee Man Ng
Leiden University
Bram van Dijk
Leiden University
Natural Language Processing, Health Informatics, Philosophy
Pieter Beynen
Healthy Chronos
Otto Boekesteijn
Healthy Chronos
Joris Jansen
Healthy Chronos
Gerard van Oortmerssen
Leiden University
Max van Duijn
Assistant Professor, Leiden University
Marco Spruit
Leiden University