Large Language Models for Patient Comments Multi-Label Classification

📅 2024-10-31
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of analyzing unstructured inpatient feedback for multidimensional satisfaction and care quality assessment while preserving patient privacy. We propose a privacy-preserving multi-label text classification framework that integrates GPT-4 Turbo with a clinical Protected Health Information (PHI) detection module, leveraging zero-shot learning, in-context learning, and chain-of-thought prompting to enable end-to-end classification without labeled medical feedback. Experimental results demonstrate an F1-score of 76.12% and a weighted F1-score of 73.61%, significantly outperforming conventional and pretrained language models; classification outputs exhibit strong correlation (r > 0.75) with structured patient satisfaction scores. Our key contribution is the first integration of large language models with clinical privacy safeguards for healthcare experience analysis—demonstrating efficacy, interpretability, and regulatory compliance in low-resource, high-stakes clinical settings.

📝 Abstract
Patient experience and care quality are crucial to a hospital's sustainability and reputation. Analyzing patient feedback offers valuable insight into patient satisfaction and outcomes. However, the unstructured nature of these comments, together with the unavailability of labeled data and the nuances the texts encompass, poses challenges for traditional machine learning methods that follow a supervised learning paradigm. This research explores leveraging Large Language Models (LLMs) for Multi-label Text Classification (MLTC) of inpatient comments shared after a hospital stay. GPT-4 Turbo was used to conduct the classification. Given the sensitive nature of patients' comments, however, a security layer is introduced before the data reach the LLM: a Protected Health Information (PHI) detection framework that ensures patient de-identification. Within a prompt engineering framework, zero-shot learning, in-context learning, and chain-of-thought prompting were evaluated. Results demonstrate that GPT-4 Turbo outperforms traditional methods and Pre-trained Language Models (PLMs) in both zero-shot and few-shot settings, with the zero-shot setting achieving the highest overall performance (an F1-score of 76.12% and a weighted F1-score of 73.61%), followed closely by the few-shot learning results. Subsequently, the association between the classification results and structured patient experience variables (e.g., rating) was examined. The study enhances MLTC through the application of LLMs, offering healthcare practitioners an efficient way to gain deeper insights into patient feedback and deliver prompt, appropriate responses.
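The reported metrics (F1-score 76.12%, weighted F1-score 73.61%) are standard multi-label aggregates of per-label F1 scores. As a minimal illustration — using made-up labels and predictions, not the paper's data, and assuming the unweighted score is a macro average — they can be computed as:

```python
# Toy multi-label F1 computation (hypothetical labels, not the paper's).
LABELS = ["staff", "food", "cleanliness"]

def per_label_f1(y_true, y_pred, label):
    # Count true positives, false positives, and false negatives for one label.
    tp = sum(label in t and label in p for t, p in zip(y_true, y_pred))
    fp = sum(label not in t and label in p for t, p in zip(y_true, y_pred))
    fn = sum(label in t and label not in p for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_and_weighted_f1(y_true, y_pred):
    # Macro: unweighted mean over labels; weighted: mean weighted by label support.
    scores = {l: per_label_f1(y_true, y_pred, l) for l in LABELS}
    support = {l: sum(l in t for t in y_true) for l in LABELS}
    macro = sum(scores.values()) / len(LABELS)
    weighted = sum(scores[l] * support[l] for l in LABELS) / sum(support.values())
    return macro, weighted

# Each sample carries a *set* of labels — the multi-label setting.
y_true = [{"staff"}, {"staff", "food"}, {"cleanliness"}, {"food"}]
y_pred = [{"staff"}, {"staff"}, {"cleanliness", "food"}, {"food"}]
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
```

In practice, `sklearn.metrics.f1_score` with `average="macro"` or `average="weighted"` computes the same aggregates from binarized label matrices.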
Problem

Research questions and friction points this paper is trying to address.

Classify patient comments using Large Language Models.
Address challenges of unstructured patient feedback analysis.
Enhance healthcare feedback insights with advanced AI techniques.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes GPT-4 Turbo for classification
Implements PHI detection for security
Applies prompt engineering techniques
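The pipeline these bullets describe — de-identify first, then prompt the LLM — might be sketched as follows. This is a hedged stand-in, not the paper's implementation: the regex-based redaction is a toy substitute for the clinical PHI detection framework, the category names are invented, and the actual GPT-4 Turbo API call is omitted.

```python
import json
import re

# Hypothetical label set for illustration; the paper's categories may differ.
CATEGORIES = ["Staff", "Communication", "Environment", "Care Quality"]

def redact_phi(text):
    # Toy stand-in for a clinical PHI detection framework: mask dates
    # and long digit runs (e.g., MRNs or phone numbers).
    text = re.sub(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", "[DATE]", text)
    text = re.sub(r"\b\d{6,}\b", "[ID]", text)
    return text

def build_zero_shot_prompt(comment):
    # Zero-shot multi-label prompt: ask for every applicable category,
    # with no labeled examples supplied.
    return (
        "Classify the following de-identified patient comment into all "
        f"applicable categories from {CATEGORIES}. "
        'Respond with a JSON list, e.g. ["Staff"].\n\n'
        f"Comment: {redact_phi(comment)}"
    )

def parse_labels(model_reply):
    # Keep only labels that belong to the known category set.
    return [l for l in json.loads(model_reply) if l in CATEGORIES]
```

In the paper, the prompt would be sent to GPT-4 Turbo (with few-shot examples or chain-of-thought instructions appended in the other settings) and the reply parsed into labels.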
Hajar Sakai
Ph.D. in Industrial and Systems Engineering
Large Language Models, Text Classification, Time Series Forecasting
Sarah S. Lam
School of Systems Science and Industrial Engineering, Binghamton University, Binghamton, NY, USA
Mohammadsadegh Mikaeili
Ph.D. in Industrial and Systems Engineering, Binghamton University; Cooper University Health Care
Machine Learning in Healthcare
Joshua Bosire
Cooper University Health Care, Camden, NJ, USA
Franziska Jovin
Cooper University Health Care, Camden, NJ, USA