Large Language Models for Healthcare Text Classification: A Systematic Review

📅 2025-03-03
🤖 AI Summary
This study addresses key challenges in applying large language models (LLMs) to medical text classification, including high annotation costs, class imbalance, privacy sensitivity, domain-specific terminology, and limited scalability. Following the PRISMA framework, we systematically reviewed 65 peer-reviewed studies (2018–2024) sourced from PubMed, Scopus, and Google Scholar, introducing a four-dimensional analytical taxonomy based on classification type, clinical scenario, text source, and evaluation metric. Results demonstrate that LLMs significantly outperform traditional models under few-shot and zero-shot settings. However, methodological inconsistencies were identified in the reporting of F1-score and AUC, which often lacked standardization across tasks and datasets. Critical gaps include insufficient model interpretability, inadequate domain-aligned fine-tuning, and the absence of privacy-preserving computation techniques. We propose three priority research directions: (1) interpretable modeling for clinical decision support, (2) domain-adaptive fine-tuning strategies, and (3) privacy-enhancing computational frameworks compliant with healthcare regulations.

📝 Abstract
Large Language Models (LLMs) have fundamentally transformed approaches to Natural Language Processing (NLP) tasks across diverse domains. In healthcare, accurate and cost-efficient text classification is crucial, whether for clinical note analysis, diagnosis coding, or other tasks, and LLMs present promising potential. Text classification has always faced multiple challenges, including manual annotation for training, handling imbalanced data, and developing scalable approaches. In healthcare, additional challenges arise, particularly the critical need to preserve patient data privacy and the complexity of medical terminology. Numerous studies have leveraged LLMs for automated healthcare text classification and contrasted the results with existing machine learning-based methods, where embedding, annotation, and training are traditionally required. Existing systematic reviews of LLMs either do not specialize in text classification or do not focus on the healthcare domain. This research synthesizes and critically evaluates the current evidence in the literature on the use of LLMs for text classification in healthcare settings. Major databases (e.g., Google Scholar, Scopus, PubMed, ScienceDirect) and other resources were queried for papers published between 2018 and 2024 following PRISMA guidelines, yielding 65 eligible research articles. These were categorized by text classification type (e.g., binary classification, multi-label classification), application (e.g., clinical decision support, public health and opinion analysis), methodology, type of healthcare text, and metrics used for evaluation and validation. This review reveals existing gaps in the literature and suggests future lines of research to investigate and explore.
Problem

Research questions and friction points this paper is trying to address.

Whether LLMs improve healthcare text classification accuracy and cost-efficiency.
How to address challenges such as patient data privacy and medical terminology complexity.
Identifying gaps in LLM-based healthcare text classification research through a systematic review.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging LLMs for healthcare text classification
Addressing data privacy and medical terminology challenges
Systematic review of LLMs in healthcare applications
Hajar Sakai
Ph.D. in Industrial and Systems Engineering
Large Language Models · Text Classification · Time Series Forecasting
Sarah S. Lam
School of Systems Science and Industrial Engineering, State University of New York at Binghamton, Binghamton, NY, USA