🤖 AI Summary
This study addresses key challenges in applying large language models (LLMs) to medical text classification, including high annotation costs, class imbalance, privacy sensitivity, domain-specific terminology, and limited scalability. Following the PRISMA framework, we systematically reviewed 65 peer-reviewed studies (2018–2024) sourced from databases including PubMed, Scopus, Google Scholar, and Science Direct, and introduce a novel four-dimensional analytical taxonomy based on classification type, clinical scenario, text source, and evaluation metric. Results demonstrate that LLMs significantly outperform traditional models under few-shot and zero-shot settings. However, the reporting of F1-score and AUC shows methodological biases and often lacks standardization across tasks and datasets. Critical gaps include insufficient model interpretability, inadequate domain-aligned fine-tuning, and the absence of privacy-preserving computation techniques. We propose three priority research directions: (1) interpretable modeling for clinical decision support, (2) domain-adaptive fine-tuning strategies, and (3) privacy-enhancing computational frameworks compliant with healthcare regulations.
📝 Abstract
Large Language Models (LLMs) have fundamentally transformed approaches to Natural Language Processing (NLP) tasks across diverse domains. In healthcare, accurate and cost-efficient text classification is crucial for tasks such as clinical notes analysis and diagnosis coding, and LLMs show promising potential. Text classification has always faced multiple challenges, including manual annotation of training data, handling imbalanced data, and developing scalable approaches. Healthcare adds further challenges, particularly the critical need to preserve the privacy of patient data and the complexity of medical terminology. Numerous studies have leveraged LLMs for automated healthcare text classification and contrasted the results with existing machine learning-based methods, which traditionally require embedding, annotation, and training. Existing systematic reviews of LLMs either do not specialize in text classification or do not focus on the healthcare domain. This research synthesizes and critically evaluates the current evidence in the literature on the use of LLMs for text classification in healthcare settings. Major databases (e.g., Google Scholar, Scopus, PubMed, Science Direct) and other resources were queried for papers published between 2018 and 2024, following the PRISMA guidelines, which yielded 65 eligible research articles. These were categorized by text classification type (e.g., binary classification, multi-label classification), application (e.g., clinical decision support, public health and opinion analysis), methodology, type of healthcare text, and metrics used for evaluation and validation. This review reveals existing gaps in the literature and suggests directions for future research.