🤖 AI Summary
Clinical NLP classification in hospital settings suffers from data scarcity and domain mismatch, which hinders effective adaptation of large language models (LLMs). Method: We evaluate fine-tuning of biomedical pre-trained LLMs (CamemBERT-bio, AliBERT, and DrBERT) under low-resource conditions, comparing four adapter structures described as equivalent to Low-Rank Adaptation (LoRA), namely Adapter, Lightweight, TinyAttention, and Gated Residual Network (GRN), against two simpler Transformer-based models trained from scratch. Contribution/Results: Adapter structures do not yield significant improvements when fine-tuning the biomedical LLMs, and the simpler Transformer-based models trained from scratch perform better under resource constraints. Among the adapter structures, GRN performs best, with accuracy, precision, recall, and an F1 score of 0.88. Total training time exceeds 1000 hours for the LLMs, compared to under 6 hours for the simpler models. The work suggests that, for low-resource clinical NLP, “training small models from scratch” can be a viable, cost-effective alternative to “adapting large models via parameter-efficient tuning.”
📝 Abstract
Fine-tuning Large Language Models (LLMs) for clinical Natural Language Processing (NLP) poses significant challenges due to the domain gap and limited data availability. This study investigates the effectiveness of various adapter techniques, equivalent to Low-Rank Adaptation (LoRA), for fine-tuning LLMs in a resource-constrained hospital environment. We experimented with four structures, namely Adapter, Lightweight, TinyAttention, and Gated Residual Network (GRN), as final layers for clinical note classification. We fine-tuned biomedical pre-trained models, including CamemBERT-bio, AliBERT, and DrBERT, alongside two Transformer-based models. Our extensive experimental results indicate that i) employing adapter structures does not yield significant improvements in fine-tuning biomedical pre-trained LLMs, and ii) simpler Transformer-based models, trained from scratch, perform better under resource constraints. Among the adapter structures, GRN demonstrated superior performance with accuracy, precision, recall, and an F1 score of 0.88. Moreover, the total training time for LLMs exceeded 1000 hours, compared to under 6 hours for simpler Transformer-based models, highlighting that LLMs are more suitable for environments with extensive computational resources and larger datasets. Consequently, this study demonstrates that simpler Transformer-based models can be effectively trained from scratch, providing a viable solution for clinical NLP tasks in low-resource environments with limited data availability. By identifying the GRN as the most effective adapter structure, we offer a practical approach to enhance clinical note classification without requiring extensive computational resources.
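To make the idea of a GRN used as a final layer more concrete, the following is a minimal, dependency-free sketch of a forward pass. It assumes the common Gated Residual Network formulation (dense layer with ELU, dense layer, gated linear unit, residual connection, layer normalization); the abstract does not specify the exact variant the authors used, so this should not be read as their implementation. All names (`init_grn`, `grn_forward`, `classify`), the toy dimensions, and the random vector standing in for a frozen encoder's pooled embedding are hypothetical.

```python
import math
import random


def linear(x, weight, bias):
    """Affine map; `weight` is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def elu(x):
    return [v if v > 0 else math.exp(v) - 1.0 for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def layer_norm(x, eps=1e-5):
    """Layer normalization without learned scale/shift (omitted for brevity)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_linear(rng, n_in, n_out):
    """Small uniform random weights and zero biases (illustrative init only)."""
    scale = 1.0 / math.sqrt(n_in)
    weight = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def init_grn(rng, dim, hidden):
    return {
        "fc1": init_linear(rng, dim, hidden),    # dim -> hidden, followed by ELU
        "fc2": init_linear(rng, hidden, dim),    # hidden -> dim
        "gate": init_linear(rng, dim, dim),      # sigmoid branch of the GLU
        "value": init_linear(rng, dim, dim),     # linear branch of the GLU
    }


def grn_forward(params, x):
    """GRN(x) = LayerNorm(x + GLU(fc2(ELU(fc1(x)))))."""
    h = linear(elu(linear(x, *params["fc1"])), *params["fc2"])
    gate = sigmoid(linear(h, *params["gate"]))
    value = linear(h, *params["value"])
    gated = [g * v for g, v in zip(gate, value)]
    return layer_norm([a + b for a, b in zip(x, gated)])


def classify(params, head, embedding):
    """Apply the GRN to an embedding, then a linear head with softmax."""
    return softmax(linear(grn_forward(params, embedding), *head))


rng = random.Random(0)
DIM, HIDDEN, NUM_CLASSES = 8, 4, 3  # toy sizes, not taken from the paper
grn_params = init_grn(rng, DIM, HIDDEN)
head = init_linear(rng, DIM, NUM_CLASSES)
embedding = [rng.gauss(0.0, 1.0) for _ in range(DIM)]  # stand-in for encoder output
probs = classify(grn_params, head, embedding)
```

In a setup like the one described, only the parameters of the small final block and the classification head would need to be trained while the pre-trained encoder stays frozen, which is what makes adapter-style approaches attractive when compute is limited. Whether the authors froze the encoder is not stated in the abstract.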