🤖 AI Summary
Clarifying the capability boundaries and complementary roles of generative large language models (LLMs) versus traditional NLP methods in medical AI applications remains an open challenge. Method: We conducted a large-scale bibliometric analysis of 19,123 publications, systematically categorized medical tasks, and benchmarked performance across paradigms on diverse clinical NLP tasks. Contribution/Results: Our study provides the first large-scale empirical evidence that generative LLMs significantly outperform traditional methods in open-ended generative tasks—such as clinical question answering and medical report generation—while maintaining robust safety and controllability. Conversely, conventional NLP approaches retain substantial advantages in structured information extraction tasks—including named entity recognition and relation extraction—offering superior accuracy and interpretability. Based on these findings, we propose a task-aware technology selection framework tailored to the characteristics of the medical domain, providing both theoretical foundations and practical guidelines for ethically grounded, efficient, and context-appropriate deployment of AI in healthcare.
📝 Abstract
Natural language processing (NLP) has long been applied in medicine, and generative large language models (LLMs) have recently risen to prominence. However, how these two paradigms differ across medical tasks remains underexplored. We analyzed 19,123 studies and found that generative LLMs hold clear advantages in open-ended tasks, while traditional NLP dominates information extraction and analysis tasks. As these technologies advance, their ethical use is essential to realizing their potential in medical applications.