Evaluating the Effectiveness of Cost-Efficient Large Language Models in Benchmark Biomedical Tasks

📅 2025-07-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
The lack of systematic, cross-task evaluation frameworks hinders informed selection of large language models (LLMs) for biomedical applications. Method: We conduct a unified, standardized benchmarking study of 20+ open- and closed-source LLMs on biomedical text classification, generation, question answering, and multimodal image understanding tasks, measuring both performance and computational cost. Contribution/Results: We find no universally optimal model across tasks; lightweight open-source models (including BioMedLM and PubMedGPT) match or surpass proprietary models (e.g., GPT-4, Claude) on specific benchmarks while offering faster inference, flexible deployment, and enhanced data privacy. This is the first empirical demonstration of strong task dependency in biomedical LLM performance. We propose a "task-driven model selection" paradigm and release a reproducible evaluation benchmark with practical guidelines, enabling resource-efficient deployment in clinical decision support and biomedical research.

📝 Abstract
This paper presents a comprehensive evaluation of cost-efficient Large Language Models (LLMs) for diverse biomedical tasks spanning both text and image modalities. We evaluated a range of closed-source and open-source LLMs on tasks such as biomedical text classification and generation, question answering, and multimodal image processing. Our experimental findings indicate that there is no single LLM that can consistently outperform others across all tasks. Instead, different LLMs excel in different tasks. While some closed-source LLMs demonstrate strong performance on specific tasks, their open-source counterparts achieve comparable results (sometimes even better), with additional benefits like faster inference and enhanced privacy. Our experimental results offer valuable insights for selecting models that are optimally suited for specific biomedical applications.
Problem

Research questions and friction points this paper is trying to address.

How do cost-efficient LLMs perform across benchmark biomedical tasks?
How do closed-source and open-source LLMs compare on performance, cost, and deployment trade-offs?
Which LLMs are best suited to specific biomedical applications?
Innovation

Methods, ideas, or system contributions that make the work stand out.

A unified, standardized benchmark of 20+ LLMs spanning both text and image modalities
Joint measurement of task performance and computational cost
A "task-driven model selection" paradigm with practical guidelines for choosing models for specific biomedical applications