🤖 AI Summary
Large language models (LLMs) face prohibitive computational costs, deployment challenges, and privacy risks in resource-constrained healthcare settings. Method: This work investigates the practical deployment of small language models (SLMs) for medical image classification, proposing two novel prompting strategies (incremental summarization and error-correcting reflection) applied to the NIH chest X-ray dataset for AP/PA view binary classification. Contribution/Results: Optimized SLMs (e.g., Phi-3, Qwen2) achieve 92.7% zero-shot accuracy, comparable to GPT-4o's 94.1% and substantially outperforming baseline instruction prompting, without fine-tuning or domain-specific AI expertise, while reducing inference overhead by two orders of magnitude. This study provides the first empirical validation that lightweight prompt engineering can bridge the performance gap between SLMs and LLMs in medical vision tasks, establishing a low-barrier, privacy-preserving paradigm for AI adoption in primary care.
📝 Abstract
Large language models (LLMs) have shown remarkable capabilities in natural language processing and multi-modal understanding. However, their high computational cost, limited accessibility, and data privacy concerns hinder their adoption in resource-constrained healthcare environments. This study investigates the performance of small language models (SLMs) in a medical imaging classification task, comparing different models and prompt designs to identify the optimal combination for accuracy and usability. Using the NIH Chest X-ray dataset, we evaluate multiple SLMs on the task of classifying chest X-ray positions (anteroposterior [AP] vs. posteroanterior [PA]) under three prompt strategies: baseline instruction, incremental summary prompts, and correction-based reflective prompts. Our results show that certain SLMs achieve competitive accuracy with well-crafted prompts, suggesting that prompt engineering can substantially enhance SLM performance in healthcare applications without requiring deep AI expertise from end users.
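The three prompt strategies compared in the abstract can be sketched as plain template builders. The wording and helper names below are illustrative assumptions for clarity, not the paper's verbatim prompts:

```python
# Sketch of the three prompt strategies evaluated in the study.
# All prompt text and function names are illustrative assumptions,
# not the exact prompts used in the paper.

def baseline_prompt() -> str:
    """Baseline instruction: ask directly for the view label."""
    return (
        "You are given a chest X-ray. Classify the view position as "
        "AP (anteroposterior) or PA (posteroanterior). "
        "Answer with one word: AP or PA."
    )

def incremental_summary_prompt(prior_summary: str) -> str:
    """Incremental summary: carry forward a running summary of image
    cues, then ask for a label grounded in that summary."""
    return (
        f"Summary of findings so far: {prior_summary}\n"
        "Note any additional cues relevant to the view position "
        "(e.g., scapula position, apparent heart size), update the "
        "summary, then classify the view as AP or PA."
    )

def reflective_prompt(initial_answer: str) -> str:
    """Correction-based reflection: show the model its first answer and
    ask it to verify or correct it before committing."""
    return (
        f"Your initial classification was: {initial_answer}.\n"
        "Re-examine the image for evidence contradicting that label. "
        "If the evidence does not support it, correct the answer; "
        "otherwise confirm it. Final answer: AP or PA."
    )
```

In this sketch, the reflective strategy would be run as a second pass: the baseline prompt produces `initial_answer`, which is then fed back through `reflective_prompt` for verification.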