🤖 AI Summary
Biomedical multimodal large language models (MLLMs) suffer from critical detail loss due to reliance on low-resolution general-domain data for pretraining. We systematically investigate the impact of image resolution on model performance and identify resolution mismatch between training and inference as a key factor degrading biomedical understanding.
Method: We propose a native-resolution end-to-end training and inference paradigm, coupled with a hybrid-resolution training strategy: high-resolution samples are preserved at full fidelity while downsampled variants are introduced to balance information completeness and computational efficiency.
Contribution/Results: Our native-resolution approach achieves an average 12.7% performance gain across diverse biomedical understanding tasks. The hybrid strategy attains 92% of native-resolution performance with only a 5% increase in training cost. These results provide a reproducible methodological foundation for high-fidelity biomedical MLLM development and clinical deployment.
📝 Abstract
Imaging technologies are fundamental to biomedical research and modern medicine, requiring analysis of high-resolution images across various modalities. While multimodal large language models (MLLMs) show promise for biomedical image analysis, most are designed for low-resolution images from general-purpose datasets, risking critical information loss. We investigate how image resolution affects MLLM performance in biomedical applications and demonstrate that: (1) native-resolution training and inference significantly improve performance across multiple tasks, (2) misalignment between training and inference resolutions severely degrades performance, and (3) mixed-resolution training effectively mitigates misalignment and balances computational constraints with performance requirements. Based on these findings, we recommend prioritizing native-resolution inference and mixed-resolution datasets to optimize biomedical MLLMs for transformative impact in scientific research and clinical applications.