The Impact of Image Resolution on Biomedical Multimodal Large Language Models

📅 2025-10-21

📈 Citations: 0

✨ Influential: 0

career value

179K/year

🤖 AI Summary

Biomedical multimodal large language models (MLLMs) suffer from critical detail loss due to reliance on low-resolution general-domain data for pretraining. We systematically investigate the impact of image resolution on model performance and identify resolution mismatch between training and inference as a key factor degrading biomedical understanding. Method: We propose a native-resolution end-to-end training and inference paradigm, coupled with a hybrid-resolution training strategy: high-resolution samples are preserved at full fidelity while downsampled variants are introduced to balance information completeness and computational efficiency. Contribution/Results: Our native-resolution approach achieves an average 12.7% performance gain across diverse biomedical understanding tasks. The hybrid strategy attains 92% of native-resolution performance with only a 5% increase in training cost. These results provide a reproducible methodological foundation for high-fidelity biomedical MLLM development and clinical deployment.

Technology Category

Application Category

📝 Abstract

Imaging technologies are fundamental to biomedical research and modern medicine, requiring analysis of high-resolution images across various modalities. While multimodal large language models (MLLMs) show promise for biomedical image analysis, most are designed for low-resolution images from general-purpose datasets, risking critical information loss. We investigate how image resolution affects MLLM performance in biomedical applications and demonstrate that: (1) native-resolution training and inference significantly improve performance across multiple tasks, (2) misalignment between training and inference resolutions severely degrades performance, and (3) mixed-resolution training effectively mitigates misalignment and balances computational constraints with performance requirements. Based on these findings, we recommend prioritizing native-resolution inference and mixed-resolution datasets to optimize biomedical MLLMs for transformative impact in scientific research and clinical applications.

Problem

Research questions and friction points this paper is trying to address.

Investigates image resolution impact on biomedical multimodal models

Addresses information loss from low-resolution training in medical imaging

Proposes native-resolution inference and mixed-resolution training solutions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Native-resolution training and inference enhance performance

Mixed-resolution training mitigates resolution misalignment issues

Prioritizing native-resolution inference optimizes biomedical MLLMs

🔎 Similar Papers

BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs