AI Summary
Mild traumatic brain injury (mTBI) poses significant clinical challenges due to its radiologically subtle manifestations, which lead to low diagnostic accuracy. Method: This study introduces the first intelligent diagnostic system integrating fine-tuned multimodal vision-language models (e.g., ViLT, BLIP-2) with the OpenAI-o3 reasoning large language model (LLM). We propose an LLM Agent-driven, interpretable decision framework leveraging customized prompt engineering, multi-model consensus aggregation, and dynamic task orchestration to enable end-to-end MRI analysis and diagnostic reasoning. Contribution/Results: To our knowledge, this is the first work synergizing a coalition of vision-language models with a reasoning-oriented LLM for mTBI prediction, establishing a clinically interpretable, automated diagnostic paradigm. Evaluated on real-world MRI data provided by military physicians, the system significantly improves robustness in detecting occult lesions and enhances diagnostic confidence. A prototype has been successfully deployed with the U.S. Army Medical Research Team.
Abstract
Mild traumatic brain injury (TBI) detection presents significant challenges due to the subtle and often ambiguous presentation of symptoms in medical imaging, making accurate diagnosis a complex task. To address these challenges, we propose Proof-of-TBI, a medical diagnosis support system that integrates multiple fine-tuned vision-language models with the OpenAI-o3 reasoning large language model (LLM). Our approach fine-tunes multiple vision-language models on a labeled dataset of TBI MRI scans, training them to recognize TBI symptoms effectively. The predictions from these models are aggregated through a consensus-based decision-making process: the system evaluates the predictions from all fine-tuned vision-language models using the OpenAI-o3 reasoning LLM, a model that has demonstrated remarkable reasoning performance, to produce the most accurate final diagnosis. An LLM agent orchestrates the interactions between the vision-language models and the reasoning LLM, managing the final decision-making process with transparency, reliability, and automation. This end-to-end decision-making workflow combines the vision-language model consortium with the OpenAI-o3 reasoning LLM, enabled by custom prompt engineering performed by the LLM agent. A prototype of the proposed platform was developed in collaboration with the U.S. Army Medical Research team in Newport News, Virginia, incorporating five fine-tuned vision-language models. The results demonstrate the transformative potential of combining fine-tuned vision-language model inputs with the OpenAI-o3 reasoning LLM to create a robust, secure, and highly accurate diagnostic system for mild TBI prediction. To the best of our knowledge, this research represents the first application of fine-tuned vision-language models integrated with a reasoning LLM for TBI prediction tasks.
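The consensus-aggregation and prompt-engineering steps described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the model names, the confidence-weighted vote, and the prompt wording are all assumptions, and the resulting prompt would in practice be forwarded by the LLM agent to the OpenAI-o3 reasoning model rather than printed.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class VLMPrediction:
    model_name: str   # hypothetical identifier for a fine-tuned VLM
    label: str        # "TBI" or "non-TBI"
    confidence: float # model confidence in [0, 1]

def aggregate_consensus(predictions):
    """Confidence-weighted majority vote over the VLM consortium."""
    weights = Counter()
    for p in predictions:
        weights[p.label] += p.confidence
    label, score = weights.most_common(1)[0]
    return label, score / sum(weights.values())

def build_adjudication_prompt(predictions, consensus_label, agreement):
    """Assemble the per-model votes into a prompt for the reasoning LLM."""
    lines = [
        "You are a diagnostic reasoning assistant for mild TBI.",
        "Five fine-tuned vision-language models examined the same MRI scan:",
    ]
    for p in predictions:
        lines.append(f"- {p.model_name}: {p.label} (confidence {p.confidence:.2f})")
    lines.append(f"Preliminary consensus: {consensus_label} (agreement {agreement:.2f}).")
    lines.append("Weigh any dissenting votes and give a final diagnosis with rationale.")
    return "\n".join(lines)

# Illustrative votes from five hypothetical fine-tuned VLMs
preds = [
    VLMPrediction("vilt-ft", "TBI", 0.82),
    VLMPrediction("blip2-ft", "TBI", 0.74),
    VLMPrediction("llava-ft", "non-TBI", 0.55),
    VLMPrediction("git-ft", "TBI", 0.68),
    VLMPrediction("flamingo-ft", "TBI", 0.71),
]
label, agreement = aggregate_consensus(preds)
prompt = build_adjudication_prompt(preds, label, agreement)
print(label)  # → TBI
```

The confidence weighting is one plausible aggregation choice; a plain majority vote, or passing the raw per-model outputs to the reasoning LLM without any preliminary tally, would fit the described architecture equally well.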