🤖 AI Summary
To address the performance degradation that medical imaging AI models suffer in clinical deployment due to data drift, this paper proposes ReclAIm, a large language model (LLM)-driven multi-agent framework for fully automated, programming-free continuous monitoring, evaluation, and adaptive fine-tuning. The framework comprises three specialized agents (monitoring, evaluation, and fine-tuning) orchestrated by an LLM acting as a central controller; they interact via natural language to close the optimization loop and support multiple imaging modalities (MRI, CT, X-ray). Its key innovation lies in unifying transfer learning and parameter-efficient fine-tuning within an LLM-mediated end-to-end autonomous maintenance pipeline. Experiments show that, even after performance drops of up to 41.1% (MRI InceptionV3), the method restores model performance to within 1.5% of its original level, supporting long-term clinical reliability.
📝 Abstract
Ensuring the long-term reliability of AI models in clinical practice requires continuous performance monitoring and corrective action when degradation occurs. To address this need, this manuscript presents ReclAIm, a multi-agent framework capable of autonomously monitoring, evaluating, and fine-tuning medical image classification models. The system, built on a large language model core, operates entirely through natural language interaction, eliminating the need for programming expertise. ReclAIm successfully trains, evaluates, and maintains consistent performance of models across MRI, CT, and X-ray datasets. Once ReclAIm detects significant performance degradation, it autonomously executes state-of-the-art fine-tuning procedures that substantially reduce the performance gap. In cases with performance drops of up to 41.1% (MRI InceptionV3), ReclAIm restored performance metrics to within 1.5% of the initial model results. ReclAIm enables automated, continuous maintenance of medical imaging AI models in a user-friendly and adaptable manner that facilitates broader adoption in both research and clinical environments.
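
As a rough illustration of the detect-then-verify logic the abstract describes, the following minimal Python sketch flags a model for fine-tuning when its metric drops and then checks whether the fine-tuned metric is back within 1.5 points of the baseline. It is not ReclAIm's implementation: the names `DriftCheck` and `recovered`, the 0.05 trigger threshold, and the reading of the reported percentages as absolute differences on a 0 to 1 metric are all assumptions made for this example, and the numbers in the usage lines are illustrative rather than results from the paper.

```python
from dataclasses import dataclass


@dataclass
class DriftCheck:
    """Compares a model's current metric against its original baseline."""
    baseline: float                # metric of the original model (0..1)
    current: float                 # same metric measured on newly arriving data
    drop_threshold: float = 0.05   # hypothetical trigger, not taken from the paper

    @property
    def drop(self) -> float:
        """Absolute drop in the metric; positive values mean degradation."""
        return self.baseline - self.current

    def needs_fine_tuning(self) -> bool:
        """True when the drop exceeds the (assumed) trigger threshold."""
        return self.drop > self.drop_threshold


def recovered(baseline: float, tuned: float, tolerance: float = 0.015) -> bool:
    """True if the fine-tuned metric lies within `tolerance` of the baseline."""
    return abs(baseline - tuned) <= tolerance


# Illustrative values only: a large drop triggers fine-tuning,
# and a fine-tuned metric one point below baseline counts as recovered.
check = DriftCheck(baseline=0.90, current=0.489)
print(check.needs_fine_tuning())
print(recovered(baseline=0.90, tuned=0.89))
```

In a real deployment the threshold and the choice of metric would depend on the clinical task; the abstract does not state how ReclAIm defines "significant" degradation.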