🤖 AI Summary
Current pathology vision-language models are limited in two respects: they struggle to reach diagnoses through rigorous reasoning paths, and they generalize poorly across tasks. This paper proposes TeamPath, a multimodal pathology AI system that integrates histopathological images, clinical text, and transcriptomic data, and uses reinforcement learning and dynamic routing mechanisms to support disease diagnosis, fine-grained image interpretation, and cross-modal generation of gene information. We introduce an interpretable reasoning-path optimization framework and, for the first time, an "AI collaborator" paradigm in which the system proactively identifies and corrects biases in expert diagnostic logic. Trained on large-scale, multicenter pathology data, the model shows significant improvements in real-world clinical settings (e.g., Yale School of Medicine): a 32% gain in diagnostic efficiency, high inter-rater consistency (Cohen's κ = 0.91), and improved cross-modal alignment accuracy. These results support both its clinical deployability and its intrinsic interpretability.
📝 Abstract
Advances in AI have introduced several strong models in computational pathology, ushering the field into an era of multimodal diagnosis, analysis, and interpretation. However, current pathology-specific vision-language models still lack the capacity to make diagnoses with rigorous reasoning paths and to handle diverse tasks, so building AI copilots for real clinical scenarios remains a challenge. Here we introduce TeamPath, an AI system powered by reinforcement learning and router-enhanced solutions and built on large-scale histopathology multimodal datasets. It works as a virtual assistant for expert-level disease diagnosis, patch-level information summarization, and cross-modality generation that integrates transcriptomic information for clinical use. We also collaborate with pathologists from Yale School of Medicine to demonstrate that TeamPath can help them work more efficiently by identifying and correcting errors in expert conclusions and reasoning paths. Overall, TeamPath can flexibly choose the best settings for the task at hand, and serves as an innovative and reliable system for communicating information across modalities and experts.