🤖 AI Summary
This work systematically identifies the critical bottlenecks that hinder the clinical deployment of medical large language models (LLMs) and multimodal LLMs (MLLMs): ill-defined capability boundaries, a lack of standardized evaluation metrics, difficulties in cross-modal alignment, pronounced ethical risks, and obstacles to integration into real-world clinical workflows. To address these, we propose a unified taxonomy for medical LLMs/MLLMs and introduce a task-mapping matrix that supports structured technical comparison across more than 120 representative studies. Our analysis identifies seven core clinical tasks, five fundamental evaluation bottlenecks, and three viable pathways to clinical deployment. We highlight cross-modal alignment and domain adaptation as pivotal paradigms for future research, and we provide actionable evaluation guidelines and concrete ethical-governance recommendations grounded in clinical practice and regulatory considerations.