🤖 AI Summary
In 3D medical image analysis, single-task models neglect correlations between tasks, leading to redundant computation and inefficiency in practice. To address this, the authors propose MTMed3D, an end-to-end Transformer-based multi-task framework that jointly performs detection, segmentation, and classification. A shared Transformer encoder extracts hierarchical, multi-scale features, and lightweight task-specific CNN decoders produce the output for each task. Evaluated on BraTS 2018 and 2019, MTMed3D outperforms prior work on detection while maintaining segmentation and classification accuracy comparable to equivalent single-task models, and it reduces computational cost and speeds up inference relative to running separate single-task models. The authors present this as the first Transformer-based multi-task framework to cover all three tasks in 3D medical imaging, with potential to support efficient AI-assisted diagnosis.
📝 Abstract
In the field of medical imaging, AI-assisted techniques such as object detection, segmentation, and classification are widely employed to alleviate the workload of physicians. However, single-task models are predominantly used, overlooking the information shared across tasks. This oversight leads to inefficiencies in real-life applications. In this work, we propose MTMed3D, a novel end-to-end multi-task Transformer-based model that addresses the limitations of single-task models by jointly performing 3D detection, segmentation, and classification in medical imaging. Our model uses a Transformer as the shared encoder to generate multi-scale features, followed by CNN-based task-specific decoders. The proposed framework was evaluated on the BraTS 2018 and 2019 datasets, achieving promising results across all three tasks, especially in detection, where our method outperforms prior works. Additionally, we compare our multi-task model with equivalent single-task variants trained separately. Our multi-task model significantly reduces computational cost and achieves faster inference while maintaining performance comparable to the single-task models, highlighting its efficiency advantage. To the best of our knowledge, this is the first work to leverage Transformers for multi-task learning that simultaneously covers detection, segmentation, and classification in 3D medical imaging, demonstrating its potential to enhance diagnostic processes. The code is available at https://github.com/fanlimua/MTMed3D.git.
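The overall design described above, a single shared encoder whose multi-scale features feed three task-specific decoders, can be illustrated with a toy NumPy forward pass. This is purely a sketch of the data flow: the real MTMed3D uses a 3D Transformer encoder and learned CNN decoders, and every function name, the average-pooling "encoder", and the thresholding/centroid logic below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_encoder(volume, num_scales=3):
    # Stand-in for the shared Transformer encoder: build a pyramid of
    # multi-scale feature maps via repeated 2x average pooling.
    feats = [volume]
    for _ in range(num_scales - 1):
        d, h, w = feats[-1].shape
        v = feats[-1].reshape(d // 2, 2, h // 2, 2, w // 2, 2).mean(axis=(1, 3, 5))
        feats.append(v)
    return feats

def detection_head(feats):
    # Toy detection decoder: place one box center at the peak of the
    # coarsest feature map, mapped back to full resolution.
    coarse = feats[-1]
    idx = np.unravel_index(np.argmax(coarse), coarse.shape)
    scale = 2 ** (len(feats) - 1)
    center = np.array(idx) * scale
    size = np.array(coarse.shape)  # placeholder box extent
    return np.concatenate([center, size]).astype(float)  # (z, y, x, dz, dy, dx)

def segmentation_head(feats, threshold=0.5):
    # Toy segmentation decoder: threshold the full-resolution features.
    return (feats[0] > threshold).astype(np.uint8)

def classification_head(feats, num_classes=2):
    # Toy classification decoder: pool each scale to a scalar, apply a
    # fixed random projection (learned weights in the real model), softmax.
    pooled = np.array([f.mean() for f in feats])
    logits = rng.normal(size=(num_classes, pooled.size)) @ pooled
    e = np.exp(logits - logits.max())
    return e / e.sum()

def mtmed3d_forward(volume):
    feats = shared_encoder(volume)  # computed once, shared by all heads
    return {
        "detection": detection_head(feats),
        "segmentation": segmentation_head(feats),
        "classification": classification_head(feats),
    }
```

The efficiency argument of the paper is visible even in this sketch: the encoder runs once per volume, so the cost of adding a decoder is only that decoder's own compute, whereas three separate single-task models would each repeat the (dominant) encoder pass.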