AI Summary
Existing AI-assisted ultrasound video diagnosis research suffers from insufficient data diversity, limited model performance, and poor clinical interpretability. To address these challenges, this work proposes a multimodal intelligent diagnostic framework. First, we introduce CUV, the first multi-source, heterogeneous ultrasound video dataset targeting three organs (thyroid, breast, liver), comprising 495 videos covering five lesion categories. Second, we design CTU-Net, a lightweight spatiotemporal fusion network that efficiently models video features, achieving an 86.73% classification accuracy. Third, we integrate a large language model (LLM) to align visual representations with clinical text, generating interpretable, guideline-compliant diagnostic reports; physician evaluations yield a mean score of 3.2/5. This is the first study to enable end-to-end ultrasound video-text joint reasoning, significantly enhancing diagnostic efficiency, accuracy, and clinical deployability.
Abstract
AI-assisted ultrasound video diagnosis presents new opportunities to enhance the efficiency and accuracy of medical imaging analysis. However, existing research remains limited in terms of dataset diversity, diagnostic performance, and clinical applicability. In this study, we propose Auto-US, an intelligent diagnosis agent that integrates ultrasound video data with clinical diagnostic text. To support this, we constructed the CUV Dataset of 495 ultrasound videos spanning five categories and three organs, aggregated from multiple open-access sources. We developed CTU-Net, which achieves state-of-the-art performance in ultrasound video classification, reaching an accuracy of 86.73%. Furthermore, by incorporating large language models, Auto-US is capable of generating clinically meaningful diagnostic suggestions. The final diagnostic score for each case exceeded 3 out of 5, and the scores were validated by professional clinicians. These results demonstrate the effectiveness and clinical potential of Auto-US in real-world ultrasound applications. Code and data are available at: https://github.com/Bean-Young/Auto-US.