🤖 AI Summary
This study addresses the high cost and limited scale of manually annotated thyroid nodule ultrasound datasets, which constrain the performance of deep learning models. It evaluates the practical value of an automatically constructed medical image dataset by using an automated data collection pipeline to build a large-scale training set. Deep learning classification models trained on this automatically curated dataset were evaluated using the area under the receiver operating characteristic curve (AUC). Models trained on the full automatically curated dataset achieved an AUC of 0.694, significantly higher than models trained on manually annotated data (AUC = 0.643, P < 0.001) and not significantly different from models trained on a higher-accuracy subset (AUC = 0.689). These findings support the effectiveness and feasibility of automated dataset construction for improving model performance in medical image analysis.
📝 Abstract
Thyroid nodule cancers are commonly diagnosed from ultrasound images. Several studies have shown that deep learning algorithms designed to classify benign and malignant thyroid nodules can match radiologists' performance. However, data for training deep learning models is often limited because curating such datasets requires significant effort. A previous study proposed a method to curate thyroid nodule datasets automatically; in testing, it achieved a 63% yield rate and 83% accuracy. However, the usefulness of the generated data for training deep learning models remains unknown. In this study, we conducted experiments to determine whether using an automatically curated dataset improves the performance of deep learning algorithms. We trained deep learning models on the manually annotated dataset and on the automatically curated dataset. We also trained a model on a smaller, higher-accuracy subset of the automatically curated dataset to explore how such a dataset is best used. The model trained on the manually annotated dataset had an AUC of 0.643 (95% confidence interval [CI]: 0.62, 0.66). This is significantly lower than the AUC of the model trained on the automatically curated dataset, 0.694 (95% CI: 0.67, 0.73; P<.001). The model trained on the accurate subset had an AUC of 0.689 (95% CI: 0.66, 0.72; P>.43), which is not significantly lower than the AUC of the model trained on the full automatically curated dataset. In conclusion, we showed that using an automatically curated dataset can substantially increase the performance of deep learning algorithms, and we suggest using all of the data rather than only the accurate subset.
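The abstract reports each AUC with a 95% confidence interval but does not say how the intervals or P values were computed. As an illustration only, the sketch below shows one common way to obtain such an interval: a percentile bootstrap over the test cases. The function names `auc_score` and `bootstrap_auc_ci`, the number of resamples, and the bootstrap approach itself are assumptions, not the authors' method.

```python
import numpy as np


def auc_score(labels, scores):
    """AUC computed from the Mann-Whitney U statistic (ties get average ranks)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Average 1-based rank of each score; tied scores share the same rank.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    end = np.cumsum(counts)
    start = end - counts + 1
    ranks = ((start + end) / 2.0)[inverse]
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) confidence interval.

    Hypothetical helper: resamples test cases with replacement and skips
    resamples that happen to contain only one class.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    point = auc_score(labels, scores)  # raises early if a class is missing
    rng = np.random.default_rng(seed)
    n = labels.size
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)
        resampled = labels[idx]
        if resampled.all() or not resampled.any():
            continue  # AUC is undefined without both classes
        aucs.append(auc_score(resampled, scores[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return point, float(lo), float(hi)
```

Calling `bootstrap_auc_ci(y_true, y_score)` on a model's test-set labels and predicted malignancy scores returns the point estimate and interval bounds. Comparing two models on the same test set, as in the P values above, would additionally need a paired test (for example, bootstrapping the AUC difference), which this sketch does not cover.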