🤖 AI Summary
This study addresses the automated detection of electron-dense deposits (EDDs) in glomerular disease, a task hindered by the scarcity of high-quality annotated data. Crowdsourcing reduces annotation costs but introduces label noise. To mitigate this, the authors propose an active label cleaning approach built around a Label Selection Module, which identifies informative samples from the inconsistency between crowdsourced labels and model predictions. The module assigns instance-level noise scores and, guided by an active learning strategy, prioritizes high-value noisy samples for expert relabeling. The resulting cleaned dataset is then used to train a high-accuracy detection model. On a private dataset, the method achieves an AP50 of 67.18%, an 18.83% improvement over a model trained directly on the noisy labels, and reaches 95.79% of the performance obtained with fully expert-annotated data while reducing annotation costs by 73.30%.
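The summary does not say how the Label Selection Module turns label/prediction disagreement into noise scores. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: boxes are `(x1, y1, x2, y2)`; a crowdsourced box scores 1 minus its best IoU with any confident prediction; a confident prediction that overlaps no crowdsourced box is treated as a possibly missed deposit and scored by its confidence; images are ranked by mean instance noise to fill a fixed expert-relabeling budget. The function names and the `conf_thr=0.5` threshold are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical types: boxes are (x1, y1, x2, y2); predictions carry a confidence.
Box = Tuple[float, float, float, float]
Pred = Tuple[Box, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def instance_noise_scores(labels: List[Box], preds: List[Pred],
                          conf_thr: float = 0.5) -> List[float]:
    """Assumed scoring rule (not the paper's): one score in [0, 1] per instance."""
    confident = [(box, score) for box, score in preds if score >= conf_thr]
    scores = []
    # Crowdsourced box: 1 - best IoU with a confident prediction
    # (1.0 when the model sees nothing there, a possible false label).
    for lb in labels:
        best = max((iou(lb, pb) for pb, _ in confident), default=0.0)
        scores.append(1.0 - best)
    # Confident prediction touching no crowdsourced box: possible missed
    # deposit, scored by the model's confidence.
    for pb, score in confident:
        if all(iou(pb, lb) == 0.0 for lb in labels):
            scores.append(score)
    return scores


def select_for_relabel(images: Dict[str, Tuple[List[Box], List[Pred]]],
                       budget: int) -> List[str]:
    """Rank images by mean instance noise; return the top `budget` ids."""
    ranked = []
    for img_id, (labels, preds) in images.items():
        s = instance_noise_scores(labels, preds)
        ranked.append((sum(s) / len(s) if s else 0.0, img_id))
    # Highest noise first; ties broken by image id for determinism.
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return [img_id for _, img_id in ranked[:budget]]


if __name__ == "__main__":
    images = {
        "img_a": ([(0, 0, 10, 10)], [((0, 0, 10, 10), 0.9)]),    # label and model agree
        "img_b": ([(0, 0, 10, 10)], [((50, 50, 60, 60), 0.8)]),  # complete disagreement
        "img_c": ([(0, 0, 10, 10)], [((5, 0, 15, 10), 0.9)]),    # partial overlap
    }
    print(select_for_relabel(images, budget=2))  # ['img_b', 'img_c']
```

Averaging per image keeps a single badly labeled instance from being diluted by image size only up to a point; a real system might instead rank individual instances or weight by predicted value to the detector, which this sketch does not attempt.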
📝 Abstract
Automated detection of electron-dense deposits (EDDs) in glomerular disease is hindered by the scarcity of high-quality labeled data. While crowdsourcing reduces annotation cost, it introduces label noise. We propose an active label cleaning method to efficiently denoise crowdsourced datasets. Our approach uses active learning to select the most valuable noisy samples for expert re-annotation, thereby building high-accuracy cleaning models. A Label Selection Module leverages discrepancies between crowdsourced labels and model predictions for both sample selection and instance-level noise grading. Experiments show that our method achieves 67.18% AP50 on a private dataset, an 18.83% improvement over training on noisy labels. This performance reaches 95.79% of that obtained with full expert annotation while reducing annotation cost by 73.30%. The method provides a practical, cost-effective solution for developing reliable medical AI with limited expert resources.
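Assuming the 95.79% figure is the ratio of the two AP50 scores, the reported numbers imply the AP50 of a model trained on full expert annotation (a derived value, not one stated in the abstract):

```latex
\mathrm{AP}_{50}^{\text{expert}} \approx \frac{67.18\%}{0.9579} \approx 70.13\%
```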