🤖 AI Summary
Existing public breast ultrasound (BUS) datasets suffer from limited scale, coarse-grained annotations, and insufficient coverage of rare pathological subtypes, which hinders the clinical deployment of interpretable AI. To address these limitations, we introduce BUS-CoT, a high-quality, Chain-of-Thought (CoT)-enabled BUS benchmark dataset comprising 11,439 expert-annotated images of 10,019 lesions from 4,838 patients and spanning all 99 histopathology types. BUS-CoT uses a four-stage expert annotation schema, “Observation → Feature → Diagnosis → Histopathology”, whose labels are annotated and verified by experienced experts. By pairing this structured reasoning with full coverage of histopathology types, the dataset aims to support models that handle rare lesions more reliably and generalize across clinical scenarios. BUS-CoT provides an open evaluation benchmark for developing interpretable, robust AI systems for breast cancer diagnosis that support transparent, clinically grounded decision-making.
📝 Abstract
Breast ultrasound (BUS) is an essential tool for diagnosing breast lesions, with millions of examinations performed each year. However, publicly available high-quality BUS benchmarks for AI development are limited in both data scale and annotation richness. In this work, we present BUS-CoT, a BUS dataset for chain-of-thought (CoT) reasoning analysis that contains 11,439 images of 10,019 lesions from 4,838 patients and covers all 99 histopathology types. To facilitate research on incentivizing CoT reasoning, we construct the reasoning processes from observation, feature, diagnosis, and pathology labels, all annotated and verified by experienced experts. Moreover, by covering lesions of every histopathology type, we aim to facilitate AI systems that remain robust on rare cases, where diagnosis can be error-prone in clinical practice.
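To make the idea of a reasoning process built from four label levels concrete, here is a minimal sketch of how one annotated lesion might be represented and turned into an ordered reasoning chain. The class name `CoTAnnotation`, its field names, and the example label text are hypothetical placeholders chosen for illustration; they are not the official BUS-CoT file format or label vocabulary.

```python
from dataclasses import dataclass


# Hypothetical record layout: field names are illustrative only,
# not the official BUS-CoT schema.
@dataclass(frozen=True)
class CoTAnnotation:
    observation: str  # what is visible in the ultrasound image
    feature: str      # descriptive characteristics of the lesion
    diagnosis: str    # clinical diagnostic impression
    pathology: str    # histopathology type (reference label)

    def reasoning_chain(self) -> str:
        """Join the four stages in Observation -> Feature -> Diagnosis -> Pathology order."""
        stages = [
            ("Observation", self.observation),
            ("Feature", self.feature),
            ("Diagnosis", self.diagnosis),
            ("Pathology", self.pathology),
        ]
        return " -> ".join(f"{name}: {text}" for name, text in stages)


if __name__ == "__main__":
    # Made-up placeholder values, not taken from the dataset.
    example = CoTAnnotation(
        observation="hypoechoic mass",
        feature="irregular shape, non-circumscribed margin",
        diagnosis="suspicious for malignancy",
        pathology="invasive ductal carcinoma",
    )
    print(example.reasoning_chain())
```

Keeping the stages as separate fields, rather than one free-text string, would let a model be supervised or evaluated on each step of the chain independently as well as on the final pathology label.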