A Chain-of-thought Reasoning Breast Ultrasound Dataset Covering All Histopathology Categories

📅 2025-09-21

🤖 AI Summary

Existing public breast ultrasound (BUS) datasets are limited in scale and annotation richness and under-represent rare pathological subtypes, which hinders the development of interpretable AI. BUS-CoT is a BUS dataset for chain-of-thought (CoT) reasoning analysis, comprising 11,439 images of 10,019 lesions from 4,838 patients and covering all 99 histopathology types. Its reasoning processes are built from four levels of labels (observation → feature → diagnosis → pathology), annotated and verified by experienced experts. By covering every histopathology type, the dataset is intended to support AI systems that remain robust on rare lesions, which are error-prone in clinical practice.

📝 Abstract

Breast ultrasound (BUS) is an essential tool for diagnosing breast lesions, with millions of examinations per year. However, publicly available high-quality BUS benchmarks for AI development are limited in data scale and annotation richness. In this work, we present BUS-CoT, a BUS dataset for chain-of-thought (CoT) reasoning analysis, which contains 11,439 images of 10,019 lesions from 4,838 patients and covers all 99 histopathology types. To facilitate research on incentivizing CoT reasoning, we construct the reasoning processes based on observation, feature, diagnosis and pathology labels, annotated and verified by experienced experts. Moreover, by covering lesions of all histopathology types, we aim to facilitate robust AI systems in rare cases, which can be error-prone in clinical practice.
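To make the annotation structure concrete, here is a minimal sketch of how one record and its reasoning chain might be represented. The class name `BusCotRecord`, all field names, and the example values are hypothetical illustrations, not the dataset's actual schema; only the four label levels and the dataset-level counts come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusCotRecord:
    """Hypothetical record layout; field names are assumptions, not the real schema."""
    image_id: str
    lesion_id: str
    patient_id: str
    observation: str           # free-text description of what is seen
    features: tuple[str, ...]  # structured ultrasound features
    diagnosis: str             # clinical impression
    pathology: str             # histopathology type


def render_chain(record: BusCotRecord) -> str:
    """Join the four label levels into one ordered reasoning string."""
    steps = [
        ("Observation", record.observation),
        ("Feature", "; ".join(record.features)),
        ("Diagnosis", record.diagnosis),
        ("Pathology", record.pathology),
    ]
    return "\n".join(f"{name}: {text}" for name, text in steps)


# Dataset-level counts as reported in the abstract.
N_IMAGES, N_LESIONS, N_PATIENTS = 11_439, 10_019, 4_838
images_per_lesion = N_IMAGES / N_LESIONS      # average images per lesion
lesions_per_patient = N_LESIONS / N_PATIENTS  # average lesions per patient

# Made-up example values, for illustration only.
example = BusCotRecord(
    image_id="img-0001",
    lesion_id="les-0001",
    patient_id="pat-0001",
    observation="hypoechoic mass with irregular margins",
    features=("irregular shape", "non-circumscribed margin"),
    diagnosis="suspicious for malignancy",
    pathology="invasive ductal carcinoma",
)

if __name__ == "__main__":
    print(render_chain(example))
    print(f"{images_per_lesion:.2f} images per lesion")
    print(f"{lesions_per_patient:.2f} lesions per patient")
```

The reported counts work out to roughly 1.14 images per lesion and 2.07 lesions per patient. Because a patient can contribute several lesions and images, keeping `patient_id` on each record would let train and test splits be drawn at the patient level.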

Problem

Research questions and friction points this paper is trying to address.

Limited high-quality breast ultrasound datasets for AI development

Lack of chain-of-thought reasoning annotations in medical imaging

AI systems struggle with rare breast lesion pathology types

Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-thought reasoning dataset for breast ultrasound

Dataset covers all 99 histopathology lesion types

Expert-annotated reasoning process from observation to pathology
