Overview of the CXR-LT 2026 Challenge: Multi-Center Long-Tailed and Zero-Shot Chest X-ray Classification

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing chest X-ray AI diagnostic benchmarks, which are predominantly based on single-center, closed-set settings and struggle with the long-tailed distribution of known pathologies and unseen rare diseases encountered in real-world clinical practice. To bridge this gap, we introduce the first large-scale, multi-center chest X-ray dataset and establish two core tasks: multi-label classification of 30 known pathologies and open-world generalization to six previously unseen rare diseases. For the first time, model robustness is jointly evaluated under realistic multi-center conditions involving both long-tailed and zero-shot settings. Leveraging large-scale vision-language pretraining, the top-performing solution achieves mean average precisions of 0.5854 and 0.4315 on the two tasks, respectively, substantially mitigating performance degradation in zero-shot diagnosis and advancing the development of generalizable, clinically viable AI diagnostic systems.

📝 Abstract
Chest X-ray (CXR) interpretation is hindered by the long-tailed distribution of pathologies and the open-world nature of clinical environments. Existing benchmarks often rely on closed-set classes from single institutions, failing to capture the prevalence of rare diseases or the appearance of novel findings. To address this, we present the CXR-LT 2026 challenge. This third iteration of the benchmark introduces a multi-center dataset comprising over 145,000 images from PadChest and NIH Chest X-ray datasets. The challenge defines two core tasks: (1) Robust Multi-Label Classification on 30 known classes and (2) Open-World Generalization to 6 unseen (out-of-distribution) rare disease classes. We report the results of the top-performing teams, evaluating them via mean Average Precision (mAP), AUROC, and F1-score. The winning solutions achieved an mAP of 0.5854 on Task 1 and 0.4315 on Task 2, demonstrating that large-scale vision-language pre-training significantly mitigates the performance drop typically associated with zero-shot diagnosis.
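The abstract's headline numbers are macro mean Average Precision (mAP) scores over the label set. As a minimal sketch of how such a metric is typically computed for multi-label classification (ranking-based AP per label, averaged across labels), the snippet below uses synthetic scores; it is an illustration of the metric, not the challenge's official evaluation code.

```python
# Macro mAP sketch for multi-label classification.
# Each label is scored independently with ranking-based average precision,
# then per-label APs are averaged ("macro" mAP).

def average_precision(y_true, y_score):
    """AP for one label: mean precision at each positive, ranked by score."""
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx]:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def macro_map(y_true_per_label, y_score_per_label):
    """Mean of per-label APs over all labels."""
    aps = [average_precision(t, s)
           for t, s in zip(y_true_per_label, y_score_per_label)]
    return sum(aps) / len(aps)

# Toy example: 2 labels, 4 images each (synthetic data, not from the challenge).
truth  = [[1, 0, 1, 0], [0, 1, 0, 0]]
scores = [[0.9, 0.8, 0.7, 0.1], [0.2, 0.9, 0.3, 0.1]]
print(round(macro_map(truth, scores), 4))
```

In a long-tailed setting, macro averaging gives rare classes the same weight as common ones, which is why mAP (rather than accuracy) is the headline metric for both tasks.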
Problem

Research questions and friction points this paper is trying to address.

long-tailed distribution
zero-shot classification
chest X-ray
open-world generalization
rare diseases
Innovation

Methods, ideas, or system contributions that make the work stand out.

long-tailed classification
zero-shot learning
multi-center dataset
vision-language pre-training
open-world generalization