CheXTemporal: A Dataset for Temporally-Grounded Reasoning in Chest Radiography

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current vision-language models struggle to reason about longitudinal changes in chest X-ray findings, which limits their clinical utility for comparing a current study against a prior one. To address this gap, this work introduces CheXTemporal, a fine-grained, spatiotemporally aligned benchmark for chest X-ray temporal reasoning. It comprises paired prior-current radiographs with finding-level evolution annotations and a five-class progression taxonomy (new, worse, stable, improved, resolved). Drawing on multiple data sources, the authors also build a 280K-pair silver-standard dataset whose labels are derived automatically through medical image registration, radiology report parsing, and automated annotation generation, yielding joint spatial-temporal supervision signals under weaker supervision. Together, these resources support zero-shot evaluation of temporal reasoning and lesion localization. Experiments show that state-of-the-art models perform substantially worse on temporally subtle states, such as stable and resolved, than on salient ones such as worse, and that they localize lesions imprecisely, pointing to limited modeling of longitudinal disease evolution.
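To make the lesion-localization part of the evaluation concrete, here is a minimal sketch of how a predicted box could be scored against an annotated one. The summary does not say which grounding metric the paper uses; intersection-over-union (IoU) is assumed here because it is a common choice. The `Box` layout and the `box_iou` name are illustrative, not taken from the dataset.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs give a union of zero; report no overlap.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Toy boxes, made up for illustration only.
    annotated = (0.0, 0.0, 2.0, 2.0)
    predicted = (1.0, 1.0, 3.0, 3.0)
    print(f"IoU = {box_iou(annotated, predicted):.3f}")
```

A grounding prediction is then typically counted as a hit when its IoU exceeds a chosen threshold; the threshold itself would also be an assumption here.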

📝 Abstract

Chest radiograph interpretation requires temporal reasoning over prior and current studies, yet most vision-language models are trained on static image-report pairs and lack explicit supervision for modeling longitudinal change. We introduce CheXTemporal, a dataset for temporally grounded reasoning in chest radiography consisting of paired prior-current chest X-rays (CXR) with finding-level temporal and spatial annotations. The dataset includes a five-class progression taxonomy (new, worse, stable, improved, resolved), localized spatial supervision of pathology, explicit spatial-temporal alignment across paired studies, and multi-source coverage for cross-domain evaluation. We additionally construct a 280K-pair silver dataset with automatically derived temporal and anatomical supervision for large-scale evaluation under weaker supervision. Using these resources, we evaluate multiple state-of-the-art vision-language CXR models on grounding and progression-classification tasks in a zero-shot setting. Across both gold and silver evaluations, current models exhibit consistent limitations in spatial grounding, fine-grained temporal reasoning, and robustness under distribution shift. In particular, models perform substantially better on salient progression categories such as worse than on temporally subtle states such as stable and resolved, suggesting limited modeling of longitudinal disease evolution in chest radiography.
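The gap between salient and subtle progression categories is easiest to see when accuracy is broken down per class. The sketch below computes per-class recall over the five-class taxonomy named in the abstract. The function name, the `(gold, predicted)` pair format, and the toy labels are assumptions for illustration; the abstract does not specify the paper's exact metric or data schema.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# The five progression classes listed in the abstract.
PROGRESSION_CLASSES = ("new", "worse", "stable", "improved", "resolved")


def per_class_recall(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Recall per gold class, given (gold_label, predicted_label) pairs.

    Classes with no gold examples are omitted rather than reported as zero.
    """
    support: Counter = Counter()
    correct: Counter = Counter()
    for gold, pred in pairs:
        if gold not in PROGRESSION_CLASSES:
            raise ValueError(f"unknown gold label: {gold!r}")
        support[gold] += 1
        if pred == gold:
            correct[gold] += 1
    return {
        cls: correct[cls] / support[cls]
        for cls in PROGRESSION_CLASSES
        if support[cls] > 0
    }


if __name__ == "__main__":
    # Toy predictions, made up for illustration; not results from the paper.
    toy_pairs = [
        ("worse", "worse"),
        ("worse", "worse"),
        ("stable", "worse"),
        ("stable", "stable"),
        ("resolved", "improved"),
    ]
    for cls, recall in per_class_recall(toy_pairs).items():
        print(f"{cls:>8}: {recall:.2f}")
```

Reporting recall per class, rather than a single overall accuracy, keeps a model that mostly predicts a salient category such as worse from masking weak performance on stable or resolved.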

Problem

Research questions and friction points this paper is trying to address.

temporal reasoning

chest radiography

longitudinal change

spatial grounding

disease progression

Innovation

Methods, ideas, or system contributions that make the work stand out.

temporal reasoning

chest radiography

spatial-temporal grounding