Xray2Xray: World Model from Chest X-rays with Volumetric Context

📅 2025-06-17
🤖 AI Summary
Chest X-ray (CXR) imaging suffers from severe anatomical superposition due to its inherent 2D projection, limiting diagnostic accuracy and risk prediction. To address this, we propose Xray2Xray, a self-supervised “world model” for CXRs that combines a vision model with a transition model to capture how projections change across angular positions, learning latent representations that encode 3D anatomical structure. The method requires no 3D annotations, yet recovers volumetric information and enables cross-view representation alignment. On cardiovascular disease risk estimation, it outperforms both supervised and existing self-supervised baselines. For classification of five pathologies, it achieves competitive performance. Its latent representations can also be used to reconstruct volumetric context, suggesting a route to 3D-aware understanding of CXRs.

📝 Abstract
Chest X-rays (CXRs) are the most widely used medical imaging modality and play a pivotal role in diagnosing diseases. However, as 2D projection images, CXRs are limited by structural superposition, which constrains their effectiveness in precise disease diagnosis and risk prediction. To address the limitations of 2D CXRs, this study introduces Xray2Xray, a novel World Model that learns latent representations encoding 3D structural information from chest X-rays. Xray2Xray captures the latent representations of the chest volume by modeling the transition dynamics of X-ray projections across different angular positions with a vision model and a transition model. We employed the latent representations of Xray2Xray for downstream risk prediction and disease diagnosis tasks. Experimental results showed that Xray2Xray outperformed both supervised methods and self-supervised pretraining methods for cardiovascular disease risk estimation and achieved competitive performance in classifying five pathologies in CXRs. We also assessed the quality of Xray2Xray's latent representations through synthesis tasks and demonstrated that the latent representations can be used to reconstruct volumetric context.
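
The interplay between a vision model and a transition model described in the abstract can be sketched roughly as follows. This is a toy illustration only, not the authors' architecture: the linear encoder and decoder, the sine/cosine angle conditioning, the latent size `LATENT_DIM`, and all function names (`encode`, `transition`, `decode`, `rollout`, `mse`) are assumptions made for this example, and the weights are random rather than trained.

```python
import math
import random

LATENT_DIM = 4  # hypothetical latent size, chosen only for illustration


def init_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode(image, enc_w):
    """Toy vision model: flatten the projection and map it to a latent vector."""
    flat = [pixel for row in image for pixel in row]
    return matvec(enc_w, flat)


def transition(latent, delta_deg, trans_w):
    """Toy transition model: predict the latent at the next projection angle."""
    rad = math.radians(delta_deg)
    # Condition on the angular step through its sine and cosine (an assumption).
    inputs = latent + [math.sin(rad), math.cos(rad)]
    return [math.tanh(v) for v in matvec(trans_w, inputs)]


def decode(latent, dec_w, height, width):
    """Toy decoder: map a latent vector back to a projection image."""
    flat = matvec(dec_w, latent)
    return [flat[r * width:(r + 1) * width] for r in range(height)]


def rollout(first_image, angle_steps, enc_w, trans_w):
    """Encode one projection, then step the latent through each angle change."""
    latent = encode(first_image, enc_w)
    latents = [latent]
    for delta_deg in angle_steps:
        latent = transition(latent, delta_deg, trans_w)
        latents.append(latent)
    return latents


def mse(predicted, target):
    """Mean squared error between two images of the same shape."""
    diffs = [(p - t) ** 2 for pr, tr in zip(predicted, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


rng = random.Random(0)
H, W = 3, 3
enc_w = init_matrix(LATENT_DIM, H * W, rng)
trans_w = init_matrix(LATENT_DIM, LATENT_DIM + 2, rng)
dec_w = init_matrix(H * W, LATENT_DIM, rng)

# A made-up 3x3 "frontal projection"; real CXRs are far larger.
frontal = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]
latents = rollout(frontal, [30, 30, 30], enc_w, trans_w)
predicted_view = decode(latents[-1], dec_w, H, W)
```

In a setup like this, training would presumably minimize a reconstruction error such as `mse(predicted_view, target_view)` against the true projection at the rotated angle, so the latent has to carry enough depth information to predict views it has not seen. The loss actually used by the paper is not stated in this summary.
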
Problem

Research questions and friction points this paper is trying to address.

Overcoming 2D CXR limitations in disease diagnosis
Learning 3D structural information from chest X-rays
Improving risk prediction and pathology classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns latent representations encoding 3D structural information from 2D X-rays
Uses a vision model and a transition model to capture how projections change across angular positions
Enhances disease diagnosis and risk prediction
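
One common way to reuse pretrained latent representations for a downstream task such as risk prediction is a linear probe: the encoder stays frozen and only a small classifier is fitted on its outputs. The sketch below assumes that setup; this summary does not say which downstream head the paper uses, and the 2D "latents", labels, and function names here are synthetic placeholders rather than real data or the authors' code.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_linear_probe(latents, labels, epochs=200, lr=0.5):
    """Fit logistic regression on frozen latent vectors with batch gradient ascent
    on the log-likelihood."""
    dim = len(latents[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(latents)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for z, y in zip(latents, labels):
            p = sigmoid(sum(w * x for w, x in zip(weights, z)) + bias)
            error = y - p  # gradient of the log-likelihood w.r.t. the score
            for i in range(dim):
                grad_w[i] += error * z[i]
            grad_b += error
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
        bias += lr * grad_b / n
    return weights, bias


def predict(latent, weights, bias):
    """Return the predicted class (0 or 1) for one latent vector."""
    score = sum(w * x for w, x in zip(weights, latent)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Synthetic stand-ins for encoder outputs: two well-separated clusters.
rng = random.Random(0)
latents, labels = [], []
for label in (0, 1):
    center = 1.0 if label == 1 else -1.0
    for _ in range(20):
        latents.append([center + rng.uniform(-0.3, 0.3) for _ in range(2)])
        labels.append(label)

weights, bias = train_linear_probe(latents, labels)
# Accuracy on the training set itself; a real evaluation needs held-out data.
accuracy = sum(predict(z, weights, bias) == y for z, y in zip(latents, labels)) / len(labels)
```

Because the probe has so few parameters, its accuracy mostly reflects how much task-relevant information the frozen representation already contains, which is why it is a frequent choice for comparing pretraining methods.
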