🤖 AI Summary
Accurate 3D CT reconstruction from extremely sparse-view (<10) X-ray projections remains challenging due to the limited representational capacity of CNNs and scarcity of high-quality training data.
Method: We propose X-LRM, a feedforward framework whose X-former module pairs an MLP-based image tokenizer with a Transformer encoder to handle an arbitrary number of input projection views; its X-triplane module models the 3D volumetric attenuation coefficients as an implicit neural field, improving geometric consistency and fine-detail recovery. To support training, we construct Torso-16K, a large-scale dataset of over 16K paired X-ray projections and ground-truth volumes of various torso organs.
Contribution/Results: Experiments demonstrate that X-LRM achieves high-fidelity reconstruction in under one second, improving PSNR by 1.5 dB over the state-of-the-art method while accelerating inference by 27×. Moreover, it substantially improves downstream lung segmentation, underscoring its practical value for low-dose CT reconstruction.
📝 Abstract
Sparse-view 3D CT reconstruction aims to recover volumetric structures from a limited number of 2D X-ray projections. Existing feedforward methods are constrained by the limited capacity of CNN-based architectures and the scarcity of large-scale training datasets. In this paper, we propose an X-ray Large Reconstruction Model (X-LRM) for extremely sparse-view (<10 views) CT reconstruction. X-LRM consists of two key components: X-former and X-triplane. Our X-former can handle an arbitrary number of input views using an MLP-based image tokenizer and a Transformer-based encoder. The output tokens are then upsampled into our X-triplane representation, which models the 3D radiodensity as an implicit neural field. To support the training of X-LRM, we introduce Torso-16K, a large-scale dataset comprising over 16K volume-projection pairs of various torso organs. Extensive experiments demonstrate that X-LRM outperforms the state-of-the-art method by 1.5 dB while running 27× faster and offering greater flexibility. Furthermore, downstream evaluation on lung segmentation also suggests the practical value of our approach. Our code, pre-trained models, and dataset will be released at https://github.com/caiyuanhao1998/X-LRM
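To make the triplane idea concrete, here is a minimal sketch of how an implicit field over three axis-aligned feature planes can be queried at a 3D point: bilinearly sample the XY, XZ, and YZ planes, fuse the features, and decode a non-negative radiodensity. The plane resolution, channel count, feature fusion by summation, and the single-layer decoder with a softplus are illustrative assumptions for exposition, not the actual X-triplane architecture.

```python
import math

R = 4  # plane resolution (assumed for illustration)
C = 2  # feature channels per plane (assumed)

def make_plane(fill):
    # An R x R grid of C-dimensional feature vectors.
    return [[[fill] * C for _ in range(R)] for _ in range(R)]

def bilerp(plane, u, v):
    # Bilinearly sample a plane at continuous coords u, v in [0, 1].
    x, y = u * (R - 1), v * (R - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, R - 1), min(y0 + 1, R - 1)
    fx, fy = x - x0, y - y0
    out = []
    for c in range(C):
        top = plane[y0][x0][c] * (1 - fx) + plane[y0][x1][c] * fx
        bot = plane[y1][x0][c] * (1 - fx) + plane[y1][x1][c] * fx
        out.append(top * (1 - fy) + bot * fy)
    return out

def query_density(planes, x, y, z, weights, bias):
    # Project the 3D point onto the three planes, sum the sampled
    # features, and decode with one linear layer + softplus so the
    # predicted radiodensity stays non-negative.
    f_xy = bilerp(planes["xy"], x, y)
    f_xz = bilerp(planes["xz"], x, z)
    f_yz = bilerp(planes["yz"], y, z)
    feat = [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]
    pre = sum(w * f for w, f in zip(weights, feat)) + bias
    return math.log1p(math.exp(pre))  # softplus

# Hypothetical constant planes and decoder weights, just to show a query.
planes = {"xy": make_plane(0.5), "xz": make_plane(0.5), "yz": make_plane(0.5)}
density = query_density(planes, 0.3, 0.7, 0.2, weights=[0.1, -0.2], bias=0.05)
```

In a trained model the planes would be produced by upsampling the Transformer's output tokens, and the decoder would be a small MLP queried densely over the volume to render the CT reconstruction.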