Chest X-ray Foundation Model with Global and Local Representations Integration

📅 2025-02-07

🤖 AI Summary

To address poor generalization, high annotation costs, and weak out-of-distribution (OOD) performance of task-specific chest X-ray (CXR) models, this paper introduces CheXFound, a self-supervised vision foundation model for CXR. CheXFound is pretrained on CXR-1M, a curated dataset of over one million unique CXRs from publicly available sources. For downstream adaptation, the authors propose a Global and Local Representations Integration (GLoRI) module, which combines disease-specific local features with global image features to improve multilabel classification. On the CXR-LT 24 dataset, CheXFound outperforms state-of-the-art models in classifying 40 disease findings across different prevalence levels. It also shows superior label efficiency on downstream tasks with limited training data. On new tasks with out-of-distribution datasets, namely opportunistic cardiovascular disease risk estimation and mortality prediction, it achieves significant improvements. The source code is publicly available.

📝 Abstract

Chest X-ray (CXR) is the most frequently ordered imaging test, supporting diverse clinical tasks from thoracic disease detection to postoperative monitoring. However, task-specific classification models are limited in scope, require costly labeled data, and lack generalizability to out-of-distribution datasets. To address these challenges, we introduce CheXFound, a self-supervised vision foundation model that learns robust CXR representations and generalizes effectively across a wide range of downstream tasks. We pretrain CheXFound on a curated CXR-1M dataset, comprising over one million unique CXRs from publicly available sources. We propose a Global and Local Representations Integration (GLoRI) module for downstream adaptations, by incorporating disease-specific local features with global image features for enhanced performance in multilabel classification. Our experimental results show that CheXFound outperforms state-of-the-art models in classifying 40 disease findings across different prevalence levels on the CXR-LT 24 dataset and exhibits superior label efficiency on downstream tasks with limited training data. Additionally, CheXFound achieved significant improvements on new tasks with out-of-distribution datasets, including opportunistic cardiovascular disease risk estimation and mortality prediction. These results highlight CheXFound's strong generalization capabilities, enabling diverse adaptations with improved label efficiency. The project source code is publicly available at https://github.com/RPIDIAL/CheXFound.
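The abstract describes GLoRI only at a high level: it incorporates disease-specific local features with global image features for multilabel classification. The NumPy sketch below illustrates one plausible reading, under stated assumptions, and is not the CheXFound implementation. It assumes a transformer encoder that outputs one global [CLS] embedding and a set of patch tokens. A learnable query for each finding attention-pools the patch tokens into a finding-specific local feature. That local feature is concatenated with the shared global embedding and scored by a per-finding linear classifier with a sigmoid output. The function name `glori_like_head`, the query mechanism, and all shapes are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def glori_like_head(cls_token, patch_tokens, queries, w, b):
    """Hypothetical GLoRI-style multilabel head (illustrative sketch only).

    cls_token:    (D,)     global image embedding (assumed [CLS] token)
    patch_tokens: (N, D)   local patch embeddings from the encoder
    queries:      (C, D)   one learnable query per finding (hypothetical)
    w:            (C, 2D)  per-finding classifier weights
    b:            (C,)     per-finding bias
    Returns per-finding probabilities of shape (C,).
    """
    d = patch_tokens.shape[-1]
    # Each finding query attends over all patch tokens (scaled dot-product).
    attn = softmax(queries @ patch_tokens.T / np.sqrt(d), axis=-1)  # (C, N)
    local = attn @ patch_tokens                                     # (C, D)
    # Pair the shared global embedding with each finding's local feature.
    glob = np.broadcast_to(cls_token, local.shape)                  # (C, D)
    fused = np.concatenate([glob, local], axis=-1)                  # (C, 2D)
    # Separate linear classifier per finding; sigmoid gives multilabel output.
    logits = np.einsum("cd,cd->c", fused, w) + b
    return 1.0 / (1.0 + np.exp(-logits))


# Usage with random features: 40 findings, matching the CXR-LT 24 label count.
rng = np.random.default_rng(0)
D, N, C = 16, 49, 40
probs = glori_like_head(
    rng.normal(size=D),
    rng.normal(size=(N, D)),
    rng.normal(size=(C, D)),
    rng.normal(size=(C, 2 * D)) * 0.1,
    np.zeros(C),
)
print(probs.shape)  # (40,)
```

Each finding gets an independent sigmoid rather than a shared softmax because a single radiograph can show several findings at once. Per-finding queries let each label draw on a different image region while still seeing the whole-image context.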

Problem

Develops a self-supervised chest X-ray foundation model.

Improves generalizability across diverse clinical tasks.

Enhances performance with limited labeled training data.

Innovation

Self-supervised vision foundation model

Global and Local Representations Integration

Enhanced multilabel classification performance