CheXLearner: Text-Guided Fine-Grained Representation Learning for Progression Detection

๐Ÿ“… 2025-05-11
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address the semantic misalignment caused by coarse-grained image-text alignment, and the lack of medical semantic guidance in purely vision-based approaches to longitudinal chest X-ray analysis, this paper proposes the first end-to-end multimodal framework integrating anatomical region localization, hyperbolic geometric modeling, and Riemannian manifold alignment. Its core innovation, the Med-Manifold Alignment Module (Med-MAM), enables fine-grained, pathology-interpretable alignment of anatomical structures across temporal studies, the first such effort at the level of clinical interpretability. The framework further introduces multimodal fine-grained supervision and dynamic low-level feature optimization. On anatomical region progression detection, the method achieves 81.12% accuracy (+17.2%) and an 80.32% F1-score (+11.05%); on downstream disease classification, it attains a mean AUC of 91.52%, significantly outperforming state-of-the-art methods.

๐Ÿ“ Abstract
Temporal medical image analysis is essential for clinical decision-making, yet existing methods either align images and text at a coarse level - causing potential semantic mismatches - or depend solely on visual information, lacking medical semantic integration. We present CheXLearner, the first end-to-end framework that unifies anatomical region detection, Riemannian manifold-based structure alignment, and fine-grained regional semantic guidance. Our proposed Med-Manifold Alignment Module (Med-MAM) leverages hyperbolic geometry to robustly align anatomical structures and capture pathologically meaningful discrepancies across temporal chest X-rays. By introducing regional progression descriptions as supervision, CheXLearner achieves enhanced cross-modal representation learning and supports dynamic low-level feature optimization. Experiments show that CheXLearner achieves 81.12% (+17.2%) average accuracy and 80.32% (+11.05%) F1-score on anatomical region progression detection - substantially outperforming state-of-the-art baselines, especially in structurally complex regions. Additionally, our model attains a 91.52% average AUC score in downstream disease classification, validating its superior feature representation.
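The abstract attributes the robust structure alignment to hyperbolic geometry. The paper's Med-MAM is not described in enough detail here to reproduce, but the core geometric primitive it builds on, distance in the Poincaré ball model of hyperbolic space, can be sketched. The function below is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points in the Poincare ball (||x|| < 1).

    d(u, v) = arccosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))
    """
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq_diff / max(denom, eps))

# Distances blow up near the ball's boundary, so hierarchies (e.g. an
# assumed anatomy tree: organ -> region -> finding) embed with low
# distortion; this property motivates hyperbolic alignment in general.
origin = np.zeros(2)
near = np.array([0.1, 0.0])   # close to the origin
far = np.array([0.95, 0.0])   # close to the boundary
print(poincare_distance(origin, near))  # small
print(poincare_distance(origin, far))   # much larger
```

Aligning anatomical-region embeddings under this metric, rather than the Euclidean one, is the general idea the abstract's "Riemannian manifold-based structure alignment" refers to; the paper's actual module may differ.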
Problem

Research questions and friction points this paper is trying to address.

How to detect anatomical region progression in temporal chest X-rays
How to align images and text at a fine-grained semantic level
How to integrate medical semantics with visual information
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end framework for anatomical region detection
Hyperbolic geometry for robust structure alignment
Regional progression descriptions for cross-modal learning
๐Ÿ”Ž Similar Papers
No similar papers found.
Yuanzhuo Wang
School of Computer Science and Engineering, Central South University, Changsha, China
Junwen Duan
Central South University
Artificial Intelligence, Natural Language Processing, Social Computing
Xinyu Li
School of Computer Science and Engineering, Central South University, Changsha, China
Jianxin Wang
School of Computer Science and Engineering, Central South University, Changsha, China
Algorithm, Bioinformatics, Computer Network