🤖 AI Summary
This study addresses the challenge of performing binary diagnostic classification on X-ray images without access to paired CT scans at inference time, leveraging only a limited set of patient-level CT–X-ray pairs during training. The task is formalized as a cross-modal knowledge distillation problem, with JDCNet as the baseline framework, and multiple distillation strategies are systematically evaluated, including logit-based KD, attention transfer, feature hinting, and late fusion. The authors propose a reproducible pilot protocol that makes the task formulation, failure modes, and minimal validation criteria for reliable cross-modal transfer explicit. Experimental results show that under the original data split, plain cross-modal KD achieves the best accuracy (0.875); after resampling, late fusion yields the best mean accuracy (0.885), while same-modality distillation achieves the best macro-F1 (0.554) and balanced accuracy (0.660), indicating that cross-modal approaches have yet to demonstrate consistent superiority.
📝 Abstract
Chest X-ray and computed tomography (CT) provide complementary views of thoracic disease, yet most computer-aided diagnosis models are trained and deployed within a single imaging modality. The concrete question studied here is narrower and deployment-oriented: on a patient-level paired chest cohort, can CT act as training-only supervision for a binary disease versus non-disease X-ray classifier without requiring CT at inference time? We study this setting as a cross-modality teacher–student distillation problem and use JDCNet as an executable pilot scaffold rather than as a validated superior architecture. On the original patient-level paired split from a public paired chest imaging cohort, a stripped-down plain cross-modal logit-KD control attains the highest result on the four-image validation subset (0.875 accuracy and 0.714 macro-F1), whereas the full module-augmented JDCNet variant remains at 0.750 accuracy and 0.429 macro-F1. To test whether that ranking is a split artifact, we additionally run eight patient-level Monte Carlo resamples with same-case comparisons, stronger mechanism controls based on attention transfer and feature hints, and imbalance-sensitive analyses. Under this resampled protocol, late fusion attains the highest mean accuracy (0.885), same-modality distillation attains the highest mean macro-F1 (0.554) and balanced accuracy (0.660), the plain cross-modal control drops to 0.500 mean balanced accuracy, and neither attention transfer nor feature hints recover a robust cross-modality advantage. The contribution of this study is therefore not a validated CT-to-X-ray architecture, but a reproducible and evidence-bounded pilot protocol that makes the exact task definition, failure modes, ranking instability, and the minimum requirements for future credible CT-to-X-ray transfer claims explicit.
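For readers unfamiliar with the plain cross-modal logit-KD control mentioned above, the following is a minimal PyTorch-style sketch of the standard Hinton-style distillation loss in this setting: a CT teacher's softened logits supervise the X-ray student alongside the hard labels. The function name and the hyperparameter values (`temperature`, `alpha`) are illustrative assumptions, not the paper's reported configuration.

```python
import torch
import torch.nn.functional as F


def cross_modal_kd_loss(student_logits, teacher_logits, labels,
                        temperature=4.0, alpha=0.5):
    """Plain logit-based KD: CT-teacher soft targets plus hard-label CE.

    Illustrative sketch only; hyperparameters are not the paper's settings.
    """
    # Hard-label cross-entropy on the X-ray student's own predictions.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-target KL divergence against the (detached) CT teacher distribution,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce + (1.0 - alpha) * kd
```

At inference time only the student branch is evaluated, which is what makes CT training-only supervision in this formulation; the same-modality distillation and late-fusion baselines in the study differ only in where the teacher signal comes from or in how the two branches are combined.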