🤖 AI Summary
This work addresses the fine-grained classification of 14 pulmonary diseases in chest X-ray images. We propose a two-stage cascaded vision transformer architecture that integrates a ViT, which captures global semantic context, with a Swin Transformer, which models local texture and lesion-level details, so that the model jointly encodes long-range dependencies and discriminative pathological features. The model is trained end-to-end with supervised learning and uses medical-image-specific preprocessing. On an independent test set, it achieves a label-level classification accuracy of 92.06%, outperforming single-stage baselines by 3.2 percentage points, which indicates improved robustness when discriminating among multiple pulmonary diseases. The study points toward high-accuracy, AI-assisted diagnosis of thoracic pathologies.
📝 Abstract
Lung diseases have become a prevalent problem throughout the United States, affecting over 34 million people. Accurate and timely diagnosis of the different types of lung disease is critical, and Artificial Intelligence (AI) methods could speed up this process. In this research, a dual-stage vision transformer is built by integrating a Vision Transformer (ViT) and a Swin Transformer to classify 14 different lung diseases from patients' chest X-ray scans. After data preprocessing and training, the proposed model achieved a label-level accuracy of 92.06% on an unseen test subset of the dataset. These results show promise for accurately classifying lung diseases and for supporting the diagnosis of patients who suffer from them.
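The abstract reports accuracy "on a label level" but does not define the metric. The sketch below shows one common reading of label-level accuracy. It assumes a multi-label setup in which each image has one binary target per disease, and it counts the fraction of (image, label) pairs predicted correctly. The function name `label_level_accuracy` and the 0.5 decision threshold are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence


def label_level_accuracy(
    y_true: Sequence[Sequence[int]],
    y_prob: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> float:
    """Fraction of (image, disease-label) pairs that are predicted correctly.

    Hypothetical helper: assumes each image has one 0/1 target per disease
    (14 in this study) and that a label is predicted positive when its
    probability is >= threshold.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must cover the same images")
    correct = 0
    total = 0
    for true_row, prob_row in zip(y_true, y_prob):
        if len(true_row) != len(prob_row):
            raise ValueError("each image needs one probability per label")
        for target, prob in zip(true_row, prob_row):
            pred = 1 if prob >= threshold else 0  # binarize the probability
            correct += int(pred == target)        # count matching labels
            total += 1
    if total == 0:
        raise ValueError("no labels to score")
    return correct / total


if __name__ == "__main__":
    # Toy example: 2 images x 3 labels (not data from the study).
    y_true = [[1, 0, 0], [0, 1, 1]]
    y_prob = [[0.9, 0.2, 0.6], [0.1, 0.4, 0.8]]
    print(round(label_level_accuracy(y_true, y_prob), 4))  # → 0.6667
```

Under this definition, most disease labels are negative for most images, so label-level accuracy can be high even for a weak classifier. Per-label metrics such as AUROC are a useful complement when interpreting a figure like 92.06%.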