HMSViT: A Hierarchical Masked Self-Supervised Vision Transformer for Corneal Nerve Segmentation and Diabetic Neuropathy Diagnosis

📅 2025-06-24
🤖 AI Summary
Early diagnosis of diabetic peripheral neuropathy (DPN) remains challenging because current corneal confocal microscopy (CCM) analysis methods rely on handcrafted features, have little annotated data to learn from, and generalize poorly. To address these issues, we propose HMSViT, a hierarchical masked self-supervised vision Transformer. Our method combines a pooling-driven hierarchical architecture with dual attention mechanisms and a block-wise masking strategy, so that nerve structure can be modeled at multiple scales with minimal annotation. It further incorporates absolute positional encoding and a decoder that fuses multi-scale features to enable end-to-end segmentation and classification. Evaluated on clinical CCM datasets, our model achieves 61.34% mIoU for nerve segmentation and 70.40% diagnostic accuracy, outperforming Swin Transformer and HiViT by up to 6.39% in segmentation accuracy while using fewer parameters.

📝 Abstract
Diabetic Peripheral Neuropathy (DPN) affects nearly half of diabetes patients, making early detection essential. Corneal Confocal Microscopy (CCM) enables non-invasive diagnosis, but automated methods suffer from inefficient feature extraction, reliance on handcrafted priors, and data limitations. We propose HMSViT, a novel Hierarchical Masked Self-Supervised Vision Transformer designed for corneal nerve segmentation and DPN diagnosis. Unlike existing methods, HMSViT employs pooling-based hierarchical and dual attention mechanisms with absolute positional encoding. This enables efficient multi-scale feature extraction, capturing fine-grained local details in early layers and integrating global context in deeper layers, all at a lower computational cost. A block-masked self-supervised learning (SSL) framework designed for HMSViT reduces reliance on labelled data and enhances feature robustness, while a multi-scale decoder fuses hierarchical features for segmentation and classification. Experiments on clinical CCM datasets show that HMSViT achieves state-of-the-art performance, with 61.34% mIoU for nerve segmentation and 70.40% diagnostic accuracy, outperforming leading hierarchical models such as the Swin Transformer and HiViT by margins of up to 6.39% in segmentation accuracy while using fewer parameters. Detailed ablation studies further reveal that integrating block-masked SSL with hierarchical multi-scale feature extraction substantially enhances performance compared to conventional supervised training. Overall, these experiments indicate that HMSViT delivers robust, clinically viable results and has potential for scalable deployment in real-world diagnostic applications.
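For readers unfamiliar with the segmentation metric quoted above, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from a predicted and a ground-truth label map. The function name `mean_iou`, the nested-list input format, and the choice to skip classes absent from both maps are illustrative assumptions; the abstract does not state the paper's exact evaluation protocol (for example, which classes are counted or whether scores are averaged per image).

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two equally sized 2D label maps.

    `pred` and `target` are lists of rows of integer class ids.
    Classes absent from both maps are skipped (an assumption here,
    not necessarily the paper's protocol).
    """
    inter = [0] * num_classes  # pixels where pred and target agree on class c
    union = [0] * num_classes  # pixels where either map says class c
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    if not ious:
        raise ValueError("no labelled pixels to evaluate")
    return sum(ious) / len(ious)


# Toy 2x2 example: class 0 = background, class 1 = nerve (hypothetical labels).
pred = [[0, 1], [1, 1]]
target = [[0, 1], [0, 1]]
score = mean_iou(pred, target, num_classes=2)  # (1/2 + 2/3) / 2
```

In the toy example the background class scores 1/2 and the nerve class 2/3, so the mean is 7/12.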
Problem

Research questions and friction points this paper is trying to address.

Automated corneal nerve segmentation for DPN diagnosis
Overcoming inefficient feature extraction in CCM analysis
Reducing reliance on labeled data with self-supervised learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical masked self-supervised Vision Transformer
Pooling-based hierarchical and dual attention mechanisms
Block-masked SSL framework reduces labeled data reliance
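To make the block-masking idea in the list above concrete, here is a minimal sketch of one generic way to mask a patch grid in contiguous square blocks rather than patch by patch. It is not the authors' implementation: the function `block_mask`, its parameters (`block`, `mask_ratio`, `seed`), and the uniform random choice of blocks are all assumptions made for illustration.

```python
import random


def block_mask(grid_h, grid_w, block, mask_ratio, seed=None):
    """Return a grid_h x grid_w boolean grid; True marks a masked patch.

    Patches are masked in whole block x block groups, so masked regions
    stay aligned when a hierarchical model later pools neighbouring patches.
    """
    if grid_h % block or grid_w % block:
        raise ValueError("grid size must be divisible by block size")
    rng = random.Random(seed)
    blocks_w = grid_w // block
    n_blocks = (grid_h // block) * blocks_w
    n_masked = round(mask_ratio * n_blocks)
    chosen = rng.sample(range(n_blocks), n_masked)

    mask = [[False] * grid_w for _ in range(grid_h)]
    for idx in chosen:
        r0 = (idx // blocks_w) * block  # top row of the chosen block
        c0 = (idx % blocks_w) * block   # left column of the chosen block
        for r in range(r0, r0 + block):
            for c in range(c0, c0 + block):
                mask[r][c] = True
    return mask


# Hypothetical setting: an 8x8 patch grid, 2x2 blocks, half the blocks hidden.
mask = block_mask(8, 8, block=2, mask_ratio=0.5, seed=0)
```

During pretraining, a model of this kind would typically see only the unmasked patches and be trained to reconstruct the masked ones; the reconstruction target and loss used by HMSViT are not given in this summary.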
Xin Zhang
Department of Computing and Mathematics, Manchester Metropolitan University, Manchester M1 5GD, U.K.
Liangxiu Han
Professor, Manchester Metropolitan University, UK
Big Data Analytics/Machine Learning/AI, Parallel & Distributed Computing/Cloud, Bioinformatics
Yue Shi
Department of Computing and Mathematics, Manchester Metropolitan University, Manchester M1 5GD, U.K.
Yanlin Zheng
Department of Eye and Vision Sciences, University of Liverpool, Liverpool L7 8TX, U.K.
Alam Uazman
Department of Eye and Vision Sciences, University of Liverpool, Liverpool L7 8TX, U.K.
Maryam Ferdousi
Research Fellow, University of Manchester
Neuropathy, Diabetes, corneal confocal microscopy, optometry
Rayaz Malik
Department of Medicine, Weill Cornell Medicine-Qatar, Doha, Qatar