Cross-Architecture Knowledge Distillation (KD) for Retinal Fundus Image Anomaly Detection on NVIDIA Jetson Nano

📅 2025-06-22
🤖 AI Summary
In low-resource settings, retinal disease screening faces dual challenges of limited computational capacity on edge devices and scarcity of annotated data. Method: We propose a cross-architecture knowledge distillation framework for fundus image anomaly detection, incorporating a partitioned cross-attention (PCA) module and grouped linear (GL) projector, integrated with multi-view robust training to effectively transfer clinical discriminative knowledge from a teacher model (I-JEPA pre-trained ViT) to a lightweight CNN student model deployable on NVIDIA Jetson Nano. Contribution/Results: The student model retains only 1.03% of the teacher’s parameters while achieving 89% classification accuracy—93% of the teacher’s diagnostic performance—outperforming state-of-the-art distillation methods. This work is the first to combine self-supervised ViTs with structured cross-architecture distillation for fundus screening, empirically validating high-fidelity model compression in real-world edge healthcare deployments.
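The summary describes transferring the teacher's discriminative knowledge to the lightweight student. The paper's exact objective is not reproduced here; below is a minimal NumPy sketch of the standard temperature-scaled distillation loss (KL divergence between softened teacher and student logits) that cross-architecture KD frameworks typically build on. The function names and the temperature value are illustrative, not from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution,
    # exposing the teacher's "dark knowledge" about non-target classes.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened outputs,
    # scaled by T^2 as in the classic Hinton-style KD formulation.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)
```

When the student's logits match the teacher's, the loss is zero; any divergence in the softened distributions yields a positive penalty, which is what drives the student toward the teacher's clinical decision boundaries.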

📝 Abstract
Early and accurate identification of retinal ailments is crucial for averting ocular decline; however, dependable diagnostic devices are often unavailable in low-resource settings. This work addresses that gap with a lightweight, edge-deployable disease classifier built through cross-architecture knowledge distillation. We first train a high-capacity vision transformer (ViT) teacher model, pre-trained with I-JEPA self-supervised learning, to classify fundus images into four classes: Normal, Diabetic Retinopathy, Glaucoma, and Cataract. With an Internet of Things (IoT) deployment in mind, we then compress it into a CNN-based student model suitable for resource-limited hardware such as the NVIDIA Jetson Nano, using a novel framework that combines a Partitioned Cross-Attention (PCA) projector, a Group-Wise Linear (GL) projector, and a multi-view robust training method. Although the teacher model has 97.4 percent more parameters than the student, the student achieves 89 percent classification accuracy, retaining roughly 93 percent of the teacher's diagnostic performance. This retention of clinical classification behavior supports our central aim: compressing the ViT while preserving accuracy. Our work demonstrates a scalable, AI-driven triage solution for retinal disorders in under-resourced areas.
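The abstract names a Group-Wise Linear (GL) projector among the framework's components but does not specify its design. The sketch below shows the general grouped-linear idea: split the feature dimension into G groups, each with its own small linear map, which cuts projector parameters by a factor of G relative to a dense projection. All dimensions, variable names, and initialization choices here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def grouped_linear_project(x, weights, biases):
    """Group-wise linear projection: split the feature axis into G groups
    and apply an independent small linear map to each group."""
    G = len(weights)
    chunks = np.split(x, G, axis=-1)
    outs = [c @ W + b for c, W, b in zip(chunks, weights, biases)]
    return np.concatenate(outs, axis=-1)

# Illustrative sizes (not from the paper): project a 768-d teacher
# embedding down to a 256-d student feature space with G = 4 groups.
d_in, d_out, G = 768, 256, 4
rng = np.random.default_rng(0)
weights = [rng.standard_normal((d_in // G, d_out // G)) * 0.02 for _ in range(G)]
biases = [np.zeros(d_out // G) for _ in range(G)]

x = rng.standard_normal((2, d_in))       # a toy batch of teacher features
y = grouped_linear_project(x, weights, biases)

grouped_params = G * (d_in // G) * (d_out // G)  # parameters with grouping
dense_params = d_in * d_out                      # parameters without grouping
```

With these toy sizes the grouped projector uses a quarter of the dense layer's parameters, which is the kind of saving that matters when the whole student must fit a Jetson Nano's memory budget.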
Problem

Research questions and friction points this paper is trying to address.

Develop lightweight retinal disease classifier for edge devices
Compress ViT model to CNN while retaining diagnostic accuracy
Enable scalable AI-driven triage in resource-limited settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-architecture knowledge distillation for edge deployment
ViT teacher model with I-JEPA self-supervised learning
PCA and GL projectors for efficient CNN student model
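The innovations above name a Partitioned Cross-Attention (PCA) projector without detailing its formulation. One plausible reading, sketched here under stated assumptions, is cross-attention computed independently inside feature partitions: student tokens query teacher tokens partition by partition, aligning CNN features to ViT features at reduced cost. Token counts, the partition count P, and the per-partition attention form are all illustrative, not taken from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def partitioned_cross_attention(student_tokens, teacher_tokens, P=4):
    """Cross-attention computed independently within P feature partitions:
    each student partition attends over the matching teacher partition."""
    s_parts = np.split(student_tokens, P, axis=-1)
    t_parts = np.split(teacher_tokens, P, axis=-1)
    outs = []
    for q, kv in zip(s_parts, t_parts):
        scores = q @ kv.T / np.sqrt(q.shape[-1])  # (n_student, n_teacher)
        outs.append(softmax(scores) @ kv)         # aggregate teacher features
    return np.concatenate(outs, axis=-1)

rng = np.random.default_rng(1)
student = rng.standard_normal((49, 128))    # e.g. flattened CNN feature map
teacher = rng.standard_normal((196, 128))   # e.g. ViT patch tokens
aligned = partitioned_cross_attention(student, teacher, P=4)
```

The partitioned form keeps each attention map's score matrix small and lets groups of channels specialize, consistent with the paper's emphasis on efficient teacher-to-student feature transfer.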
Berk Yilmaz, Columbia University
Aniruddh Aiyengar, Columbia University