Leveraging Complementary Attention maps in vision transformers for OCT image analysis

📅 2023-10-21
📈 Citations: 0
Influential: 0
📄 PDF

career value

198K/year
🤖 AI Summary
To address the challenge of automatically identifying optical-defect–related retinal biomarkers in OCT images, this paper proposes a dual-model complementary attention detection framework that integrates both local and global structural priors. We innovatively design a synergistic architecture combining MaxViT (convolutional layers with stride-aware attention) and EVA-02 (a pure vision Transformer), augmented by a cross-model complementary attention fusion mechanism. Furthermore, we introduce bidirectional knowledge distillation—first applied in this context—which enables a single distilled model to outperform the ensemble: the distilled model achieves patient-level F1 = 0.8527 on the IEEE VIP Cup 2023 OCT track, surpassing the runner-up by 3.8%, while accelerating inference by 2.1× and reducing parameters by 64%. This approach balances diagnostic robustness and deployment efficiency, offering an interpretable, lightweight solution for clinical OCT-assisted interpretation.
📝 Abstract
Optical Coherence Tomography (OCT) scan yields all possible cross-section images of a retina for detecting biomarkers linked to optical defects. Due to the high volume of data generated, an automated and reliable biomarker detection pipeline is necessary as a primary screening stage. We outline our new state-of-the-art pipeline for identifying biomarkers from OCT scans. In collaboration with trained ophthalmologists, we identify local and global structures in biomarkers. Through a comprehensive and systematic review of existing vision architectures, we evaluate different convolution and attention mechanisms for biomarker detection. We find that MaxViT, a hybrid vision transformer combining convolution layers with strided attention, is better suited for local feature detection, while EVA-02, a standard vision transformer leveraging pure attention and large-scale knowledge distillation, excels at capturing global features. We ensemble the predictions of both models to achieve first place in the IEEE Video and Image Processing Cup 2023 competition on OCT biomarker detection, achieving a patient-wise F1 score of 0.8527 in the final phase of the competition, scoring 3.8% higher than the next best solution. Finally, we used knowledge distillation to train a single MaxViT to outperform our ensemble at a fraction of the computation cost.
Problem

Research questions and friction points this paper is trying to address.

Automated detection of retinal biomarkers in OCT scans
Combining local and global feature analysis for biomarker identification
Improving accuracy and efficiency in OCT image analysis pipelines
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid vision transformer combines convolution and attention
Ensemble of MaxViT and EVA-02 models
Knowledge distillation reduces computation cost significantly
🔎 Similar Papers
No similar papers found.
H
Haz Sameen
Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology
S
Shahgir ⋆ Tanjeem
Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology
A
Azwad Zaman
Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology
⋆. Khondker
Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology
S
Salman Sayeed
Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology
M
Md. Asif
Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology
H
Haider Sheikh
Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology
Saifur Rahman
Saifur Rahman
Student in Comilla University
Natural Language ProcessingData MiningMachine Learning
J
Jony M. Sohel
Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology