🤖 AI Summary
Traditional max-pooling in 3D OCT volume segmentation degrades spatial detail and blurs inter-layer boundaries. To address this, we propose embedding tunable wavelet units (UwUs) into a 3D retinal layer segmentation network, introducing, for the first time, learnable wavelet filter banks to replace fixed downsampling operations. We design three wavelet-based downsampling modules (OrthLattUwU, BiorthLattUwU, and LS-BiorthLattUwU) and integrate them into a motion-corrected MGU-Net architecture, enabling joint modeling and multi-scale fusion of high-frequency textural and low-frequency semantic features. Evaluated on the Jacobs Retina Center dataset, our approach significantly improves segmentation accuracy and Dice scores; LS-BiorthLattUwU achieves the best performance, demonstrating that learnable wavelet downsampling yields critical gains in structural fidelity and spatial consistency for volumetric medical image segmentation.
📝 Abstract
This paper presents the first study to apply tunable wavelet units (UwUs) to 3D retinal layer segmentation from Optical Coherence Tomography (OCT) volumes. To overcome the limitations of conventional max-pooling, we integrate three wavelet-based downsampling modules (OrthLattUwU, BiorthLattUwU, and LS-BiorthLattUwU) into a motion-corrected MGU-Net architecture. These modules use learnable lattice filter banks to preserve both low- and high-frequency features, enhancing spatial detail and structural consistency. Evaluated on the Jacobs Retina Center (JRC) OCT dataset, our framework shows significant improvements in accuracy and Dice score, particularly with LS-BiorthLattUwU, highlighting the benefits of tunable wavelet filters in volumetric medical image segmentation.
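To illustrate the core idea behind the lattice-parameterized wavelet downsampling, the sketch below implements a minimal 1D two-channel orthogonal lattice filter bank: the analysis step is a cascade of rotations (the tunable angles playing the role of the learnable filter-bank parameters) applied to the even/odd polyphase components, producing half-length low- and high-frequency subbands instead of a max-pooled output. This is a hypothetical simplification for intuition only; the function name `lattice_analysis`, the single-stage structure, and the circular delay are assumptions, and the paper's actual 3D OrthLattUwU/BiorthLattUwU/LS-BiorthLattUwU modules are richer (biorthogonal and lifting-scheme variants, trained end-to-end inside the network).

```python
import numpy as np

def lattice_analysis(x, thetas):
    """Two-channel orthogonal lattice filter bank with decimation by 2.

    x      : 1-D signal of even length.
    thetas : rotation angles (the "learnable" parameters in this sketch).
    Returns (low, high) subbands, each half the input length.
    """
    # Polyphase split: even and odd samples.
    a, b = x[0::2].astype(float), x[1::2].astype(float)
    for i, th in enumerate(thetas):
        c, s = np.cos(th), np.sin(th)
        # Orthogonal rotation stage (energy-preserving).
        a, b = c * a + s * b, -s * a + c * b
        if i < len(thetas) - 1:
            # Unit delay between stages (circular here for simplicity).
            b = np.roll(b, 1)
    return a, b

# With theta = pi/4 the single-stage lattice reduces to the Haar transform:
x = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0])
low, high = lattice_analysis(x, [np.pi / 4])
# Constant pairs carry no high-frequency detail, so `high` is all zeros,
# while `low` keeps a scaled running average of the signal.
```

Because every stage is a rotation, the transform is orthogonal for any choice of angles, so the subbands preserve the input's energy; this is the structural property that lets gradient descent tune the angles freely without losing information, in contrast to max-pooling, which discards three of every four values in a 2D window.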