AI Summary
Existing few-shot fine-grained image classification methods struggle to adapt their receptive fields, which limits joint modeling of spatial and frequency-domain features. To address this limitation, this work proposes ARF-SFR-Net, which uses an adaptive receptive field mechanism to select receptive field sizes dynamically when extracting and fusing spatial and frequency features. The framework combines feature reconstruction with episodic training so that the whole network can be optimized end to end. Experiments on multiple few-shot fine-grained classification benchmarks show that the method outperforms current state-of-the-art approaches.
Abstract
Feature reconstruction techniques are widely applied to few-shot fine-grained image classification (FSFGIC). Our research indicates that a main challenge for existing feature-based FSFGIC methods is choosing the receptive field size used to extract feature descriptors (both spatial and frequency descriptors) from input images of different categories, so that FSFGIC tasks are performed better. To address this, we propose an adaptive receptive field-based spatial-frequency feature reconstruction network (ARF-SFR-Net). ARF-SFR-Net adaptively determines the receptive field sizes used to obtain spatial and frequency features, and effectively fuses these features for reconstruction and FSFGIC tasks. It can be easily embedded into a given episodic training mechanism for end-to-end training from scratch. Extensive experiments on multiple FSFGIC benchmarks demonstrate the effectiveness of ARF-SFR-Net and its superiority over state-of-the-art approaches. The code is available at: https://github.com/ICL-SUST/ARF-SFR-Net.git.
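To make the episodic training setting concrete, the following is a minimal sketch of how a single N-way K-shot episode is typically sampled in few-shot learning. It is not taken from the ARF-SFR-Net repository: the function name `sample_episode`, its parameters, and the default values (5-way, 1-shot, 15 queries per class) are illustrative assumptions, and the network itself is not modeled here.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way=5, k_shot=1, n_query=15, rng=None):
    """Sample one N-way K-shot episode (hypothetical helper, not the authors' code).

    labels: list where labels[i] is the class of image i.
    Returns (support, query), each a list of (image_index, episode_label) pairs.
    """
    rng = rng or random.Random()

    # Group image indices by their class label.
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    # Only classes with enough images can take part in an episode.
    needed = k_shot + n_query
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= needed]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query images")

    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for episode_label, c in enumerate(classes):
        # Draw disjoint support and query images for this class.
        picked = rng.sample(by_class[c], needed)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query
```

In an episodic training loop, each iteration would draw one such episode, compute features for the support and query images, and update the model from the loss on the query predictions. Classes are relabeled to 0..N-1 within the episode, so the model never relies on global class identities.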