Frequency-Enhanced Dual-Subspace Networks for Few-Shot Fine-Grained Image Classification

📅 2026-04-16

🤖 AI Summary

This work addresses the limitations of existing few-shot fine-grained image classification methods that rely solely on spatial-domain features and therefore suffer from texture bias, interference from high-frequency noise, and unstable metric learning. To overcome these issues, the paper proposes the Frequency-Enhanced Dual-Subspace Network (FEDSNet), which incorporates frequency-domain structural information into few-shot fine-grained classification. The approach uses the discrete cosine transform combined with low-pass filtering to separate low-frequency structural components from spatial textures, and builds two complementary low-rank subspaces (spatial texture and frequency structure) via truncated singular value decomposition. An adaptive gating mechanism then dynamically fuses the projection distances computed in the two subspaces. Evaluated on four standard benchmarks (CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC-Aircraft), the method achieves highly competitive performance while improving structural stability and generalization at a favorable computational cost.
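
The DCT-based low-pass step can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a backbone feature map of shape (channels, height, width), applies an orthonormal 2D DCT-II to each channel, keeps a hypothetical square block of the lowest `keep` x `keep` coefficients, and maps the result back to the spatial grid. The mask shape, the cutoff, and the helper names `dct_matrix` and `low_pass_structure` are illustrative assumptions; the summary does not specify them.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis of size n x n (row k is frequency k)."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # spatial index
    mat = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] *= np.sqrt(1.0 / n)
    mat[1:, :] *= np.sqrt(2.0 / n)
    return mat


def low_pass_structure(features: np.ndarray, keep: int) -> np.ndarray:
    """Keep only the lowest `keep` x `keep` DCT coefficients of each channel.

    features: (channels, height, width) feature map (assumed layout).
    Returns the low-frequency component on the original spatial grid.
    """
    _, h, w = features.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    # 2D DCT per channel: C_H @ X @ C_W^T, broadcast over the channel axis
    coeffs = ch @ features @ cw.T
    # Hypothetical square low-pass mask anchored at the DC coefficient
    mask = np.zeros((h, w))
    mask[:keep, :keep] = 1.0
    # The basis is orthonormal, so the inverse DCT is the transpose
    return ch.T @ (coeffs * mask) @ cw
```

Because the basis is orthonormal, `keep=1` reduces each channel to its spatial mean (the DC component), and any `keep` at least as large as both spatial sizes returns the input unchanged; intermediate values retain coarse layout while discarding fine texture.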

📝 Abstract

Few-shot fine-grained image classification aims to recognize visually similar subcategories from only a few annotated samples. Existing metric-learning-based methods typically rely solely on spatial-domain features. Confined to this single perspective, such models suffer from an inherent texture bias that entangles essential structural details with high-frequency background noise. Furthermore, without cross-view geometric constraints, single-view metrics tend to overfit this noise, leading to structural instability under few-shot conditions. To address these issues, this paper proposes the Frequency-Enhanced Dual-Subspace Network (FEDSNet). Specifically, FEDSNet uses the Discrete Cosine Transform (DCT) and a low-pass filtering mechanism to explicitly isolate low-frequency global structural components from spatial features, thereby suppressing background interference. Truncated Singular Value Decomposition (SVD) is then employed to construct independent low-rank linear subspaces for the spatial texture features and the frequency structural features. An adaptive gating mechanism dynamically fuses the projection distances from these two views. This strategy leverages the structural stability of the frequency subspace to prevent the spatial subspace from overfitting to background features. Extensive experiments on four benchmark datasets (CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC-Aircraft) show that FEDSNet delivers strong classification performance and robustness, achieving highly competitive results compared with existing metric learning algorithms. Complexity analysis further confirms that the network strikes a favorable balance between accuracy and computational efficiency, providing an effective new paradigm for few-shot fine-grained visual recognition.
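
The dual-subspace metric with gated fusion can be illustrated with a small NumPy sketch. It is not the authors' code: it assumes each class's support features define an affine subspace (the mean plus the top-`rank` left singular vectors of the centred support matrix, i.e. a truncated SVD), scores a query by the squared residual after projection onto that subspace, and fuses the spatial and frequency distances with a sigmoid gate computed from the concatenated query features. Mean-centring, the gate's form, and the names `class_subspace`, `projection_distance`, `fused_logits`, `gate_w`, and `gate_b` are hypothetical choices for illustration.

```python
import numpy as np


def class_subspace(support: np.ndarray, rank: int):
    """Affine subspace for one class: (mean, basis), basis of shape (dim, <= rank).

    support: (n_shots, dim) feature matrix. Mean-centring is an assumption.
    """
    mean = support.mean(axis=0)
    # Truncated SVD: left singular vectors of the centred (dim, n_shots) matrix
    u, s, _ = np.linalg.svd((support - mean).T, full_matrices=False)
    # Drop numerically zero directions (all of them in the 1-shot case)
    keep = min(rank, int(np.count_nonzero(s > 1e-10 * s.max())))
    return mean, u[:, :keep]


def projection_distance(query: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> float:
    """Squared norm of the query's residual after projection onto the subspace."""
    centred = query - mean
    residual = centred - basis @ (basis.T @ centred)
    return float(residual @ residual)


def fused_logits(q_spatial, q_freq, support_spatial, support_freq, rank, gate_w, gate_b):
    """Per-class logits from gate-weighted spatial and frequency distances.

    support_spatial / support_freq: one (n_shots, dim) array per class.
    gate_w, gate_b: hypothetical gate parameters applied to the concatenated query.
    """
    gate_in = np.concatenate([q_spatial, q_freq])
    # Hypothetical adaptive gate: sigmoid of a linear score, g in (0, 1)
    g = 1.0 / (1.0 + np.exp(-(gate_w @ gate_in + gate_b)))
    logits = []
    for s_sp, s_fr in zip(support_spatial, support_freq):
        d_sp = projection_distance(q_spatial, *class_subspace(s_sp, rank))
        d_fr = projection_distance(q_freq, *class_subspace(s_fr, rank))
        # Smaller fused distance -> larger logit
        logits.append(-(g * d_sp + (1.0 - g) * d_fr))
    return np.array(logits)
```

Directions with numerically zero singular values are discarded, so with a single shot the basis is empty and the distance reduces to the squared Euclidean distance to that support feature. In a trained network the gate parameters would be learned jointly with the backbone; here they are plain arrays.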

Problem

Research questions and friction points this paper is trying to address.

few-shot learning

fine-grained image classification

metric learning

texture bias

structural instability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-Enhanced Dual-Subspace Network (FEDSNet)

Discrete Cosine Transform (DCT)

Low-Rank Subspace