EfficientFSL: Enhancing Few-Shot Classification via Query-Only Tuning in Vision Transformers

📅 2026-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes EfficientFSL, a parameter-efficient framework for few-shot learning with large vision transformers. While full fine-tuning of such models yields strong performance, it incurs prohibitive computational and memory costs, hindering deployment in resource-constrained settings. EfficientFSL addresses this by updating only a minimal set of parameters: a lightweight Forward Block generates task-specific queries, a Combine Block aggregates multi-layer intermediate features, and a Support-Query Attention Block mitigates distributional shifts between support and query sets. Evaluated across four in-domain and six cross-domain few-shot benchmarks, the method achieves state-of-the-art performance while substantially reducing both computational overhead and memory consumption, offering an effective balance between efficiency and practicality.

📝 Abstract
Large models such as Vision Transformers (ViTs) have demonstrated remarkable superiority over smaller architectures like ResNet in few-shot classification, owing to their powerful representational capacity. However, fine-tuning such large models demands extensive GPU memory and prolonged training time, making them impractical for many real-world low-resource scenarios. To bridge this gap, we propose EfficientFSL, a query-only fine-tuning framework tailored specifically for few-shot classification with ViT, which achieves competitive performance while significantly reducing computational overhead. EfficientFSL fully leverages the knowledge embedded in the pre-trained model and its strong comprehension ability, achieving high classification accuracy with an extremely small number of tunable parameters. Specifically, we introduce a lightweight trainable Forward Block to synthesize task-specific queries that extract informative features from the intermediate representations of the pre-trained model in a query-only manner. We further propose a Combine Block to fuse multi-layer outputs, enhancing the depth and robustness of feature representations. Finally, a Support-Query Attention Block mitigates distribution shift by adjusting prototypes to align with the query set distribution. With minimal trainable parameters, EfficientFSL achieves state-of-the-art performance on four in-domain few-shot datasets and six cross-domain datasets, demonstrating its effectiveness in real-world applications.
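The query-only tuning idea described in the abstract — a small set of trainable, task-specific queries that extract features from a frozen pre-trained ViT via cross-attention — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the class name `QueryOnlyBlock`, the number of queries, and the initialization are all assumptions for the sake of the example.

```python
import torch
import torch.nn as nn

class QueryOnlyBlock(nn.Module):
    """Illustrative sketch (not the authors' code): learnable task queries
    cross-attend to frozen intermediate ViT features, so only the queries
    and the attention projections are trained."""
    def __init__(self, dim=768, num_queries=16, num_heads=8):
        super().__init__()
        # Task-specific queries are the trainable "prompt"; 0.02 std is an
        # assumed init, matching common transformer practice.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, frozen_feats):
        # frozen_feats: (B, N, dim) intermediate tokens from the frozen backbone
        B = frozen_feats.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)   # (B, num_queries, dim)
        out, _ = self.attn(q, frozen_feats, frozen_feats)  # queries read the features
        return out                                         # (B, num_queries, dim)
```

Because the backbone stays frozen, the trainable parameter count is just the queries plus one attention block per tapped layer, which is what keeps the memory and compute footprint small.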
Problem

Research questions and friction points this paper is trying to address.

Few-Shot Classification
Vision Transformers
Model Efficiency
Low-Resource Scenarios
Computational Overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Query-Only Tuning
Vision Transformers
Few-Shot Learning
Parameter-Efficient Fine-Tuning
Support-Query Alignment
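The support-query alignment contribution — adjusting class prototypes so they better match the query-set distribution — can be illustrated with a simple attention-based sketch. The function name, the temperature `tau`, and the 0.5 blending weight are assumptions for illustration; the paper's Support-Query Attention Block may differ in detail.

```python
import torch
import torch.nn.functional as F

def align_prototypes(prototypes, query_feats, tau=1.0):
    """Illustrative sketch (not the authors' code): shift each class
    prototype toward the query-set distribution via soft attention.

    prototypes:  (C, D) one prototype per class, from the support set
    query_feats: (Q, D) features of the unlabeled query samples
    """
    sim = prototypes @ query_feats.t() / tau   # (C, Q) prototype-query similarity
    w = F.softmax(sim, dim=-1)                 # attention over query samples
    query_context = w @ query_feats            # (C, D) query-side view of each class
    # Blend the support-based prototype with its query-side context;
    # the 0.5/0.5 mix is an assumed choice, not taken from the paper.
    return 0.5 * prototypes + 0.5 * query_context
```

Intuitively, when the support and query sets are drawn from shifted distributions (as in cross-domain few-shot settings), pulling each prototype toward the query features it most resembles reduces that mismatch before nearest-prototype classification.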
Wenwen Liao
College of Intelligent Robotics and Advanced Manufacturing, Fudan University, Shanghai, China
Hang Ruan
College of Intelligent Robotics and Advanced Manufacturing, Fudan University, Shanghai, China
Jianbo Yu
Professor, School of Mechanical Engineering, Tongji University
Prognostics and Health Management · Condition-Based Monitoring · Quality Control · Fault Diagnosis · Industrial Engineering
Bing Song
School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China
Yuansong Wang
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Xiaofeng Yang
School of Microelectronics, Fudan University, Shanghai, China