A Multihead Continual Learning Framework for Fine-Grained Fashion Image Retrieval with Contrastive Learning and Exponential Moving Average Distillation

📅 2026-03-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost of full retraining and the lack of class-incremental learning support in fine-grained fashion image retrieval under dynamically expanding attribute scenarios. To this end, we propose the MCL-FIR framework, which introduces class-incremental learning to this task for the first time. MCL-FIR employs a multi-head architecture to accommodate continuously emerging classes, reformulates the conventional triplet loss into a two-sample contrastive learning objective based on InfoNCE, and integrates exponential moving average (EMA) distillation for efficient knowledge transfer. Experimental results demonstrate that MCL-FIR significantly outperforms existing class-incremental learning baselines across four benchmark datasets, achieving accuracy comparable to static full retraining with only approximately 30% of the training cost, thereby striking an effective balance between efficiency and performance.
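The summary mentions reformulating the conventional triplet loss into a two-sample (doublet) contrastive objective based on InfoNCE. As a rough illustration of that idea, the following is a minimal NumPy sketch of an InfoNCE loss over (anchor, positive) pairs, where the other positives in the batch serve as in-batch negatives. The temperature value and function name are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def info_nce_doublet(anchors, positives, temperature=0.07):
    """InfoNCE over (anchor, positive) doublets.

    Each row of `anchors` is paired with the same row of `positives`;
    all other rows in the batch act as negatives. `temperature` is a
    hypothetical default, not a value from the paper.
    """
    # L2-normalize so the dot product is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                    # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Diagonal entries correspond to the true (anchor, positive) pairs.
    return -np.mean(np.diag(log_prob))
```

Compared with a triplet loss, this formulation needs no explicit negative mining: every other sample in the batch is a negative, which is one common motivation for moving from triplets to doublets.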

📝 Abstract
Most fine-grained fashion image retrieval (FIR) methods assume a static setting, requiring full retraining when new attributes appear, which is costly and impractical for dynamic scenarios. Although pretrained models support zero-shot inference, their accuracy drops without supervision, and no prior work explores class-incremental learning (CIL) for fine-grained FIR. We propose a multihead continual learning framework for fine-grained fashion image retrieval with contrastive learning and exponential moving average (EMA) distillation (MCL-FIR). MCL-FIR adopts a multi-head design to accommodate evolving classes across increments, reformulates triplet inputs into doublets with InfoNCE for simpler and more effective training, and employs EMA distillation for efficient knowledge transfer. Experiments across four datasets demonstrate that, beyond its scalability, MCL-FIR achieves a strong balance between efficiency and accuracy. It significantly outperforms CIL baselines under similar training cost, and compared with static methods, it delivers comparable performance while using only about 30% of the training cost. The source code is publicly available at https://github.com/Dr-LingXiao/MCL-FIR.
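The abstract's EMA distillation component keeps a slowly updated teacher copy of the model for knowledge transfer across increments. A generic sketch of that update rule is below; the decay value and parameter-dictionary representation are illustrative assumptions, not details from the paper.

```python
def ema_update(teacher_params, student_params, decay=0.999):
    """Exponential moving average update of teacher parameters.

    teacher <- decay * teacher + (1 - decay) * student, applied per
    parameter. `decay` is a hypothetical default, not from the paper.
    """
    return {
        name: decay * teacher_params[name] + (1 - decay) * student_params[name]
        for name in teacher_params
    }
```

In an EMA-distillation setup, this update typically runs after each optimizer step, and the teacher's outputs then provide the distillation targets that regularize the student on new classes.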
Problem

Research questions and friction points this paper is trying to address.

fine-grained fashion image retrieval
class-incremental learning
dynamic scenarios
scalability
training cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continual Learning
Fine-Grained Fashion Image Retrieval
Contrastive Learning
Exponential Moving Average Distillation
Multihead Architecture