HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval

📅 2026-04-20

🤖 AI Summary
This work addresses the noisy triplet correspondence (NTC) problem in composed image retrieval (CIR), which arises from the high cost and subjectivity of triplet annotation. The authors propose a robust progressive learning framework, HABIT, that assesses sample cleanliness using the transition rate of mutual information between the composed feature and the target image. The framework also applies dual-consistency constraints between a historical model and the current model, mimicking human habit formation (retaining good habits, calibrating bad ones) to reduce the influence of noisy triplets. Experiments on two standard CIR datasets show that the method outperforms most existing methods across various noise ratios in both robustness and retrieval performance.

📝 Abstract
Composed Image Retrieval (CIR) is a flexible image retrieval paradigm that lets users locate a target image through a multimodal query composed of a reference image and modification text. Although the task has shown promise in personalized search and recommendation systems, it faces a severe challenge in practice known as the Noise Triplet Correspondence (NTC) problem, which arises primarily from the high cost and subjectivity of annotating triplet data. We identify two central challenges behind this problem: the difficulty of precisely estimating composed semantic discrepancy, and the insufficient progressive adaptation to modification discrepancy. To tackle them, we propose a cHrono-synergiA roBust progressIve learning framework for composed image reTrieval (HABIT), which consists of two core modules. First, the Mutual Knowledge Estimation Module quantifies sample cleanliness by calculating the Transition Rate of mutual information between the composed feature and the target image, thereby identifying clean samples that align with the intended modification semantics. Second, the Dual-consistency Progressive Learning Module introduces a collaborative mechanism between the historical and current models, simulating human habit formation to retain good habits and calibrate bad habits, which enables robust learning in the presence of NTC. Extensive experiments on two standard CIR datasets demonstrate that HABIT significantly outperforms most methods under various noise ratios, exhibiting superior robustness and retrieval performance. Code is available at https://github.com/Lee-zixu/HABIT
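The abstract does not give formulas for either module, so the following is only a rough illustrative sketch of the two ideas, not the authors' implementation. It assumes that mutual information between composed and target features can be proxied by a per-sample InfoNCE score, that the "transition rate" is the change in that score between two training checkpoints, that samples with the largest increase are treated as clean, and that the historical model is an exponential moving average of the current one. All function names and parameters (`per_sample_infonce`, `transition_rate`, `select_clean`, `ema_update`, `temperature`, `keep_ratio`, `momentum`) are hypothetical.

```python
import numpy as np


def per_sample_infonce(composed, target, temperature=0.07):
    """Per-sample InfoNCE score, used here as an assumed proxy for mutual information.

    composed, target: arrays of shape (N, D); row i of each forms a triplet's
    (composed query feature, target image feature) pair.
    Returns an array of shape (N,), each entry at most log(N).
    """
    composed = composed / np.linalg.norm(composed, axis=1, keepdims=True)
    target = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = composed @ target.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return np.diag(log_prob) + np.log(len(composed))


def transition_rate(prev_scores, curr_scores):
    """Assumed definition: change in the per-sample score between two checkpoints."""
    return curr_scores - prev_scores


def select_clean(rate, keep_ratio=0.5):
    """Assumed selection rule: mark the samples with the largest rates as clean."""
    k = max(1, int(round(keep_ratio * len(rate))))
    threshold = np.sort(rate)[-k]
    return rate >= threshold


def ema_update(historical, current, momentum=0.99):
    """Assumed form of the historical model: an exponential moving average
    of the current model's parameters, stored as name -> array dicts."""
    return {
        name: momentum * historical[name] + (1.0 - momentum) * current[name]
        for name in historical
    }
```

In a training loop under these assumptions, one would score each triplet at consecutive checkpoints, compute `transition_rate`, restrict the retrieval loss to the samples marked by `select_clean`, and refresh the historical parameters with `ema_update` after each step. A consistency term between the historical and current models' outputs would be added on top; its form is not specified in the abstract and is left out here.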
Problem

Research questions and friction points this paper is trying to address.

Composed Image Retrieval
Noise Triplet Correspondence
Multimodal Query
Semantic Discrepancy
Robust Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Composed Image Retrieval
Noise Triplet Correspondence
Mutual Knowledge Estimation
Progressive Learning
Robust Learning