MELT: Improve Composed Image Retrieval via the Modification Frequentation-Rarity Balance Network

📅 2026-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing composed image retrieval (CIR) methods are prone to frequency bias and often overlook rare semantic concepts, and their similarity scores are not robust to hard negatives and noise. To address these limitations, this work proposes MELT, a frequency-rarity balancing network that increases attention to infrequent modification semantics. It combines a multimodal attention mechanism with diffusion-based denoising to address two key challenges: asymmetric rare semantic localization and unreliable similarity estimation. On two CIR benchmarks, the authors report that the method outperforms prior approaches in multimodal fusion and retrieval performance.
📝 Abstract
Composed Image Retrieval (CIR) uses a reference image and a modification text as a query to retrieve a target image that satisfies the requirement of "modifying the reference image according to the text instructions". However, existing CIR methods face two limitations: (1) frequency bias, which leads to "Rare Sample Neglect", and (2) similarity scores that are susceptible to interference from hard negative samples and noise. Addressing these limitations raises two key challenges: asymmetric rare semantic localization and robust similarity estimation under hard negative samples. To solve these challenges, we propose the Modification frEquentation-rarity baLance neTwork (MELT). MELT assigns increased attention to rare modification semantics in multimodal contexts and applies diffusion-based denoising to hard negative samples with high similarity scores, enhancing multimodal fusion and matching. Extensive experiments on two CIR benchmarks validate the superior performance of MELT. Code is available at https://github.com/luckylittlezhi/MELT.
Problem

Research questions and friction points this paper is trying to address.

Composed Image Retrieval
frequency bias
rare sample neglect
hard negative samples
similarity estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Composed Image Retrieval
frequency-rarity balance
rare semantic localization
diffusion-based denoising
hard negative samples
Guozhi Qiu
School of Software, Shandong University
Zhiwei Chen
School of Software, Shandong University
Zixu Li
School of Software, Shandong University
Qinlei Huang
School of Software, Shandong University
Zhiheng Fu
School of Software, Shandong University
Xuemeng Song
City University of Hong Kong
Information Retrieval, Multimedia Analysis
Yupeng Hu
Shandong University
Multimedia Information Retrieval, Data Mining and Knowledge Discovery