SDR-CIR: Semantic Debias Retrieval Framework for Training-Free Zero-Shot Composed Image Retrieval

📅 2026-02-04
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses semantic bias in training-free zero-shot composed image retrieval (ZS-CIR). The target-image descriptions that multimodal large language models (MLLMs) generate are often vague and drift away from the actual target image, which degrades retrieval accuracy. To mitigate this, the authors propose a training-free semantic debias ranking framework. A selective chain-of-thought (CoT) prompting strategy first guides the MLLM toward visual content relevant to the modification text. Ranking then proceeds in two steps: an Anchor step that fuses reference-image features with the description features to supplement omitted semantic cues, and a Debias step that explicitly corrects description bias by adding a penalty term for redundant reference-image semantics to the similarity score. The authors present this as the first explicit semantic debiasing mechanism tailored to ZS-CIR. It achieves state-of-the-art results among one-stage methods on three standard CIR benchmarks while remaining efficient.
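The summary does not give the actual prompt. As a rough illustration of the selective-CoT idea (restrict the MLLM's reasoning to reference-image content that the modification text touches, then compose a target description), here is a hypothetical Python template and output parser. The wording, the step structure, the `Target:` marker, and the names `build_prompt` and `extract_target_description` are all assumptions, not taken from the paper.

```python
from string import Template

# HYPOTHETICAL prompt wording: it illustrates the selective-CoT idea only
# and is not the prompt used by the SDR-CIR authors.
SELECTIVE_COT = Template(
    "Modification text: \"$modification\"\n"
    "Step 1: List only the objects and attributes in the reference image that "
    "the modification text refers to or changes. Ignore unrelated details.\n"
    "Step 2: Apply the modification to the elements you listed.\n"
    "Step 3: On the last line, write one sentence describing the target image, "
    "starting with 'Target:'."
)


def build_prompt(modification: str) -> str:
    """Fill the template; the reference image itself is passed to the MLLM separately."""
    text = modification.strip()
    if not text:
        raise ValueError("modification text must be non-empty")
    return SELECTIVE_COT.substitute(modification=text)


def extract_target_description(mllm_output: str) -> str:
    """Return the sentence after the last 'Target:' marker in the MLLM's reply."""
    for line in reversed(mllm_output.splitlines()):
        stripped = line.strip()
        if stripped.startswith("Target:"):
            return stripped[len("Target:"):].strip()
    raise ValueError("no 'Target:' line in MLLM output")


if __name__ == "__main__":
    print(build_prompt("make the dog wear a red hat"))
    reply = "Step 1: a brown dog, no hat.\nStep 2: add a red hat.\nTarget: A brown dog wearing a red hat."
    print(extract_target_description(reply))  # A brown dog wearing a red hat.
```

Asking for the final description on a marked last line keeps parsing trivial; only that sentence would then be embedded for retrieval, while the intermediate steps are discarded.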

📝 Abstract
Composed Image Retrieval (CIR) aims to retrieve a target image from a query composed of a reference image and modification text. Recent training-free zero-shot CIR (ZS-CIR) methods often employ Multimodal Large Language Models (MLLMs) with Chain-of-Thought (CoT) reasoning to compose a target image description for retrieval. However, due to the fuzzy matching nature of ZS-CIR, the generated description is prone to semantic bias relative to the target image. We propose SDR-CIR, a training-free Semantic Debias Ranking method based on CoT reasoning. First, Selective CoT guides the MLLM to extract visual content relevant to the modification text during image understanding, thereby reducing visual noise at the source. We then introduce a Semantic Debias Ranking with two steps, Anchor and Debias, to mitigate semantic bias. In the Anchor step, we fuse reference image features with target description features to reinforce useful semantics and supplement omitted cues. In the Debias step, we explicitly model the visual semantic contribution of the reference image to the description and incorporate it into the similarity score as a penalty term. By supplementing omitted cues while suppressing redundancy, SDR-CIR mitigates semantic bias and improves retrieval performance. Experiments on three standard CIR benchmarks show that SDR-CIR achieves state-of-the-art results among one-stage methods while maintaining high efficiency. The code is publicly available at https://github.com/suny105/SDR-CIR.
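The abstract describes the ranking only in words. The sketch below is one plausible reading in plain Python, not the authors' code. The Anchor step is assumed to be a weighted sum of the normalized description and reference-image embeddings, and the Debias penalty is assumed to be the projection of the description embedding onto the reference-image direction, scored against each candidate. The weights `alpha` and `beta`, the helper names, and the toy 3-D vectors (stand-ins for embeddings from a shared image-text encoder) are all hypothetical; the official repository linked above is the authority on the real formulas.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical sketch: the fusion rule, the penalty form, alpha and beta are
# assumptions, not SDR-CIR's published formulas.
Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Return a unit-L2-norm copy of v."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def anchor_query(ref_img: Sequence[float], desc: Sequence[float], alpha: float) -> Vector:
    """Anchor step (assumed form): weighted mix of the description and
    reference-image embeddings, re-normalized, so that reference cues the
    description omitted are partly restored."""
    r, t = normalize(ref_img), normalize(desc)
    return normalize([alpha * ti + (1.0 - alpha) * ri for ti, ri in zip(t, r)])


def reference_contribution(ref_img: Sequence[float], desc: Sequence[float]) -> Vector:
    """Debias step (assumed form): projection of the description embedding onto
    the reference-image direction, a proxy for how much of the description
    merely restates the reference image."""
    r, t = normalize(ref_img), normalize(desc)
    coeff = dot(t, r)
    return [coeff * ri for ri in r]


def debiased_scores(ref_img: Sequence[float], desc: Sequence[float],
                    gallery: Dict[str, Sequence[float]],
                    alpha: float = 0.8, beta: float = 0.5) -> Dict[str, float]:
    """Score = anchored similarity minus beta times the reference-contribution penalty."""
    q = anchor_query(ref_img, desc, alpha)
    p = reference_contribution(ref_img, desc)
    scores = {}
    for name, emb in gallery.items():
        c = normalize(emb)
        scores[name] = dot(q, c) - beta * dot(p, c)
    return scores


def rank(scores: Dict[str, float]) -> List[str]:
    """Gallery names sorted from best to worst score."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    gallery = {"ref_like": [1.0, 0.0, 0.0], "target_like": [0.0, 1.0, 0.0]}
    ref, desc = [1.0, 0.0, 0.0], [0.6, 0.8, 0.0]
    print(rank(debiased_scores(ref, desc, gallery)))            # ['target_like', 'ref_like']
    print(rank(debiased_scores(ref, desc, gallery, beta=0.0)))  # ['ref_like', 'target_like']
```

In the toy example the description embedding leans toward the reference image, so without the penalty (`beta=0.0`) the near-copy of the reference ranks first; the penalty reverses that order. Note that the two assumed steps pull in opposite directions along the reference-image axis, so their balance depends on the actual fusion and penalty definitions.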
Problem

Composed Image Retrieval
Zero-Shot
Semantic Bias
Multimodal Large Language Models
Image Retrieval
Innovation

Semantic Debias
Training-Free
Composed Image Retrieval
Chain-of-Thought
Multimodal Large Language Models
Authors
Yi Sun, Wuhan University of Technology
Jinyu Xu, Wuhan University of Technology
Qing Xie, Wuhan University of Technology
Jiachen Li, Wuhan University of Technology
Yanchun Ma, Wuhan Vocational College of Software and Engineering
Yongjian Liu, Wuhan University of Technology