🤖 AI Summary
Existing Composed Image Retrieval (CIR) methods struggle to simultaneously achieve global semantic alignment and fine-grained modeling of visual variations, particularly under subtle textures, local structural changes, and complex textual instructions. To address this, we propose a dual-branch collaborative architecture: a backbone branch that captures cross-modal global semantics, and a novel Detail-oriented Inference Branch. The latter leverages atomic-level image editing data to construct a detail prior and incorporates an adaptive multi-granularity feature fusion module for query-driven, dynamic fine-grained alignment. We further introduce a detail-oriented optimization strategy and contrastive learning to enhance cross-modal consistency. Our method achieves state-of-the-art performance on CIRR and FashionIQ, significantly improving retrieval accuracy under nuanced visual changes and intricate instructions. Ablation studies and cross-dataset evaluations validate the generalizability and domain-agnostic effectiveness of our detail-enhancement mechanism.
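The contrastive objective mentioned above is not specified in detail here; a standard choice for enforcing cross-modal consistency is a symmetric InfoNCE-style loss, where each composed query is pulled toward its matched target image and pushed away from the other targets in the batch. The sketch below is a minimal, hypothetical illustration in NumPy (the function name `info_nce` and the temperature value are assumptions, not taken from the paper):

```python
import numpy as np

def info_nce(query_embs, target_embs, tau=0.07):
    """Batch contrastive loss: matched (query, target) pairs on the
    diagonal are positives; all other pairs in the batch are negatives.
    tau is a temperature hyperparameter (0.07 is a common default)."""
    # L2-normalize so the dot product is cosine similarity.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    t = target_embs / np.linalg.norm(target_embs, axis=1, keepdims=True)
    logits = q @ t.T / tau
    # Row-wise log-softmax; the positive logit sits on the diagonal.
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(logp)))
```

When query and target embeddings coincide, the loss approaches zero; mismatched pairings drive it up, which is the gradient signal that aligns the two modalities.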
📝 Abstract
Composed Image Retrieval (CIR) aims to retrieve target images from a gallery based on a reference image and modification text as a combined query. Recent approaches focus on balancing global information from the two modalities and encode the query into a unified feature for retrieval. However, due to insufficient attention to fine-grained details, these coarse fusion methods often struggle to handle subtle visual alterations or intricate textual instructions. In this work, we propose DetailFusion, a novel dual-branch framework that effectively coordinates information across global and detailed granularities, thereby enabling detail-enhanced CIR. Our approach leverages atomic detail variation priors derived from an image editing dataset, supplemented by a detail-oriented optimization strategy, to develop a Detail-oriented Inference Branch. Furthermore, we design an Adaptive Feature Compositor that dynamically fuses global and detailed features based on the fine-grained information of each unique multimodal query. Extensive experiments and ablation analyses not only demonstrate that our method achieves state-of-the-art performance on both the CIRR and FashionIQ datasets but also validate the effectiveness and cross-domain adaptability of detail enhancement for CIR.
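To make the query-driven fusion concrete, the abstract's Adaptive Feature Compositor can be read as a gating mechanism: a weight inferred from the query decides how much of the detail-branch feature to mix into the global feature before retrieval. The following is a minimal sketch under that reading, in NumPy with random vectors standing in for learned encoder outputs; every name (`adaptive_fuse`, `w_gate`, the sigmoid gate itself) is a hypothetical simplification, not the paper's actual module:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (toy value)

def l2norm(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def adaptive_fuse(global_feat, detail_feat, query_feat, w_gate):
    # Query-conditioned sigmoid gate: how much detail to blend into
    # the global representation for this particular multimodal query.
    gate = 1.0 / (1.0 + np.exp(-query_feat @ w_gate))
    return l2norm(gate * detail_feat + (1.0 - gate) * global_feat), gate

# Stand-ins for encoder outputs on one composed query.
global_feat = l2norm(rng.normal(size=d))  # backbone branch (global semantics)
detail_feat = l2norm(rng.normal(size=d))  # detail-oriented branch
query_feat  = l2norm(rng.normal(size=d))  # joint query embedding
w_gate      = rng.normal(size=d)          # gate weights (learned in practice)

fused, gate = adaptive_fuse(global_feat, detail_feat, query_feat, w_gate)

# Retrieval: rank gallery images by cosine similarity to the fused query.
gallery = l2norm(rng.normal(size=(5, d)), axis=1)
best = int(np.argmax(gallery @ fused))
```

A gate near 1 favors the detail branch (useful for subtle edits), while a gate near 0 falls back to global semantics; in the actual method this trade-off would be learned end-to-end rather than fixed.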