🤖 AI Summary
Existing datasets for multi-turn composed image retrieval (MTCIR) suffer from inconsistent dialogue histories and are largely confined to the fashion domain. This work proposes CIRCLED, an MTCIR dataset designed for dialogue-history consistency that spans multiple domains across nine subsets. CIRCLED is constructed by extending three established benchmarks (FashionIQ, CIRR, and CIRCO) and uses a CIReVL-based retrieval pipeline to automatically generate multi-turn queries in which each turn moves progressively closer to the target image. A multi-stage filtering mechanism, based on retrieval success, turn length, dialogue consistency, and information redundancy, ensures data quality. The resulting dataset comprises 22,608 multi-turn sessions, substantially exceeding the 11,505 sessions of the existing Multi-turn FashionIQ. Several baseline methods are evaluated quantitatively for retrieval accuracy on CIRCLED, positioning it as a practical benchmark for future multi-turn CIR research.
📝 Abstract
Existing Multi-Turn Composed Image Retrieval (MTCIR) datasets lack dialogue-history consistency and are restricted to the fashion domain. To address these limitations, we construct CIRCLED by extending FashionIQ, CIRR, and CIRCO. In CIRCLED, the query at each turn progressively approaches the target image. Data are generated via a CIReVL-based retrieval pipeline and curated with multiple filters on retrieval success, turn length, consistency, and information redundancy to ensure quality. In total, we collect 22,608 multi-turn sessions across nine subsets, substantially exceeding Multi-turn FashionIQ (11,505 sessions) in both scale and generality. We further apply multiple baseline methods and quantitatively assess retrieval accuracy on CIRCLED. Our work provides a practical, high-quality benchmark to facilitate future research on multi-turn CIR. The dataset and code are publicly available at https://huggingface.co/datasets/tk1441/CIRCLED and https://github.com/mti-lab/circled.
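The curation step described above (filters on retrieval success, turn length, consistency, and information redundancy) can be pictured as a chain of per-session checks. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the `Turn` and `Session` types, the field names, the thresholds (`k=10`, 2 to 6 turns), and the specific tests used for consistency and redundancy are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    text: str         # modification text issued at this turn (hypothetical field)
    target_rank: int  # rank of the target image after this turn, 1 = best (hypothetical field)


@dataclass
class Session:
    turns: List[Turn]


def retrieval_succeeded(session: Session, k: int = 10) -> bool:
    # Assumed criterion: the target is within the top-k after the final turn.
    return bool(session.turns) and session.turns[-1].target_rank <= k


def turn_length_ok(session: Session, min_turns: int = 2, max_turns: int = 6) -> bool:
    # Assumed bounds; the actual limits are not given in the abstract.
    return min_turns <= len(session.turns) <= max_turns


def is_consistent(session: Session) -> bool:
    # Stand-in for the consistency filter: the target's rank never gets worse,
    # mirroring "the query at each turn progressively approaches the target image".
    ranks = [t.target_rank for t in session.turns]
    return all(later <= earlier for earlier, later in zip(ranks, ranks[1:]))


def is_non_redundant(session: Session) -> bool:
    # Stand-in for the redundancy filter: reject sessions that repeat a turn's
    # text after case and whitespace normalization.
    seen = set()
    for turn in session.turns:
        key = " ".join(turn.text.lower().split())
        if key in seen:
            return False
        seen.add(key)
    return True


def filter_sessions(sessions: List[Session]) -> List[Session]:
    # Keep only sessions that pass every check, in the order listed in the abstract.
    checks = (retrieval_succeeded, turn_length_ok, is_consistent, is_non_redundant)
    return [s for s in sessions if all(check(s) for check in checks)]
```

Keeping each filter as a separate predicate makes it straightforward to report how many sessions each stage removes, or to swap in a stricter consistency test without touching the others.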