LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval

📅 2026-01-21
🤖 AI Summary
Existing fashion image retrieval benchmarks fail to capture the dynamic nature, fine-grained requirements, and evolving trends of real-world e-commerce, and they lack both timeliness and the capacity for continuous evolution. To address this gap, this work proposes LookBench, a live, evolvable benchmark tailored to authentic e-commerce settings that covers both single-item and outfit-level retrieval. LookBench combines recent real product images with AI-generated fashion images, time-stamps every test sample, and is updated periodically, which enables contamination-aware evaluation against declared training cutoffs. Grounded in a fine-grained attribute taxonomy and a standardized evaluation protocol, the benchmark is challenging for strong baselines, many of which score below 60% Recall@1. The authors' proprietary model achieves the best performance on LookBench, an open-source counterpart ranks second, and both attain state-of-the-art results on Fashion200K. The dataset, evaluation code, trained models, and leaderboard are publicly released.

📝 Abstract
In this paper, we present LookBench, a live, holistic, and challenging benchmark for fashion image retrieval in real e-commerce settings. (We use the term "look" to reflect retrieval that mirrors how people shop: finding the exact item, a close substitute, or a visually consistent alternative.) LookBench includes both recent product images sourced from live websites and AI-generated fashion images, reflecting contemporary trends and use cases. Each test sample is time-stamped, and we intend to update the benchmark periodically, enabling contamination-aware evaluation aligned with declared training cutoffs. Grounded in our fine-grained attribute taxonomy, LookBench covers single-item and outfit-level retrieval. Our experiments reveal that LookBench poses a significant challenge for strong baselines, with many models achieving below $60\%$ Recall@1. Our proprietary model achieves the best performance on LookBench, and we release an open-source counterpart that ranks second; both models attain state-of-the-art results on legacy Fashion200K evaluations. LookBench is designed to be updated semi-annually with new test samples and progressively harder task variants, providing a durable measure of progress. We publicly release our leaderboard, dataset, evaluation code, and trained models.
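As a rough illustration of the two evaluation ideas named in the abstract, the sketch below filters test samples by timestamp against a declared training cutoff and then computes Recall@K. It is a minimal sketch under stated assumptions, not the benchmark's released evaluation code: the field names (`timestamp`, `ranked`, `relevant`), the helper names, and the toy data are all hypothetical, and Recall@K is taken here to mean the fraction of queries with at least one relevant item in the top K results, which may differ from the paper's exact protocol.

```python
from datetime import date


def filter_after_cutoff(samples, training_cutoff):
    """Keep only samples time-stamped strictly after the declared training cutoff.

    Assumption: each sample is a dict with a `timestamp` field (hypothetical schema).
    """
    return [s for s in samples if s["timestamp"] > training_cutoff]


def recall_at_k(ranked_ids, relevant_ids, k=1):
    """Fraction of queries with at least one relevant item among the top-k results."""
    if not ranked_ids:
        return 0.0
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids, relevant_ids)
        if any(item in relevant for item in ranked[:k])
    )
    return hits / len(ranked_ids)


# Hypothetical toy data: three queries with ranked gallery IDs and ground truth.
samples = [
    {"query": "q1", "timestamp": date(2025, 3, 1), "ranked": ["a", "b"], "relevant": {"a"}},
    {"query": "q2", "timestamp": date(2025, 9, 1), "ranked": ["c", "d"], "relevant": {"d"}},
    {"query": "q3", "timestamp": date(2025, 11, 1), "ranked": ["e", "f"], "relevant": {"e"}},
]

# Evaluate only on samples newer than the model's declared training cutoff.
clean = filter_after_cutoff(samples, training_cutoff=date(2025, 6, 1))
ranked = [s["ranked"] for s in clean]
relevant = [s["relevant"] for s in clean]

r1 = recall_at_k(ranked, relevant, k=1)
r2 = recall_at_k(ranked, relevant, k=2)
print(f"Recall@1 = {r1:.2f}, Recall@2 = {r2:.2f}")
```

Filtering before scoring means a model is never credited for items it could have seen during training, which is the intent behind pairing time-stamped samples with declared cutoffs.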
Problem

Research questions and friction points this paper is trying to address.

fashion image retrieval
benchmark
e-commerce
AI-generated images
live evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

fashion image retrieval
live benchmark
time-stamped evaluation
AI-generated fashion images
fine-grained attribute taxonomy
Chao Gao
Gensmo.ai
Siqiao Xue
Ant Group, Alibaba
Machine learning
Yimin Peng
Gensmo.ai
Jiwen Fu
Gensmo.ai
Tingyi Gu
Associate Professor, University of Delaware
Semiconductors, silicon photonics, metasurface
Shanshan Li
Gensmo.ai
Fan Zhou
Gensmo.ai