LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval

📅 2026-01-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing fashion image retrieval benchmarks fail to capture the dynamic nature, fine-grained requirements, and evolving trends of real-world e-commerce, and they lack both timeliness and a mechanism for continuous evolution. To address this gap, this work proposes LookBench, a live, evolvable benchmark tailored to authentic e-commerce settings that covers both single-item and outfit-level retrieval. LookBench combines recent real product images with AI-generated fashion images, and it pairs time-stamped test samples with a periodic update mechanism to enable contamination-aware evaluation. The benchmark builds on a fine-grained attribute schema, cross-modal retrieval techniques, and a standardized evaluation protocol. It proves challenging for strong baselines, many of which score below 60% Recall@1. The authors' proprietary model achieves the best performance on LookBench and an open-source counterpart ranks second; both attain state-of-the-art results on Fashion200K. The leaderboard, dataset, evaluation code, and trained models are publicly released.

📝 Abstract

In this paper, we present LookBench, a live, holistic, and challenging benchmark for fashion image retrieval in real e-commerce settings. We use the term "look" to reflect retrieval that mirrors how people shop: finding the exact item, a close substitute, or a visually consistent alternative. LookBench includes both recent product images sourced from live websites and AI-generated fashion images, reflecting contemporary trends and use cases. Each test sample is time-stamped, and we intend to update the benchmark periodically, enabling contamination-aware evaluation aligned with declared training cutoffs. Grounded in our fine-grained attribute taxonomy, LookBench covers single-item and outfit-level retrieval across. Our experiments reveal that LookBench poses a significant challenge to strong baselines, with many models achieving below $60\%$ Recall@1. Our proprietary model achieves the best performance on LookBench, and we release an open-source counterpart that ranks second; both models attain state-of-the-art results on legacy Fashion200K evaluations. LookBench is designed to be updated semi-annually with new test samples and progressively harder task variants, providing a durable measure of progress. We publicly release our leaderboard, dataset, evaluation code, and trained models.
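
The abstract's contamination-aware evaluation can be illustrated with a minimal sketch. It is not the released evaluation code. It assumes that each test sample carries a `timestamp` field, that a model declares a training cutoff date, and that Recall@K counts a query as a hit when any relevant item appears in its top K results. The field names, the helper functions, and the toy data are all hypothetical.

```python
from datetime import date

def recall_at_k(rankings, relevant, k=1):
    """Fraction of queries with at least one relevant item in the top-k results."""
    if not rankings:
        return 0.0
    hits = sum(
        1 for query_id, ranked in rankings.items()
        if set(ranked[:k]) & relevant[query_id]
    )
    return hits / len(rankings)

def filter_after_cutoff(samples, training_cutoff):
    """Keep only test samples time-stamped strictly after the declared cutoff."""
    return [s for s in samples if s["timestamp"] > training_cutoff]

# Hypothetical toy data: three time-stamped queries and their ranked results.
samples = [
    {"query_id": "q1", "timestamp": date(2025, 3, 1)},
    {"query_id": "q2", "timestamp": date(2025, 9, 15)},
    {"query_id": "q3", "timestamp": date(2025, 11, 2)},
]
rankings = {"q1": ["a", "b"], "q2": ["c", "d"], "q3": ["e", "f"]}
relevant = {"q1": {"a"}, "q2": {"d"}, "q3": {"e"}}

# Evaluate only on samples newer than the (assumed) training cutoff.
clean = filter_after_cutoff(samples, date(2025, 6, 30))
clean_ids = {s["query_id"] for s in clean}
clean_rankings = {q: r for q, r in rankings.items() if q in clean_ids}
score = recall_at_k(clean_rankings, relevant, k=1)
print(f"Recall@1 on post-cutoff samples: {score:.2f}")
```

Filtering by timestamp before scoring keeps samples that a model could have seen during training out of the reported number, which is the property the time-stamped design is meant to provide.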

Problem

Research questions and friction points this paper is trying to address.

fashion image retrieval

benchmark

e-commerce

AI-generated images

live evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

fashion image retrieval

live benchmark

time-stamped evaluation