AI Summary
This work proposes a high-precision, real-time multimodal retrieval system to address three key challenges in industrial-scale multimodal search: coarse retrieval granularity, sensitivity to noise, and the trade-off between efficiency and performance. The core innovations include reformulating embedding learning from contrastive learning to an absolute ID recognition task, thereby constructing a globally consistent embedding space grounded in billions of semantic prototypes. Additionally, a generative re-ranking mechanism is introduced, which enhances ranking quality through chunk-wise comparative reasoning and listwise relevance calibration. Extensive offline evaluations and online A/B tests on Alibaba's e-commerce platform demonstrate that the proposed system significantly outperforms existing approaches, achieving state-of-the-art performance and delivering substantial improvements in key business metrics.
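The chunk-wise, listwise re-ranking idea can be illustrated with a minimal sketch. The summary does not specify the mechanism in detail, so everything here is an assumption made for illustration: the function names `chunked`, `rerank`, and `toy_scorer` are hypothetical, and the word-overlap scorer merely stands in for a generative model that would compare the items within a chunk and emit a calibrated absolute relevance score for each.

```python
from typing import Callable, List, Sequence, Tuple


def chunked(items: Sequence[str], size: int) -> List[List[str]]:
    """Split candidates into consecutive chunks of at most `size` items."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def rerank(
    query: str,
    candidates: Sequence[str],
    score_chunk: Callable[[str, List[str]], List[float]],
    chunk_size: int = 4,
) -> List[Tuple[str, float]]:
    """Score candidates one chunk at a time, then merge by absolute score.

    Assumption: `score_chunk` returns scores on a scale that is comparable
    across chunks, which is what makes the final global sort meaningful.
    """
    scored: List[Tuple[str, float]] = []
    for chunk in chunked(candidates, chunk_size):
        scores = score_chunk(query, chunk)
        if len(scores) != len(chunk):
            raise ValueError("scorer must return one score per candidate")
        scored.extend(zip(chunk, scores))
    # Python's sort is stable, so ties keep their original retrieval order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_scorer(query: str, chunk: List[str]) -> List[float]:
    """Hypothetical stand-in for the model: fraction of query words matched."""
    query_words = set(query.split())
    return [len(query_words & set(c.split())) / len(query_words) for c in chunk]


ranked = rerank(
    "red running shoes",
    ["blue shoes", "red running shoes", "red hat", "running shoes", "green sofa"],
    toy_scorer,
    chunk_size=2,
)
```

Scoring in chunks keeps each model call short, while the shared score scale lets results from different chunks be merged with a single sort.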
Abstract
In this work, we present Pailitao-VL, a comprehensive multi-modal retrieval system engineered for high-precision, real-time industrial search. We address three critical challenges in current SOTA solutions: insufficient retrieval granularity, vulnerability to environmental noise, and a prohibitive efficiency-performance gap. Our primary contribution lies in two fundamental paradigm shifts. First, we shift the embedding paradigm from traditional contrastive learning to an absolute ID-recognition task. By anchoring instances to a globally consistent latent space defined by billions of semantic prototypes, we overcome the stochasticity and granularity bottlenecks inherent in existing embedding solutions. Second, we evolve the generative reranker from isolated pointwise evaluation to a compare-and-calibrate listwise policy. By combining chunk-based comparative reasoning with calibrated absolute relevance scoring, the system achieves nuanced discriminative resolution while avoiding the prohibitive latency typically associated with conventional reranking methods. Extensive offline benchmarks and online A/B tests on Alibaba's e-commerce platform confirm that Pailitao-VL achieves state-of-the-art performance and delivers substantial business impact. This work demonstrates a robust and scalable path for deploying advanced MLLM-based retrieval architectures in demanding, large-scale production environments.
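To make the contrast with contrastive learning concrete, the following sketch treats embedding learning as classification against a fixed set of prototypes, one per ID. It is an illustration under stated assumptions, not the Pailitao-VL training objective: the abstract does not give the loss, so the cosine-similarity softmax, the `scale` parameter, and the function names are all hypothetical choices, and three 2-D prototypes stand in for billions.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def id_recognition_loss(
    embedding: Sequence[float],
    prototypes: Sequence[Sequence[float]],
    target_id: int,
    scale: float = 16.0,
) -> float:
    """Cross-entropy over scaled cosine similarities to every prototype.

    Assumption: each ID owns one prototype vector, and the target is an
    absolute class index rather than a positive/negative pair drawn from
    the current batch, as in contrastive learning.
    """
    e = normalize(embedding)
    logits = [
        scale * sum(a * b for a, b in zip(e, normalize(p))) for p in prototypes
    ]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target_id]


# Toy prototypes (hypothetical values).
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

loss_near = id_recognition_loss([0.9, 0.1], prototypes, target_id=0)
loss_far = id_recognition_loss([0.9, 0.1], prototypes, target_id=2)
```

Because the target is a fixed prototype rather than whichever samples happen to share a batch, the loss for a given instance does not depend on batch composition, which is one way to read the abstract's claim about reduced stochasticity.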