Pailitao-VL: Unified Embedding and Reranker for Real-Time Multi-Modal Industrial Search

📅 2026-02-14
🤖 AI Summary
This work proposes a high-precision, real-time multimodal retrieval system to address three key challenges in industrial-scale multimodal search: coarse retrieval granularity, sensitivity to noise, and the trade-off between efficiency and performance. The core innovations include reformulating embedding learning from contrastive learning to an absolute ID recognition task, thereby constructing a globally consistent embedding space grounded in billions of semantic prototypes. Additionally, a generative re-ranking mechanism is introduced, which enhances ranking quality through chunk-wise comparative reasoning and listwise relevance calibration. Extensive offline evaluations and online A/B tests on Alibaba's e-commerce platform demonstrate that the proposed system significantly outperforms existing approaches, achieving state-of-the-art performance and delivering substantial improvements in key business metrics.
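
The chunk-wise, listwise re-ranking idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the summary does not specify the reranker's interface, so `rerank`, `toy_score_chunk`, and `chunk_size` are hypothetical names, and the word-overlap scorer is only a stand-in for a generative model that would read a whole chunk at once and emit calibrated absolute relevance scores.

```python
from typing import Callable, List, Sequence, Tuple


def chunked(items: Sequence[str], size: int) -> List[List[str]]:
    """Split the candidate list into consecutive chunks of at most `size` items."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def rerank(
    query: str,
    candidates: Sequence[str],
    score_chunk: Callable[[str, List[str]], List[float]],
    chunk_size: int = 4,
) -> List[Tuple[str, float]]:
    """Score candidates chunk by chunk, then merge all scores into one ranking.

    Merging across chunks is only meaningful if the scores are absolute
    (comparable between chunks), which is the role calibration plays here.
    """
    scored: List[Tuple[str, float]] = []
    for chunk in chunked(candidates, chunk_size):
        scores = score_chunk(query, chunk)  # one call sees the whole chunk
        if len(scores) != len(chunk):
            raise ValueError("scorer must return one score per candidate")
        scored.extend(zip(chunk, scores))
    # Python's sort is stable, so ties keep their original retrieval order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score_chunk(query: str, chunk: List[str]) -> List[float]:
    """Hypothetical stand-in scorer: fraction of query words found in each item.

    Unlike a real listwise model, it does not actually compare items in the chunk.
    """
    query_words = set(query.lower().split())
    return [len(query_words & set(c.lower().split())) / len(query_words) for c in chunk]


candidates = [
    "blue running shoes",
    "red running shoes",
    "green hat",
    "red shoes",
    "running socks",
]
ranked = rerank("red running shoes", candidates, toy_score_chunk, chunk_size=2)
```

Scoring one chunk per call, rather than one candidate per call, is one plausible way such a design could reduce the number of model invocations; the actual latency characteristics of the system are not described in the summary.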

📝 Abstract
In this work, we present Pailitao-VL, a comprehensive multi-modal retrieval system engineered for high-precision, real-time industrial search. We address three critical challenges in current SOTA solutions: insufficient retrieval granularity, vulnerability to environmental noise, and a prohibitive efficiency-performance gap. Our primary contribution lies in two fundamental paradigm shifts. First, we shift the embedding paradigm from traditional contrastive learning to an absolute ID-recognition task. By anchoring instances to a globally consistent latent space defined by billions of semantic prototypes, we overcome the stochasticity and granularity bottlenecks inherent in existing embedding solutions. Second, we evolve the generative reranker from isolated pointwise evaluation to a compare-and-calibrate listwise policy. By combining chunk-based comparative reasoning with calibrated absolute relevance scoring, the system achieves fine-grained discrimination while avoiding the prohibitive latency typically associated with conventional reranking methods. Extensive offline benchmarks and online A/B tests on Alibaba's e-commerce platform confirm that Pailitao-VL achieves state-of-the-art performance and delivers substantial business impact. This work demonstrates a robust and scalable path for deploying advanced MLLM-based retrieval architectures in demanding, large-scale production environments.
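
To make the shift from contrastive learning to ID recognition concrete, here is a minimal sketch of one common way to phrase such an objective: a softmax classification over cosine similarities to a fixed set of prototype vectors. The abstract does not give the actual loss, how prototypes are built, or any hyperparameters, so the function name `id_recognition_loss`, the `scale` value, and the tiny 2-D prototypes are all assumptions for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def id_recognition_loss(
    embedding: Sequence[float],
    prototypes: Sequence[Sequence[float]],
    target_id: int,
    scale: float = 16.0,  # assumed temperature-like factor, not from the paper
) -> float:
    """Cross-entropy of the target prototype ID under a cosine-similarity softmax.

    Each instance is pulled toward one globally shared prototype, instead of
    being compared against whichever negatives happen to be in the batch.
    """
    unit_embedding = l2_normalize(embedding)
    logits = [
        scale * sum(a * b for a, b in zip(unit_embedding, l2_normalize(p)))
        for p in prototypes
    ]
    # Numerically stable log-sum-exp over all prototype logits.
    max_logit = max(logits)
    log_partition = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_partition - logits[target_id]


# Three toy prototypes standing in for the "billions" mentioned in the abstract.
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
loss_correct_id = id_recognition_loss([0.9, 0.1], prototypes, target_id=0)
loss_wrong_id = id_recognition_loss([0.9, 0.1], prototypes, target_id=2)
```

Because the targets are fixed prototypes rather than in-batch samples, the loss for a given instance does not depend on which other instances share its batch, which is one reading of the "globally consistent latent space" claim. A full softmax over billions of classes would presumably need sharding or sampling, which this sketch ignores.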
Problem

Research questions and friction points this paper is trying to address.

retrieval granularity
environmental noise
efficiency-performance gap
multi-modal search
industrial retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

absolute ID-recognition
unified embedding
listwise reranker
semantic prototypes
multi-modal retrieval