Harnessing Multimodal Large Language Models for Personalized Product Search with Query-aware Refinement

πŸ“… 2025-09-23
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Personalized product search (PPS) faces two key challenges: (1) existing LLM-based methods rely solely on textual content, neglecting multimodal signals such as product images; and (2) user interaction histories contain substantial redundancy and noise, which mislead models and inflate computational cost. To address these, the paper proposes HMPPS, a framework that harnesses multimodal large language models (MLLMs) for PPS. HMPPS adds two query-aware refinement modules: a perspective-guided summarization module that condenses product descriptions around query-relevant perspectives, and a two-stage training paradigm that uses the search query to filter noisy user history based on multimodal representations. This design suppresses input noise, lowers inference cost, and improves relevance estimation. HMPPS achieves state-of-the-art performance on four public benchmarks and yields evident gains in A/B testing on an online search system with billion-level daily active users.

πŸ“ Abstract
Personalized product search (PPS) aims to retrieve products relevant to a given query while accounting for user preferences reflected in purchase histories. Since large language models (LLMs) exhibit impressive potential in content understanding and reasoning, current methods explore leveraging LLMs to comprehend the complicated relationships among user, query, and product to improve PPS performance. Despite this progress, LLM-based PPS solutions take only textual content into consideration, neglecting the multimodal content that plays a critical role in product search. Motivated by this, we propose a novel framework, HMPPS, for Harnessing Multimodal large language models (MLLMs) to deal with Personalized Product Search based on multimodal content. Nevertheless, the redundancy and noise in PPS input pose a great challenge to applying MLLMs to PPS: they not only mislead the MLLM into generating inaccurate search results but also increase the MLLM's computational expense. To deal with this problem, we additionally design two query-aware refinement modules for HMPPS: 1) a perspective-guided summarization module that generates refined product descriptions around core perspectives relevant to the search query, reducing noise and redundancy within textual content; and 2) a two-stage training paradigm that introduces the search query for user-history filtering based on multimodal representations, capturing precise user preferences and decreasing inference cost. Extensive experiments on four public datasets demonstrate the effectiveness of HMPPS. Furthermore, HMPPS is deployed on an online search system with billion-level daily active users and achieves an evident gain in A/B testing.
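The query-aware history filtering described above can be sketched in a minimal form. This is a hypothetical illustration, not the paper's actual model: HMPPS learns multimodal representations, while here plain cosine similarity over stand-in embeddings plays that role, and the function name `filter_history` and the `top_k` parameter are invented for the example.

```python
# Hypothetical sketch of query-aware user-history filtering: score each
# history item's (stand-in) multimodal embedding against the query embedding
# and keep only the most relevant items, reducing noise and inference cost.
import numpy as np

def filter_history(query_emb: np.ndarray,
                   history_embs: np.ndarray,
                   top_k: int = 5) -> list[int]:
    """Return indices of the top_k history items most similar to the query."""
    # Normalize so dot products become cosine similarities.
    q = query_emb / np.linalg.norm(query_emb)
    h = history_embs / np.linalg.norm(history_embs, axis=1, keepdims=True)
    sims = h @ q
    # Highest-similarity items first; truncate to top_k.
    return np.argsort(-sims)[:top_k].tolist()

# Toy usage: 8 history items with 16-dimensional embeddings.
rng = np.random.default_rng(0)
query = rng.normal(size=16)
history = rng.normal(size=(8, 16))
kept = filter_history(query, history, top_k=3)
print(kept)  # indices of the 3 most query-relevant history items
```

Only the kept items would then be passed to the MLLM, which is how filtering both sharpens preference modeling and shortens the prompt.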
Problem

Research questions and friction points this paper is trying to address.

Existing LLM-based PPS methods consider only text, ignoring multimodal content crucial for product search
Redundancy and noise in product descriptions and user histories mislead MLLMs into inaccurate results
Redundant multimodal inputs also significantly increase computational expense
Innovation

Methods, ideas, or system contributions that make the work stand out.

Harnesses multimodal LLMs for personalized product search
Uses perspective-guided, query-aware summarization to reduce noise and redundancy
Applies a two-stage training paradigm with query-based user-history filtering
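The summarization idea above can be illustrated with a prompt-building sketch. This is an invented template, not the paper's actual prompt: the perspective list, wording, and the helper name `build_summarization_prompt` are all assumptions made for illustration.

```python
# Hypothetical prompt template for perspective-guided summarization:
# the (M)LLM is asked to summarize a product only along perspectives
# relevant to the search query, discarding redundant detail.
def build_summarization_prompt(query: str, title: str, description: str,
                               perspectives: list[str]) -> str:
    """Compose a summarization prompt conditioned on query-relevant perspectives."""
    bullet_list = "\n".join(f"- {p}" for p in perspectives)
    return (
        f"Search query: {query}\n"
        f"Product title: {title}\n"
        f"Product description: {description}\n\n"
        "Summarize the product in 2-3 sentences, covering only these "
        "query-relevant perspectives and omitting unrelated details:\n"
        f"{bullet_list}"
    )

# Toy usage with made-up product data.
prompt = build_summarization_prompt(
    "waterproof trail running shoes",
    "AcmePro TrailRunner X",
    "Lightweight mesh upper, Gore-Tex membrane, aggressive lug pattern.",
    ["waterproofing", "grip/outsole", "weight"],
)
print(prompt)
```

Conditioning the summary on the query is what distinguishes this from generic product summarization: the same product yields different refined descriptions for different queries.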
πŸ”Ž Similar Papers
No similar papers found.
Beibei Zhang
State Key Laboratory for Novel Software Technology, Nanjing University
Yanan Lu
Tencent
Ruobing Xie
Tencent
Large Language Model · Recommender System · Natural Language Processing
Zongyi Li
MIT
Machine learning · Scientific computing · Neural operator
Siyuan Xing
Tencent
Tongwei Ren
Nanjing University
Multimedia computing
Fen Lin
Tencent