Harnessing Multimodal Large Language Models for Personalized Product Search with Query-aware Refinement

πŸ“… 2025-09-23
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Personalized product search (PPS) faces two key challenges: (1) existing LLM-based methods rely solely on textual content, neglecting multimodal signals such as product images; and (2) user interaction histories contain substantial redundancy and noise, which mislead models and inflate computational cost. To address these, the paper proposes HMPPS, a framework that harnesses multimodal large language models (MLLMs) for PPS. HMPPS adds two query-aware refinement modules: a perspective-guided summarization module that condenses product descriptions around query-relevant perspectives, and a two-stage training paradigm that uses the search query to filter noisy user history based on multimodal representations. This design suppresses input noise, lowers inference cost, and improves relevance estimation. HMPPS achieves state-of-the-art performance on four public benchmarks and yields evident gains in A/B testing on an online search system with billion-level daily active users.

πŸ“ Abstract
Personalized product search (PPS) aims to retrieve products relevant to a given query while accounting for user preferences reflected in purchase histories. Since large language models (LLMs) exhibit impressive potential in content understanding and reasoning, current methods explore leveraging LLMs to comprehend the complicated relationships among user, query, and product to improve PPS performance. Despite this progress, LLM-based PPS solutions take only textual content into consideration, neglecting the multimodal content that plays a critical role in product search. Motivated by this, we propose a novel framework, HMPPS, for Harnessing Multimodal large language models (MLLMs) to deal with Personalized Product Search based on multimodal content. Nevertheless, the redundancy and noise in PPS input pose a great challenge to applying MLLMs to PPS: they not only mislead the MLLM into generating inaccurate search results but also increase the MLLM's computational expense. To deal with this problem, we additionally design two query-aware refinement modules for HMPPS: 1) a perspective-guided summarization module that generates refined product descriptions around core perspectives relevant to the search query, reducing noise and redundancy within textual content; and 2) a two-stage training paradigm that introduces the search query for user-history filtering based on multimodal representations, capturing precise user preferences and decreasing inference cost. Extensive experiments on four public datasets demonstrate the effectiveness of HMPPS. Furthermore, HMPPS is deployed on an online search system with billion-level daily active users and achieves an evident gain in A/B testing.
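The query-aware history filtering described above can be sketched in a minimal form. This is a hypothetical illustration, not the paper's actual model: HMPPS learns multimodal representations, while here plain cosine similarity over stand-in embeddings plays that role, and the function name `filter_history` and the `top_k` parameter are invented for the example.

```python
# Hypothetical sketch of query-aware user-history filtering: score each
# history item's (stand-in) multimodal embedding against the query embedding
# and keep only the most relevant items, reducing noise and inference cost.
import numpy as np

def filter_history(query_emb: np.ndarray,
                   history_embs: np.ndarray,
                   top_k: int = 5) -> list[int]:
    """Return indices of the top_k history items most similar to the query."""
    # Normalize so dot products become cosine similarities.
    q = query_emb / np.linalg.norm(query_emb)
    h = history_embs / np.linalg.norm(history_embs, axis=1, keepdims=True)
    sims = h @ q
    # Highest-similarity items first; truncate to top_k.
    return np.argsort(-sims)[:top_k].tolist()

# Toy usage: 8 history items with 16-dimensional embeddings.
rng = np.random.default_rng(0)
query = rng.normal(size=16)
history = rng.normal(size=(8, 16))
kept = filter_history(query, history, top_k=3)
print(kept)  # indices of the 3 most query-relevant history items
```

Only the kept items would then be passed to the MLLM, which is how filtering both sharpens preference modeling and shortens the prompt.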
Problem

Research questions and friction points this paper is trying to address.

Existing LLM-based PPS methods consider only text, ignoring multimodal content crucial for product search
Redundancy and noise in product descriptions and user histories mislead MLLMs into inaccurate results
Redundant multimodal inputs also significantly increase computational expense
Innovation

Methods, ideas, or system contributions that make the work stand out.

Harnesses multimodal LLMs for personalized product search
Uses perspective-guided, query-aware summarization to reduce noise and redundancy
Applies a two-stage training paradigm with query-based user-history filtering
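The summarization idea above can be illustrated with a prompt-building sketch. This is an invented template, not the paper's actual prompt: the perspective list, wording, and the helper name `build_summarization_prompt` are all assumptions made for illustration.

```python
# Hypothetical prompt template for perspective-guided summarization:
# the (M)LLM is asked to summarize a product only along perspectives
# relevant to the search query, discarding redundant detail.
def build_summarization_prompt(query: str, title: str, description: str,
                               perspectives: list[str]) -> str:
    """Compose a summarization prompt conditioned on query-relevant perspectives."""
    bullet_list = "\n".join(f"- {p}" for p in perspectives)
    return (
        f"Search query: {query}\n"
        f"Product title: {title}\n"
        f"Product description: {description}\n\n"
        "Summarize the product in 2-3 sentences, covering only these "
        "query-relevant perspectives and omitting unrelated details:\n"
        f"{bullet_list}"
    )

# Toy usage with made-up product data.
prompt = build_summarization_prompt(
    "waterproof trail running shoes",
    "AcmePro TrailRunner X",
    "Lightweight mesh upper, Gore-Tex membrane, aggressive lug pattern.",
    ["waterproofing", "grip/outsole", "weight"],
)
print(prompt)
```

Conditioning the summary on the query is what distinguishes this from generic product summarization: the same product yields different refined descriptions for different queries.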
πŸ”Ž Similar Papers
No similar papers found.
Beibei Zhang
State Key Laboratory for Novel Software Technology, Nanjing University
Yanan Lu
Tencent
Ruobing Xie
Tencent
Large Language Model · Recommender System · Natural Language Processing
Zongyi Li
MIT
Machine learning · Scientific computing · Neural operator
Siyuan Xing
Tencent
Tongwei Ren
Nanjing University
Multimedia computing
Fen Lin
Tencent