Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations

📅 2025-08-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Conventional product retrieval degrades in multi-turn dialogues because user intent evolves from turn to turn. To address this, the paper introduces test-time scaling to multimodal generative product retrieval for the first time, proposing a framework that pairs a generative retriever with test-time reranking. Without relying on explicit self-correction signals, the method aligns queries and responses by iteratively fusing dialogue history with multimodal item content, which mitigates query ambiguity and semantic drift. Key contributions are: (1) the first test-time reranking mechanism tailored to multimodal conversational retrieval; and (2) tight alignment between generative retrieval and a fixed item corpus. Experiments on multiple benchmarks show average improvements of 14.5 points in MRR and 10.6 points in nDCG@1, improving both accuracy and intent consistency in cross-modal conversational recommendation.

📝 Abstract

The rapid evolution of e-commerce has exposed the limitations of traditional product retrieval systems in managing complex, multi-turn user interactions. Recent advances in multimodal generative retrieval -- particularly those leveraging multimodal large language models (MLLMs) as retrievers -- have shown promise. However, most existing methods are tailored to single-turn scenarios and struggle to model the evolving intent and iterative nature of multi-turn dialogues when applied naively. Concurrently, test-time scaling has emerged as a powerful paradigm for improving large language model (LLM) performance through iterative inference-time refinement. Yet, its effectiveness typically relies on two conditions: (1) a well-defined problem space (e.g., mathematical reasoning), and (2) the model's ability to self-correct -- conditions that are rarely met in conversational product search. In this setting, user queries are often ambiguous and evolving, and MLLMs alone have difficulty grounding responses in a fixed product corpus. Motivated by these challenges, we propose a novel framework that introduces test-time scaling into conversational multimodal product retrieval. Our approach builds on a generative retriever, further augmented with a test-time reranking (TTR) mechanism that improves retrieval accuracy and better aligns results with evolving user intent throughout the dialogue. Experiments across multiple benchmarks show consistent improvements, with average gains of 14.5 points in MRR and 10.6 points in nDCG@1.
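The reported gains are stated in MRR and nDCG@1. As a reminder of what those two metrics measure, here is a minimal sketch using their standard definitions. It assumes exactly one ground-truth item per query and binary relevance, which the abstract does not confirm; the function names and the toy rankings are hypothetical. "Points" is read here as the metric multiplied by 100, which is also an assumption.

```python
from math import log2
from typing import List, Sequence


def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str) -> float:
    """Return 1/rank of the relevant item, or 0.0 if it was not retrieved."""
    for position, item_id in enumerate(ranked_ids, start=1):
        if item_id == relevant_id:
            return 1.0 / position
    return 0.0


def ndcg_at_k(ranked_ids: Sequence[str], relevant_id: str, k: int = 1) -> float:
    """nDCG@k with a single relevant item (assumption), so the ideal DCG is 1."""
    for position, item_id in enumerate(ranked_ids[:k], start=1):
        if item_id == relevant_id:
            return 1.0 / log2(position + 1)
    return 0.0


def mean(values: List[float]) -> float:
    """Arithmetic mean; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


# Toy data (hypothetical): two queries, each with one correct product.
rankings = [["a", "b", "c"], ["b", "a", "c"]]
targets = ["a", "a"]

mrr = mean([reciprocal_rank(r, t) for r, t in zip(rankings, targets)])
ndcg1 = mean([ndcg_at_k(r, t, k=1) for r, t in zip(rankings, targets)])
```

Under the single-relevant-item assumption, nDCG@1 reduces to the fraction of queries whose top-ranked result is correct, while MRR also gives partial credit when the correct item appears lower in the list.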

Problem

Research questions and friction points this paper is trying to address.

Addressing limitations of traditional product retrieval in multi-turn dialogues

Improving multimodal generative retrieval for evolving user intent

Enhancing retrieval accuracy in conversational product search

Innovation

Methods, ideas, or system contributions that make the work stand out.

Test-time reranking mechanism for retrieval

Generative retriever with multimodal capabilities

Iterative refinement for evolving user intent
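The three ideas above can be illustrated with a toy reranking step. This is only a sketch of the general pattern (retrieve candidates, then rescore them against the whole dialogue at inference time), not the paper's TTR mechanism, whose details are not given here. Everything in it is an assumption made for illustration: the `Candidate` type, the token-overlap intent score that stands in for an MLLM-based scorer, the linear weighting of later turns, and the `alpha` mixing weight.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    item_id: str
    description: str
    retriever_score: float  # score from a (hypothetical) generative retriever


def weighted_intent_terms(dialogue_turns: List[str]) -> Counter:
    """Weight tokens by turn index so later turns count more (assumed heuristic)."""
    terms: Counter = Counter()
    for turn_index, turn in enumerate(dialogue_turns, start=1):
        for token in turn.lower().split():
            terms[token] += turn_index
    return terms


def intent_score(terms: Counter, description: str) -> float:
    """Share of the weighted dialogue terms that appear in the item description."""
    total = sum(terms.values())
    if total == 0:
        return 0.0
    matched = sum(terms[token] for token in set(description.lower().split()))
    return matched / total


def rerank(dialogue_turns: List[str], candidates: List[Candidate],
           alpha: float = 0.5) -> List[Candidate]:
    """Mix the retriever score with the dialogue-aware score and sort descending."""
    terms = weighted_intent_terms(dialogue_turns)

    def combined(candidate: Candidate) -> float:
        return (alpha * candidate.retriever_score
                + (1 - alpha) * intent_score(terms, candidate.description))

    return sorted(candidates, key=combined, reverse=True)


# Toy dialogue (hypothetical): the user narrows the request in the second turn.
history = ["show me running shoes", "actually I want red ones"]
candidates = [
    Candidate("blue_shoes", "blue running shoes", 0.9),
    Candidate("red_shoes", "red running shoes", 0.8),
    Candidate("red_dress", "red dress", 0.3),
]
reranked = rerank(history, candidates)
```

In this toy case the retriever alone prefers the blue shoes, while the rescoring step promotes the red shoes because the most recent turn carries more weight. In a real system the overlap heuristic would be replaced by a learned scorer over text and images.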
