Personalize Before Retrieve: LLM-based Personalized Query Expansion for User-Centric Retrieval

📅 2025-10-09

📈 Citations: 0

✨ Influential: 0

career value

168K/year

🤖 AI Summary

Existing RAG systems employ uniform query expansion strategies, overlooking user-specific expression styles, preferences, and corpus structural heterogeneity—leading to loss of personalized intent and failure in semantic anchoring. To address this, we propose PBR (Personalized Before Retrieval), a pre-retrieval personalization framework that integrates user-specific signals prior to retrieval. Specifically, PBR employs style-aligned pseudo-relevance feedback (P-PRF) to preserve user intent and leverages user-specific corpus graph structure anchoring (P-Anchor) to enable adaptive semantic space alignment. This constitutes the first query expansion paradigm jointly modeling user stylistic characteristics and corpus structural heterogeneity, effectively narrowing the semantic gap in personalized RAG. Evaluated on benchmarks including PersonaBench, PBR achieves up to 10% absolute improvement in retrieval accuracy over strong baselines.

Technology Category

Application Category

📝 Abstract

Retrieval-Augmented Generation (RAG) critically depends on effective query expansion to retrieve relevant information. However, existing expansion methods adopt uniform strategies that overlook user-specific semantics, ignoring individual expression styles, preferences, and historical context. In practice, identical queries in text can express vastly different intentions across users. This representational rigidity limits the ability of current RAG systems to generalize effectively in personalized settings. Specifically, we identify two core challenges for personalization: 1) user expression styles are inherently diverse, making it difficult for standard expansions to preserve personalized intent. 2) user corpora induce heterogeneous semantic structures-varying in topical focus and lexical organization-which hinders the effective anchoring of expanded queries within the user's corpora space. To address these challenges, we propose Personalize Before Retrieve (PBR), a framework that incorporates user-specific signals into query expansion prior to retrieval. PBR consists of two components: P-PRF, which generates stylistically aligned pseudo feedback using user history for simulating user expression style, and P-Anchor, which performs graph-based structure alignment over user corpora to capture its structure. Together, they produce personalized query representations tailored for retrieval. Experiments on two personalized benchmarks show that PBR consistently outperforms strong baselines, with up to 10% gains on PersonaBench across retrievers. Our findings demonstrate the value of modeling personalization before retrieval to close the semantic gap in user-adaptive RAG systems. Our code is available at https://github.com/Zhang-Yingyi/PBR-code.

Problem

Research questions and friction points this paper is trying to address.

Addresses uniform query expansion ignoring user-specific semantics

Solves diverse user expression styles obscuring personalized intent

Overcomes heterogeneous semantic structures in user corpora

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates stylistically aligned pseudo feedback using user history

Performs graph-based structure alignment over user corpora

Incorporates user-specific signals into query expansion before retrieval

🔎 Similar Papers

Persona-DB: Efficient Large Language Model Personalization for Response Prediction with Collaborative Data Refinement