Reflective Personalization Optimization: A Post-hoc Rewriting Framework for Black-Box Large Language Models

📅 2025-11-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Personalizing black-box large language models (LLMs) involves a fundamental trade-off between content fidelity and stylistic consistency, because existing context-injection methods couple both objectives in a single monolithic step. This paper proposes a decoupled, two-stage post-hoc rewriting paradigm: first, the black-box LLM generates an initial response; second, a trainable, model-agnostic external reflection module explicitly rewrites that response for style. The module is trained with supervised fine-tuning to learn structured rewriting policies, then further optimized via reinforcement learning to improve personalization quality. Evaluated on the LaMP benchmark, the approach significantly outperforms state-of-the-art context-injection methods, achieving superior user-style alignment without compromising semantic accuracy. Crucially, it provides the first empirical validation of explicit rewriting as a general-purpose, model-agnostic interface for LLM personalization, demonstrating both effectiveness and cross-model generalizability.

📝 Abstract
The personalization of black-box large language models (LLMs) is a critical yet challenging task. Existing approaches predominantly rely on context injection, where user history is embedded into the prompt to directly guide the generation process. However, this single-step paradigm imposes a dual burden on the model: generating accurate content while simultaneously aligning with user-specific styles. This often results in a trade-off that compromises output quality and limits precise control. To address this fundamental tension, we propose Reflective Personalization Optimization (RPO), a novel framework that redefines the personalization paradigm by decoupling content generation from alignment. RPO operates in two distinct stages: first, a base model generates a high-quality, generic response; then, an external reflection module explicitly rewrites this output to align with the user's preferences. This reflection module is trained using a two-stage process. Initially, supervised fine-tuning is employed on structured rewriting trajectories to establish a core personalized reasoning policy that models the transformation from generic to user-aligned responses. Subsequently, reinforcement learning is applied to further refine and enhance the quality of the personalized outputs. Comprehensive experiments on the LaMP benchmark demonstrate that RPO, by decoupling content generation from personalization, significantly outperforms state-of-the-art baselines. These findings underscore the superiority of explicit response shaping over implicit context injection. Moreover, RPO introduces an efficient, model-agnostic personalization layer that can be seamlessly integrated with any underlying base model, paving the way for a new and effective direction in user-centric generation scenarios.
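The decoupled pipeline described in the abstract can be summarized in a minimal sketch. All names here (`base_generate`, `reflection_rewrite`, the user-history format) are illustrative assumptions, not the paper's actual API; the model calls are stubbed out.

```python
# Sketch of RPO's two-stage, decoupled pipeline:
#   stage 1 - a black-box base LLM produces a generic response;
#   stage 2 - an external, trainable reflection module rewrites it
#             to match the user's style.
# Function names and the user-history format are hypothetical.

def base_generate(query: str) -> str:
    """Stage 1: generic content generation by the black-box LLM.
    Stubbed; in practice this would call the base model's API."""
    return f"Generic answer to: {query}"

def reflection_rewrite(draft: str, user_history: list) -> str:
    """Stage 2: the reflection module (trained with SFT, then refined
    with RL in the paper) rewrites the draft toward the user's style.
    Stubbed with a toy style marker taken from the user's history."""
    style = user_history[-1] if user_history else "neutral"
    return f"[{style} style] {draft}"

def rpo_pipeline(query: str, user_history: list) -> str:
    draft = base_generate(query)                     # content generation
    return reflection_rewrite(draft, user_history)   # personalization

print(rpo_pipeline("Summarize the news", ["concise"]))
```

Because the reflection module only consumes the draft and the user history, it is agnostic to which base model produced the draft, which is what makes the personalization layer swappable across backends.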
Problem

Research questions and friction points this paper is trying to address.

Decoupling content generation from user alignment in black-box LLMs
Overcoming trade-offs between output quality and personalization control
Providing model-agnostic personalization layer for user-centric generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples content generation from personalization alignment
Uses supervised fine-tuning and reinforcement learning
Provides model-agnostic post-hoc rewriting framework
Teqi Hao
School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, Shanghai, China
Xiaoyu Tan
Tencent Youtu Lab, Shanghai, Shanghai, China
Shaojie Shi
Artificial Intelligence Innovation and Incubation Institute, Fudan University, Shanghai, Shanghai, China
Yinghui Xu
Research Scientist/Senior Director
machine learning, machine vision, optimization
Xihe Qiu
Associate Professor, Shanghai University of Engineering Science
AI for Healthcare, Vision-Language Models, Reinforcement Learning, Large Language Models