BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the tendency of large language models (LLMs) to misapply user preferences stored in persistent memory during cross-interaction personalization, often because they neglect social and institutional norms. To this end, we introduce BenchPreS, the first context-sensitive benchmark for evaluating preference utilization in LLMs with persistent memory. The framework features two novel metrics—Misapplication Rate (MR) and Appropriate Application Rate (AAR)—to systematically assess a model's ability to conditionally invoke preferences across diverse scenarios. Our experiments reveal that current mainstream models suffer from pervasive over-application of preferences: stronger adherence to stored preferences correlates with higher misapplication rates, and neither existing reasoning mechanisms nor prompt-based defenses effectively mitigate the issue. This study underscores the necessity of modeling user preferences as context-dependent signals rather than global rules.

📝 Abstract
Large language models (LLMs) increasingly store user preferences in persistent memory to support personalization across interactions. However, in third-party communication settings governed by social and institutional norms, some user preferences may be inappropriate to apply. We introduce BenchPreS, which evaluates whether memory-based user preferences are appropriately applied or suppressed across communication contexts. Using two complementary metrics, Misapplication Rate (MR) and Appropriate Application Rate (AAR), we find even frontier LLMs struggle to apply preferences in a context-sensitive manner. Models with stronger preference adherence exhibit higher rates of over-application, and neither reasoning capability nor prompt-based defenses fully resolve this issue. These results suggest current LLMs treat personalized preferences as globally enforceable rules rather than as context-dependent normative signals.
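The abstract names the two metrics but does not spell out their formulas. A minimal sketch of how such metrics are typically computed, assuming MR is the fraction of norm-inappropriate contexts in which the stored preference is still applied, and AAR is the fraction of appropriate contexts in which it is applied (the paper's exact definitions may differ):

```python
from dataclasses import dataclass

@dataclass
class Case:
    applied: bool      # did the model apply the stored user preference?
    appropriate: bool  # was applying it contextually appropriate?

def misapplication_rate(cases: list[Case]) -> float:
    """MR: share of inappropriate contexts where the preference was applied anyway."""
    inappropriate = [c for c in cases if not c.appropriate]
    if not inappropriate:
        return 0.0
    return sum(c.applied for c in inappropriate) / len(inappropriate)

def appropriate_application_rate(cases: list[Case]) -> float:
    """AAR: share of appropriate contexts where the preference was correctly applied."""
    appropriate = [c for c in cases if c.appropriate]
    if not appropriate:
        return 0.0
    return sum(c.applied for c in appropriate) / len(appropriate)
```

Under these hypothetical definitions, a model that treats preferences as global rules (always applies them) scores AAR = 1.0 but also MR = 1.0, matching the paper's observation that stronger adherence correlates with more misapplication.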
Problem

Research questions and friction points this paper is trying to address.

context-awareness
personalized preference
persistent memory
normative appropriateness
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

context-aware personalization
persistent memory
preference selectivity
large language models
social norms