Know You Before You Speak: User-State Modeling for LLM Personalization in Multi-Turn Conversation

📅 2026-05-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing personalized dialogue systems struggle to model how users' latent states evolve over a conversation: they typically rely on static profiles or explicit memory and cannot plan proactively for future interactions. This work proposes PUMA, a framework that brings the Free Energy Principle to dialogue personalization by formulating the task as sequential decision-making under partial observability. PUMA represents user states as latent variables, combines action-conditioned state transitions with Bayesian belief updating, and guides the dialogue policy by minimizing expected free energy. This shifts personalization from passive retrieval of stored user information to active decision-making driven by predicted state evolution, unifying epistemic (information-seeking) and pragmatic (task-oriented) objectives. On healthcare-oriented counseling and motivational interviewing benchmarks, PUMA improves long-horizon dialogue outcomes while maintaining strong response quality, and a cross-dataset study shows more reliable user-state estimation and next-state prediction.
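The belief-updating step can be illustrated with a minimal discrete Bayesian filter. This is a sketch under stated assumptions, not PUMA's implementation: the three user states, the two actions, the observation categories and every probability below are hypothetical placeholders, whereas the paper refines its own user-state model during the dialogue. Each turn first predicts the next state from the chosen action (the action-conditioned transition) and then conditions on the user's reply (Bayes' rule).

```python
# Minimal discrete Bayesian filter over a latent user state.
# All states, actions, observations and probabilities are HYPOTHETICAL
# placeholders for illustration; this is not the PUMA model itself.

STATES = ["resistant", "ambivalent", "ready"]

# P(next_state | state, action): rows index the current state,
# columns the next state; each row sums to 1.
TRANSITION = {
    "reflect": [
        [0.6, 0.3, 0.1],
        [0.1, 0.6, 0.3],
        [0.0, 0.2, 0.8],
    ],
    "advise": [
        [0.8, 0.15, 0.05],
        [0.3, 0.5, 0.2],
        [0.1, 0.3, 0.6],
    ],
}

# P(observation | state): likelihood of each (hypothetical) reply category.
LIKELIHOOD = {
    "sustain_talk": [0.7, 0.4, 0.1],
    "change_talk": [0.3, 0.6, 0.9],
}


def predict(belief, action):
    """Action-conditioned prior: b'(s') = sum_s P(s' | s, a) * b(s)."""
    T = TRANSITION[action]
    n = len(belief)
    return [sum(T[s][s2] * belief[s] for s in range(n)) for s2 in range(n)]


def update(prior, observation):
    """Bayes' rule: posterior(s) proportional to P(o | s) * prior(s)."""
    lik = LIKELIHOOD[observation]
    unnorm = [l * p for l, p in zip(lik, prior)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def step(belief, action, observation):
    """One dialogue turn: take an action, then condition on the reply."""
    return update(predict(belief, action), observation)


if __name__ == "__main__":
    belief = [1 / 3, 1 / 3, 1 / 3]  # start from a uniform belief
    belief = step(belief, "reflect", "change_talk")
    for name, p in zip(STATES, belief):
        print(f"{name}: {p:.3f}")
```

Starting from a uniform belief, a reflective action followed by change talk shifts probability mass toward the "ready" state. Keeping prediction and correction as separate functions mirrors the distinction between the transition model and the observation model.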

📝 Abstract

Personalized dialogue requires more than recalling explicit user histories: systems also need to infer hidden user states that evolve through interaction and shape appropriate response strategies. Existing memory- and profile-based methods primarily reuse observable user information, offering limited support for modeling user-state dynamics or selecting actions based on how they shape future user states. We propose PUMA (Prospective User-state Modeling for Action selection), a framework grounded in the Free Energy Principle (FEP) that formulates personalization as decision-making under partial observability, centered on an explicit user state model that captures latent user states and their action-conditioned dynamics. At each turn, PUMA maintains a belief over the user's hidden state, refines the user state model for observation generation and action-conditioned state transition, and selects dialogue actions by minimizing expected free energy, balancing epistemic and pragmatic objectives under a unified criterion. This formulation shifts personalization from passive memory retrieval to model-based decision-making over user evolution. We instantiate PUMA on healthcare-oriented counseling and motivational interviewing benchmarks with latent state annotations for rigorous evaluation. Experiments show that PUMA improves long-horizon dialogue outcomes while maintaining strong response quality, and a cross-dataset study demonstrates more reliable user-state estimation and next-state prediction.
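Action selection by expected free energy can likewise be sketched for a small discrete model. The sketch uses the standard one-step active-inference form G(a) = -(pragmatic value + epistemic value), where the pragmatic term is the expected log-preference over observations and the epistemic term is the expected information gain about the hidden state. Whether PUMA uses exactly this decomposition is an assumption, and the two-state model, the action names, the preference distribution and all probabilities are hypothetical.

```python
import math

# HYPOTHETICAL two-state user model; not PUMA's learned model.
STATES = ["not_ready", "ready"]
OBS = ["sustain_talk", "change_talk"]

# P(s' | s, a): rows index the current state s, columns the next state s'.
TRANSITION = {
    "open_question": [[0.8, 0.2], [0.1, 0.9]],
    "give_advice": [[0.7, 0.3], [0.2, 0.8]],
}

# P(o | s', a): rows index the next state s', columns the observation o.
# Open questions are assumed to elicit more revealing replies.
LIKELIHOOD = {
    "open_question": [[0.9, 0.1], [0.1, 0.9]],
    "give_advice": [[0.6, 0.4], [0.4, 0.6]],
}

# Preferred observation distribution P~(o): change talk is preferred.
PREFERENCE = [0.2, 0.8]


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)


def expected_free_energy(belief, action):
    """One-step G(a) = -(pragmatic value + epistemic value)."""
    T, A = TRANSITION[action], LIKELIHOOD[action]
    n_s, n_o = len(belief), len(PREFERENCE)
    # Predicted next-state distribution Q(s' | a).
    q_s = [sum(T[s][s2] * belief[s] for s in range(n_s)) for s2 in range(n_s)]
    # Predicted observation distribution Q(o | a).
    q_o = [sum(A[s2][o] * q_s[s2] for s2 in range(n_s)) for o in range(n_o)]
    # Pragmatic value: expected log-preference of predicted outcomes.
    pragmatic = sum(q_o[o] * math.log(PREFERENCE[o]) for o in range(n_o))
    # Epistemic value: mutual information I(s'; o) = H[Q(o)] - E_Q(s')[H[P(o|s')]].
    ambiguity = sum(q_s[s2] * entropy(A[s2]) for s2 in range(n_s))
    epistemic = entropy(q_o) - ambiguity
    return -(pragmatic + epistemic)


def select_action(belief):
    """Choose the action with the lowest expected free energy."""
    return min(TRANSITION, key=lambda a: expected_free_energy(belief, a))


if __name__ == "__main__":
    belief = [0.5, 0.5]
    for a in TRANSITION:
        print(a, round(expected_free_energy(belief, a), 3))
    print("selected:", select_action(belief))
```

With a uniform belief, the more informative open question has the lower expected free energy, so `select_action([0.5, 0.5])` returns `"open_question"`. The same quantity can be written as risk plus ambiguity, KL[Q(o|a) || P~(o)] + E_Q(s'|a)[H[P(o|s')]]; the two forms are algebraically identical, which is how a single criterion trades off information seeking against preferred outcomes.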

Problem

Research questions and friction points this paper is trying to address.

personalized dialogue

user-state modeling

multi-turn conversation

latent user states

dialogue personalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

user-state modeling

personalized dialogue

Free Energy Principle