🤖 AI Summary
Existing approaches struggle to capture users’ subjective traits and the complexities of long-term human-agent dialogue, which leads to inadequately personalized service responses. To address this, we propose PAL-Bench, a benchmark for evaluating the personalization capabilities of service-oriented assistants in long-term user-agent interactions. Because no real-world data is available, the underlying dataset, PAL-Set, is generated by a multi-step LLM-based synthesis pipeline and then verified and refined by human annotators; it is the first Chinese dataset comprising multi-session user logs and dialogue histories. We also introduce H²Memory, a hierarchical and heterogeneous memory framework that incorporates retrieval-augmented generation (RAG) to improve personalized response generation. Experiments on PAL-Bench and an external dataset demonstrate the effectiveness of the framework.
📝 Abstract
With the rise of smart personal devices, service-oriented human-agent interactions have become increasingly prevalent. This trend highlights the need for personalized dialogue assistants that can understand user-specific traits to accurately interpret requirements and tailor responses to individual preferences. However, existing approaches often overlook the complexities of long-term interactions and fail to capture users' subjective characteristics. To address these gaps, we present PAL-Bench, a new benchmark designed to evaluate the personalization capabilities of service-oriented assistants in long-term user-agent interactions. In the absence of available real-world data, we develop a multi-step LLM-based synthesis pipeline whose output is further verified and refined by human annotators. This process yields PAL-Set, the first Chinese dataset comprising multi-session user logs and dialogue histories, which serves as the foundation for PAL-Bench. Furthermore, we propose H$^2$Memory, a hierarchical and heterogeneous memory framework that incorporates retrieval-augmented generation to improve personalized response generation in service-oriented interactions. Comprehensive experiments on both PAL-Bench and an external dataset demonstrate the effectiveness of the proposed memory framework.