Exploring the Potential of LLMs as Personalized Assistants: Dataset, Evaluation, and Analysis

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
The lack of an open-source dialogue benchmark tailored for personalized AI assistant research hinders systematic evaluation and development of the personalization capabilities of large language models (LLMs). Method: We introduce HiCUPID, the first open-source multi-turn personalized dialogue dataset supporting user profiling and long-term memory awareness, and develop a learnable automatic evaluation model based on a fine-tuned Llama-3.2. We further propose an end-to-end evaluation framework for personalized assistants that integrates user profile injection, memory consistency verification, and alignment with human preferences. Contribution/Results: HiCUPID improves the efficiency, reproducibility, and interpretability of personalized response evaluation. It establishes a unified, scalable benchmark platform for LLM personalization research, enabling rigorous, standardized assessment of adaptive, context-aware, and user-consistent behavior in conversational agents.

📝 Abstract
Personalized AI assistants, a hallmark of the human-like capabilities of Large Language Models (LLMs), are a challenging application that intertwines multiple problems in LLM research. Despite the growing interest in the development of personalized assistants, the lack of an open-source conversational dataset tailored for personalization remains a significant obstacle for researchers in the field. To address this research gap, we introduce HiCUPID, a new benchmark to probe and unleash the potential of LLMs to deliver personalized responses. Alongside a conversational dataset, HiCUPID provides a Llama-3.2-based automated evaluation model whose assessment closely mirrors human preferences. We release our dataset, evaluation model, and code at https://github.com/12kimih/HiCUPID.
Problem

Research questions and friction points this paper is trying to address.

Lack of open-source dataset for personalized AI assistants
Need for benchmark to evaluate LLM personalization capabilities
Absence of automated evaluation model matching human preferences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces HiCUPID benchmark for personalized LLMs
Provides open-source conversational dataset for personalization
Offers Llama-3.2-based automated evaluation model
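The automated evaluation model described above judges whether a response matches human preferences for personalization. As a minimal sketch of how such pairwise LLM-as-judge evaluation typically works, the harness below formats a comparison prompt and queries a judge; the function and field names are illustrative assumptions, not HiCUPID's actual API, and a deterministic keyword stub stands in for the fine-tuned Llama-3.2 judge:

```python
from dataclasses import dataclass

@dataclass
class Example:
    user_profile: str   # persona facts the assistant should respect
    question: str
    response_a: str     # e.g., a personalized candidate response
    response_b: str     # e.g., a generic baseline response

def build_judge_prompt(ex: Example) -> str:
    """Format a pairwise-comparison prompt for an LLM judge."""
    return (
        "You are evaluating which response is better personalized.\n"
        f"User profile: {ex.user_profile}\n"
        f"Question: {ex.question}\n"
        f"Response A: {ex.response_a}\n"
        f"Response B: {ex.response_b}\n"
        "Answer with 'A' or 'B'."
    )

def judge_winner(ex: Example, judge) -> str:
    """Run the judge on one example; `judge` maps a prompt to 'A' or 'B'."""
    verdict = judge(build_judge_prompt(ex))
    if verdict not in ("A", "B"):
        raise ValueError(f"unexpected verdict: {verdict!r}")
    return verdict

def keyword_stub_judge(prompt: str) -> str:
    """Toy stand-in for the fine-tuned judge model: prefers the response
    that mentions a keyword from the user profile."""
    profile = prompt.split("User profile: ")[1].split("\n")[0]
    a = prompt.split("Response A: ")[1].split("\n")[0]
    keyword = profile.split()[-1]
    return "A" if keyword.lower() in a.lower() else "B"

ex = Example(
    user_profile="vegetarian who loves hiking",
    question="Suggest a weekend activity and a snack.",
    response_a="Go hiking and pack a veggie wrap for the trail.",
    response_b="Go bowling and grab a burger.",
)
print(judge_winner(ex, keyword_stub_judge))  # → A
```

In a real setup, `judge` would wrap the released Llama-3.2-based model; aggregating verdicts over the dataset yields the automated preference score the paper aligns with human judgments.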