PicPersona-TOD: A Dataset for Personalizing Utterance Style in Task-Oriented Dialogue with Image Persona

📅 2025-04-24

🤖 AI Summary

Current task-oriented dialogue (TOD) systems generate generic responses that are not personalized to user attributes such as age and emotional state. To address this, we introduce the first personalized TOD dataset incorporating user portrait images as persona representations, and propose Pictor, a multimodal framework that takes user portrait images as explicit persona inputs. Pictor integrates dialogue-policy-guided multimodal prompting with external knowledge retrieval to mitigate hallucination and enhance cross-domain generalization, while jointly optimizing image understanding and text generation. Human evaluation demonstrates significant improvements in interaction naturalness and user engagement. Moreover, the model is robust on unseen domains, achieving a 32% gain in personalized response quality over baselines. This work establishes a foundational paradigm for image-grounded persona modeling in TOD systems.

📝 Abstract

Task-Oriented Dialogue (TOD) systems are designed to fulfill user requests through natural language interactions, yet existing systems often produce generic, monotonous responses that lack individuality and fail to adapt to users' personal attributes. To address this, we introduce PicPersona-TOD, a novel dataset that incorporates user images as part of the persona, enabling personalized responses tailored to user-specific factors such as age or emotional context. This personalization is facilitated by first impressions, dialogue policy-guided prompting, and the use of external knowledge to reduce hallucinations. Human evaluations confirm that our dataset enhances user experience, with personalized responses contributing to a more engaging interaction. Additionally, we introduce a new NLG model, Pictor, which not only personalizes responses but also demonstrates robust performance across unseen domains. Repository: https://github.com/JihyunLee1/PicPersona

Problem

Research questions and friction points this paper is trying to address.

Enhances task-oriented dialogue with personalized responses

Incorporates user images to tailor dialogue to personal attributes

Reduces generic responses using first impressions and external knowledge

Innovation

Methods, ideas, or system contributions that make the work stand out.

Incorporates user images for personalized responses

Uses dialogue policy-guided prompting and external knowledge

Introduces NLG model Pictor for robust performance
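
The idea of dialogue policy-guided prompting with external knowledge can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Persona` fields, the `POLICY_HINTS` table, and the `build_prompt` function are all hypothetical names, and the persona attributes are assumed to have already been inferred from the user image by some upstream step.

```python
from dataclasses import dataclass


@dataclass
class Persona:
    """Hypothetical first-impression attributes inferred from a user image."""
    age_group: str
    emotion: str


# Hypothetical mapping from dialogue-policy actions to style guidance.
POLICY_HINTS = {
    "inform": "State the requested facts clearly.",
    "request": "Ask politely for the missing information.",
}


def build_prompt(persona: Persona, action: str,
                 knowledge: list[str], draft: str) -> str:
    """Assemble a text prompt asking a generator to restyle a generic reply.

    The retrieved knowledge snippets are included so the generator can be
    told to stay within them, which is one way to discourage hallucination.
    """
    hint = POLICY_HINTS.get(action, "Respond helpfully.")
    facts = "\n".join(f"- {item}" for item in knowledge)
    return (
        f"User persona: age group = {persona.age_group}, "
        f"emotion = {persona.emotion}\n"
        f"Dialogue action: {action}. {hint}\n"
        f"Use only these facts:\n{facts}\n"
        f"Generic response: {draft}\n"
        "Rewrite the generic response in a style suited to this user."
    )


if __name__ == "__main__":
    prompt = build_prompt(
        Persona(age_group="senior", emotion="calm"),
        "inform",
        ["Hotel A has free parking."],
        "Hotel A has free parking.",
    )
    print(prompt)
```

Keeping the policy hint and the knowledge list as separate prompt fields is a design choice of this sketch: it makes it easy to vary the persona while holding the task content fixed.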
