PicPersona-TOD: A Dataset for Personalizing Utterance Style in Task-Oriented Dialogue with Image Persona

📅 2025-04-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Current task-oriented dialogue (TOD) systems generate generic responses lacking personalization with respect to user attributes such as age and emotional state. To address this, we introduce the first personalized TOD dataset incorporating user portrait images as persona representations, and propose Pictor, a novel multimodal framework that pioneers the use of user portrait images as explicit persona inputs. Pictor integrates dialogue-policy-guided multimodal prompting with external knowledge retrieval to mitigate hallucination and enhance cross-domain generalization, while jointly optimizing image understanding and text generation. Human evaluation demonstrates significant improvements in interaction naturalness and user engagement. Moreover, the model exhibits strong robustness on unseen domains, achieving a 32% gain in personalized response quality over baselines. This work establishes a foundational paradigm for image-grounded persona modeling in TOD systems.

📝 Abstract
Task-Oriented Dialogue (TOD) systems are designed to fulfill user requests through natural language interactions, yet existing systems often produce generic, monotonic responses that lack individuality and fail to adapt to users' personal attributes. To address this, we introduce PicPersona-TOD, a novel dataset that incorporates user images as part of the persona, enabling personalized responses tailored to user-specific factors such as age or emotional context. This is facilitated by first impressions, dialogue policy-guided prompting, and the use of external knowledge to reduce hallucinations. Human evaluations confirm that our dataset enhances user experience, with personalized responses contributing to a more engaging interaction. Additionally, we introduce a new NLG model, Pictor, which not only personalizes responses, but also demonstrates robust performance across unseen domains. Project repository: https://github.com/JihyunLee1/PicPersona
Problem

Research questions and friction points this paper is trying to address.

Enhances task-oriented dialogue with personalized responses
Incorporates user images to tailor dialogue to personal attributes
Reduces generic responses using first impressions and external knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Incorporates user images for personalized responses
Uses dialogue policy-guided prompting and external knowledge
Introduces NLG model Pictor for robust performance
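
The points above name three ingredients (a first impression drawn from the user image, dialogue policy-guided prompting, and external knowledge), but this page does not say how they are combined. The following is only a minimal sketch of one way such a styling prompt could be assembled. Every name in it (`Persona`, `Turn`, `build_prompt`, the field names, and the prompt wording) is a hypothetical illustration, not the authors' implementation or the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Persona:
    """Hypothetical first-impression attributes read off a user image."""
    age_group: str
    emotion: str


@dataclass
class Turn:
    """One system turn to be restyled; all field names are illustrative."""
    user_utterance: str
    dialogue_acts: List[str]      # the dialogue policy for this turn
    generic_response: str         # the unpersonalized TOD response
    knowledge: List[str] = field(default_factory=list)  # external facts


def build_prompt(persona: Persona, turn: Turn) -> str:
    """Assemble a styling prompt that keeps the task content fixed."""
    lines = [
        "Rewrite the system response so its style suits the user.",
        f"User first impression: age group = {persona.age_group}, "
        f"emotion = {persona.emotion}.",
        "Dialogue policy (must be preserved): " + "; ".join(turn.dialogue_acts),
    ]
    # Restricting the model to listed facts is one assumed way to limit hallucination.
    if turn.knowledge:
        lines.append("Use only these external facts: " + " | ".join(turn.knowledge))
    lines.append(f"User: {turn.user_utterance}")
    lines.append(f"Generic response: {turn.generic_response}")
    lines.append("Personalized response:")
    return "\n".join(lines)


if __name__ == "__main__":
    persona = Persona(age_group="senior", emotion="calm")
    turn = Turn(
        user_utterance="I need a cheap hotel in the centre.",
        dialogue_acts=["inform(price=cheap)", "request(stars)"],
        generic_response="There are cheap hotels. How many stars?",
        knowledge=["Hypothetical fact: budget hotels cluster near the station."],
    )
    print(build_prompt(persona, turn))
```

Keeping the dialogue acts in the prompt as a fixed constraint reflects the idea that only the utterance style should change with the persona, while the task content stays the same.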
Authors

Jihyun Lee
POSTECH, Ph.D. Candidate
NLP
Yejin Jeon
POSTECH
Speech Synthesis, Signal Processing, Natural Language Processing
Seungyeon Seo
Graduate School of Artificial Intelligence, POSTECH, Republic of Korea
Gary Geunbae Lee
Graduate School of Artificial Intelligence, POSTECH, Republic of Korea; Department of Computer Science and Engineering, POSTECH, Republic of Korea