🤖 AI Summary
Prior work has not systematically explored large language models' (LLMs) potential as active interviewers in autobiographical interviews; in particular, it lacks formal modeling and evaluation of goal-directedness, contextual coherence, and empathetic interaction. Method: We propose GuideLLM, the first LLM-based guided dialogue framework tailored for autobiographical interviews, formally defining three core dimensions of guided dialogue: goal navigation, context management, and empathetic interaction. We construct a multi-dimensional automated evaluation environment and a user-agent benchmark grounded in real autobiographical data. Contribution/Results: Through comparative experiments across multiple models, LLM-as-a-judge evaluation, and a 45-participant human study, GuideLLM significantly outperforms six state-of-the-art baselines, including GPT-4o and Llama-3-70b-Instruct, in both interview quality and autobiographical narrative generation, establishing a new paradigm for LLM-driven guided dialogue.
📝 Abstract
Although Large Language Models (LLMs) succeed in human-guided conversations such as instruction following and question answering, the potential of LLM-guided conversations, where LLMs direct the discourse and steer the conversation's objectives, remains under-explored. In this study, we first characterize LLM-guided conversation by three fundamental components: (i) Goal Navigation; (ii) Context Management; (iii) Empathetic Engagement, and propose GuideLLM as an instantiation. We then implement an interviewing environment for the evaluation of LLM-guided conversation. This environment covers diverse topics for comprehensive interviewing evaluation, yielding around 1.4k turns of utterances, 184k tokens, and over 200 events mentioned during interviewing for each chatbot evaluation. We compare GuideLLM with six state-of-the-art LLMs, including GPT-4o and Llama-3-70b-Instruct, in terms of both interviewing quality and autobiography generation quality. For automatic evaluation, we derive user proxies from multiple autobiographies and employ LLM-as-a-judge to score LLM behaviors. We further conduct a human study with 45 participants who chat with GuideLLM and the baselines, collecting their feedback, preferences, and ratings of conversation and autobiography quality. Experimental results indicate that GuideLLM significantly outperforms baseline LLMs in automatic evaluation and consistently leads in human ratings.