Evaluating Human Trust in LLM-Based Planners: A Preliminary Study

📅 2025-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the mechanisms underlying human trust in large language model (LLM)-based planners, contrasting them with classical PDDL planners across subjective trust and objective performance dimensions. Method: A controlled user study was conducted, integrating validated trust scales, behavioral logging, and multidimensional performance metrics—including plan correctness, explainability, and iterative refinement. Contribution/Results: The study identifies *correctness* as the primary driver of trust formation. While LLM-generated explanations improved planning evaluation accuracy by 19%, they did not significantly increase trust scores. In contrast, introducing an iterative planning mechanism increased trust by 27%, independent of performance gains—demonstrating that trust can be enhanced through interaction design rather than correctness alone. These findings provide empirical grounding and novel design principles for developing trustworthy AI planning systems.

📝 Abstract
Large Language Models (LLMs) are increasingly used for planning tasks, offering capabilities not found in classical planners, such as generating explanations and iterative refinement. However, trust, a critical factor in the adoption of planning systems, remains underexplored in the context of LLM-based planning. This study bridges this gap by comparing human trust in LLM-based planners with classical planners through a user study in a Planning Domain Definition Language (PDDL) domain. Combining subjective measures, such as trust questionnaires, with objective metrics like evaluation accuracy, our findings reveal that correctness is the primary driver of trust and performance. Explanations provided by the LLM improved evaluation accuracy but had limited impact on trust, while plan refinement showed potential for increasing trust without significantly enhancing evaluation accuracy.
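The study pairs two kinds of measures: subjective trust scores aggregated from questionnaire items, and objective evaluation accuracy, i.e. how often a participant correctly judges whether a generated plan is valid. A minimal sketch of how such per-participant metrics could be computed (all function names and data here are illustrative assumptions, not taken from the paper):

```python
def mean_trust(likert_responses):
    # Average a participant's Likert-scale trust questionnaire items
    # (e.g. 1-7 ratings across several trust dimensions).
    return sum(likert_responses) / len(likert_responses)

def evaluation_accuracy(judgments, ground_truth):
    # Fraction of a participant's plan-validity judgments that match
    # the actual validity of each presented plan.
    correct = sum(j == g for j, g in zip(judgments, ground_truth))
    return correct / len(ground_truth)

# Hypothetical participant in an "LLM planner with explanations" condition:
trust = mean_trust([5, 6, 4, 5])                      # -> 5.0
acc = evaluation_accuracy([True, False, True, True],
                          [True, True, True, True])   # -> 0.75
```

Comparing these two quantities across conditions (classical vs. LLM planner, with/without explanations or refinement) is what lets the study show that explanations can move accuracy without moving trust, and vice versa.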
Problem

Research questions and friction points this paper is trying to address.

Evaluates human trust in LLM-based planners
Compares trust between LLM and classical planners
Assesses impact of explanations and plan refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based planning evaluation
Trust comparison with classical planners
Explanation and refinement impact analysis