🤖 AI Summary
Scaling human team behavioral data for human-AI collaborative decision-making remains challenging due to the difficulty of collecting diverse, high-quality human interaction traces. Method: This paper introduces the first algorithmic prompt-generation framework that integrates Quality-Diversity (QD) optimization with large language model (LLM) agents—requiring neither handcrafted prompts nor large-scale user studies. It automatically discovers prompt strategies that elicit multidimensional, human-like communication and coordination behaviors from LLMs in multi-step collaborative settings, synthesizing a broad spectrum of team behavioral patterns. Contribution/Results: Validated against data from a 54-participant user study, the generated behaviors accurately reproduce key human collaboration trends and uncover novel coordination patterns otherwise obscured by data sparsity. The approach significantly outperforms baseline methods in both behavioral diversity and fidelity to human behavior.
📝 Abstract
Understanding how humans collaborate and communicate in teams is essential for improving human-agent teaming and AI-assisted decision-making. However, relying solely on data from large-scale user studies is infeasible due to logistical, ethical, and practical constraints, motivating synthetic models that capture diverse human behaviors. Agents powered by Large Language Models (LLMs) have recently been shown to emulate human-like behavior in social settings, but obtaining a large set of diverse behaviors from them requires manual effort in the form of prompt design. Meanwhile, Quality-Diversity (QD) optimization has been shown to generate diverse Reinforcement Learning (RL) agent behaviors. In this work, we combine QD optimization with LLM-powered agents to iteratively search for prompts that elicit diverse team behavior in a long-horizon, multi-step collaborative environment. We first show, through a human-subjects experiment (n=54), that humans exhibit diverse coordination and communication behavior in this domain. We then show that our approach effectively replicates trends from human teaming data and also captures behaviors that are not easily observed without collecting large amounts of data. Our findings highlight the combination of QD optimization and LLM-powered agents as an effective tool for studying teaming and communication strategies in multi-agent collaboration.
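To make the core idea concrete, here is a minimal, hypothetical sketch of QD-style prompt search in the spirit of MAP-Elites. All names (`TRAITS`, `mutate`, `evaluate`) are illustrative assumptions, not the paper's implementation: `evaluate` is a random stand-in for rolling out an LLM agent team and measuring its communication/coordination behavior, whereas the real system would query an LLM in the collaborative environment.

```python
import random

random.seed(0)

# Hypothetical prompt building blocks; the paper's prompts would be richer
# natural-language strategy descriptions fed to an LLM agent.
TRAITS = ["terse", "verbose", "proactive", "reactive", "leader", "follower"]

def mutate(prompt: str) -> str:
    """Randomly swap one trait in the prompt (simple string mutation)."""
    traits = prompt.split(", ")
    traits[random.randrange(len(traits))] = random.choice(TRAITS)
    return ", ".join(traits)

def evaluate(prompt: str):
    """Stand-in for an LLM team rollout under this prompt.
    Returns (fitness, behavior descriptor). A real system would measure
    task reward plus communication and coordination statistics."""
    comm = prompt.count("verbose") + prompt.count("proactive")   # messaging axis
    coord = prompt.count("leader") + prompt.count("follower")    # coordination axis
    fitness = random.random()  # placeholder for task performance
    return fitness, (comm, coord)

# MAP-Elites archive: behavior-descriptor cell -> (fitness, elite prompt).
archive = {}

def insert(prompt: str) -> None:
    """Keep the prompt only if it beats the current elite in its cell."""
    fit, bd = evaluate(prompt)
    if bd not in archive or fit > archive[bd][0]:
        archive[bd] = (fit, prompt)

insert(", ".join(random.choices(TRAITS, k=3)))  # seed the archive
for _ in range(200):
    _, parent = random.choice(list(archive.values()))  # uniform elite selection
    insert(mutate(parent))

print(f"{len(archive)} distinct behavior niches discovered")
```

The archive keeps one elite prompt per behavior-descriptor cell, so the search pressure is toward *coverage* of distinct team behaviors rather than a single optimum, which is what distinguishes QD from standard prompt optimization.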