AI Summary
This study addresses the coordination challenges users encounter when routinely interacting with multiple multimodal large language models (MLLMs), including prompt adaptation, trust calibration, and fragmented conversation histories. Through a multi-day diary study and semi-structured interviews with ten participants, combined with contextual analysis and thematic coding, we uncover how users dynamically establish hierarchical relationships among models, develop task-driven switching strategies, and continuously negotiate trade-offs among effort, latency, and output credibility. Our findings provide the first empirical account of multi-agent MLLM coordination in human-computer interaction, offering critical behavioral insights and actionable design implications for tools that support effective collaboration across multiple MLLMs.
Abstract
People increasingly use multiple Multimodal Large Language Models (MLLMs) concurrently, selecting each based on its perceived strengths. This cross-platform practice creates coordination challenges: adapting prompts to different interfaces, calibrating trust against inconsistent behaviors, and navigating separate conversation histories. Prior HCI research has focused on single-agent interactions, leaving multi-MLLM orchestration underexplored. Through a diary study and semi-structured interviews (N=10), we examine how individuals organize work across competing AI systems. Our findings reveal that users construct primary and secondary hierarchies among models that shift with usage context. They also develop personalized switching patterns triggered by task aggregation, effort and latency trade-offs, and output credibility. These insights inform design opportunities for future tools that support users in coordinating multi-MLLM workflows.