🤖 AI Summary
This work addresses the challenge of enabling effective human intervention in the planning processes of large language models (LLMs) within orchestrated multi-agent systems, where existing approaches offer only outcome-level supervision and give users no visibility into intermediate reasoning. The paper formalizes an interaction design space for human-LLM co-planning along three axes: mode (semantic vs. structural), scope (global vs. targeted), and level (low-level vs. high-level edits). A prototype system, AMBIPOM, instantiates this space to support process-level supervision through both semantic and structural interactions. A user study characterizes how users navigate the space, revealing hybrid workflows and trade-offs among effort, control, and risk, while a controlled benchmark analyzes how LLMs revise plans under varying scope and revision strategies. Together, the findings yield design insights for more transparent, controllable, and effective human-AI co-planning.
📝 Abstract
In orchestrated multi-agent systems, humans often struggle to manage plans due to their complexity and limited transparency. Existing approaches rely on outcome-level supervision, where users verify only final outputs without visibility into intermediate reasoning. We formalize a design space for human-LLM co-planning interactions along three axes: mode (semantic vs. structural), scope (global vs. targeted), and level (low vs. high-level edits). We realize it in AMBIPOM, a prototype supporting process-level supervision through both semantic and structural interactions. Through a user study, we characterize how users navigate this space, revealing hybrid workflows and effort-control-risk trade-offs; through a controlled benchmark, we analyze how LLMs revise plans under varying scope and revision strategies. Our findings yield design insights for more transparent, controllable, and effective human-AI co-planning. We release code and data at https://github.com/megagonlabs/ambipom.
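As a rough illustration of the three-axis design space described above, the following minimal Python sketch enumerates every combination of mode, scope, and level. The class and function names (`Mode`, `Scope`, `Level`, `Interaction`, `design_space`) are hypothetical and chosen only for this example; they are not taken from the AMBIPOM codebase, and the abstract does not say whether all combinations are realized in the prototype.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


# Axis values follow the abstract; the type names are illustrative assumptions.
class Mode(Enum):
    SEMANTIC = "semantic"
    STRUCTURAL = "structural"


class Scope(Enum):
    GLOBAL = "global"
    TARGETED = "targeted"


class Level(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class Interaction:
    """One point in the (hypothetical) mode x scope x level design space."""
    mode: Mode
    scope: Scope
    level: Level

    def label(self) -> str:
        # Compact string form, e.g. "semantic/global/low".
        return f"{self.mode.value}/{self.scope.value}/{self.level.value}"


def design_space() -> list[Interaction]:
    """Enumerate all combinations of the three axes."""
    return [Interaction(m, s, l) for m, s, l in product(Mode, Scope, Level)]


if __name__ == "__main__":
    for interaction in design_space():
        print(interaction.label())
```

Treating each axis as an independent enumeration gives 2 x 2 x 2 = 8 candidate interaction types, which is one simple way to read "design space" here; the paper itself may structure or constrain the space differently.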