🤖 AI Summary
Existing user simulators predominantly assume cooperative interaction, which limits their usefulness for training and evaluating tool-augmented agents in realistic, non-collaborative settings. This work introduces a novel non-collaborative user simulator framework designed specifically for tool agents. It models four representative disruptive behaviors (requesting unavailable services, digressing from the topic, expressing impatience, and issuing incomplete utterances). Throughout, it still delivers every intent and piece of information needed to complete the task, so the resulting interactions are natural, controllable, and challenging but remain solvable. The framework combines rule-based logic with generative models to transmit intents precisely and inject the targeted behaviors, and it is evaluated on MultiWOZ and τ-bench. Experiments show that state-of-the-art tool agents degrade significantly when facing this simulator: hallucinations increase markedly and dialogue breakdowns become frequent. These results indicate that the simulator is effective at diagnosing critical robustness deficiencies in current tool agents.
📝 Abstract
Non-Collaborative User Simulators for Tool Agents
Jeonghoon Shim, Woojung Song, Cheyon Jin, Seungwon KooK, Yohan Jo
ICLR 2026 Conference Submission
19 Sept 2025 (modified: 25 Sept 2025)
License: CC BY 4.0
Keywords: Tool Agent, User Simulator, Non-collaborative User, Dialogue Simulation
TL;DR: A non-collaborative user simulation method for tool agents.

Abstract: Tool agents interact with users through multi-turn dialogues to accomplish various tasks. Recent studies have adopted user simulation methods to develop these agents in multi-turn settings. However, existing user simulators tend to be agent-friendly and exhibit only cooperative behaviors, so they cannot train or test agents against the non-collaborative users encountered in the real world. To address this, we propose a novel user simulator architecture that simulates four categories of non-collaborative behaviors: requesting unavailable services, digressing into tangential conversations, expressing impatience, and providing incomplete utterances. Our user simulator produces challenging and natural non-collaborative behaviors while reliably delivering all intents and information necessary to accomplish the task. Our experiments on MultiWOZ and τ-bench reveal significant performance degradation in state-of-the-art tool agents when they encounter non-collaborative users. We provide detailed analyses of agents' weaknesses under each non-collaborative condition, such as escalated hallucinations and dialogue breakdowns. Ultimately, we contribute an easily extensible user simulation framework that helps the research community develop tool agents and preemptively diagnose them under challenging real-world conditions within their own services.
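The abstract describes the simulator only at the level of its four behavior categories and its guarantee that every intent still reaches the agent. The sketch below is a hypothetical, rule-based toy that illustrates that contract; it is not the authors' implementation. The `Intent` structure, the `NonCollabUserSimulator` class, the `p_disrupt` parameter, the behavior labels, and the canned phrases are all illustrative assumptions. In a real simulator, the phrases would presumably be generated by a language model rather than taken from fixed templates.

```python
import random
from dataclasses import dataclass

# Hypothetical labels for the four non-collaborative categories in the abstract.
BEHAVIORS = ("unavailable_service", "tangent", "impatience", "incomplete")

# Hypothetical canned phrases; a real simulator would likely generate these.
TEMPLATES = {
    "unavailable_service": "Can you also book me a helicopter tour?",
    "tangent": "By the way, what is the weather like there?",
    "impatience": "Come on, this is taking way too long.",
}


@dataclass
class Intent:
    name: str
    slots: dict  # slot name -> value the agent needs to complete the task


class NonCollabUserSimulator:
    """Toy rule-based user that injects disruptions but never drops task info."""

    def __init__(self, goal, p_disrupt=0.5, seed=0):
        # Copy slot dicts so partial delivery never mutates the caller's goal.
        self.queue = [Intent(i.name, dict(i.slots)) for i in goal]
        self.p_disrupt = p_disrupt
        self.rng = random.Random(seed)

    def done(self):
        return not self.queue

    def next_turn(self):
        """Return (utterance, behavior, delivered_slots) for one user turn."""
        intent = self.queue[0]
        behavior = None
        if self.rng.random() < self.p_disrupt:
            behavior = self.rng.choice(BEHAVIORS)
        if behavior == "incomplete" and len(intent.slots) > 1:
            # Incomplete utterance: give one slot now, keep the rest queued.
            key = next(iter(intent.slots))
            delivered = {key: intent.slots.pop(key)}
        else:
            if behavior == "incomplete":
                behavior = None  # a single-slot intent cannot be split further
            delivered = dict(intent.slots)
            self.queue.pop(0)
        info = ", ".join(f"{k}={v}" for k, v in delivered.items())
        utterance = f"I need {intent.name}: {info}."
        if behavior in TEMPLATES:
            # The disruptive phrase is added on top of, not instead of, task info.
            utterance = f"{TEMPLATES[behavior]} {utterance}"
        return utterance, behavior, delivered


# Usage: run one simulated user to completion and collect what the agent hears.
goal = [
    Intent("a hotel", {"area": "north", "stars": "4", "nights": "2"}),
    Intent("a taxi", {"leave_at": "10:00"}),
]
sim = NonCollabUserSimulator(goal, p_disrupt=0.8, seed=1)
received = {}
while not sim.done():
    utterance, behavior, delivered = sim.next_turn()
    print(behavior, "|", utterance)
    received.update(delivered)
```

The sketch keeps dialogues solvable through two design choices. First, disruptive phrases are always added to the turn rather than replacing its content. Second, unsent slots are deferred, never discarded. An agent under test can therefore be scored on whether it still collects every slot in `goal` despite the disruptions.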