AI Summary
This work addresses the limitations of existing datasets and generation methods, which rely on static tool sets and struggle to simulate high-complexity, multi-turn human-agent collaboration in open-ended scenarios. To overcome this, the authors propose a user-oriented multi-turn dialogue generation framework that decouples task execution from user simulation, dynamically modeling human users' progressive requests and turn-by-turn feedback. Built upon a large reasoning model (LRM), the framework enables plug-and-play generation, arbitrary state initialization, and execution of multiple tasks within a single dialogue trajectory, substantially enhancing data density and realism. The resulting dialogue dataset significantly outperforms those generated by conventional task-oriented approaches in terms of conversation length, task complexity, and interaction authenticity.
Abstract
The recent paradigm shift toward large reasoning models (LRMs) as autonomous agents has intensified the demand for sophisticated, multi-turn tool-use capabilities. Yet existing datasets and data-generation approaches are limited by static, predefined toolsets that cannot scale to the complexity of open-ended human-agent collaboration. To address this, we initially developed a framework for automated task-oriented multi-turn dialogue generation at scale, using an LRM-based simulator to dynamically generate high-value, domain-specific tools to solve specified tasks. However, we observe that a purely task-oriented design often results in "solely task-solving" trajectories, where the agent completes the objective with minimal interaction and fails to generate the high turn-count conversations seen in realistic scenarios. To bridge this gap, we shift toward a user-oriented simulation paradigm. By decoupling task generation from a dedicated user simulator that mimics human behavioral rules (such as incremental request-making and turn-by-turn feedback), we obtain more authentic, extended multi-turn dialogues that reflect the iterative nature of real-world problem solving. Our generation pipeline operates as a versatile, plug-and-play module capable of initiating generation from any state, ensuring high scalability in producing extended tool-use data. Furthermore, by supporting multiple task completions within a single trajectory, it yields a high-density dataset that reflects the multifaceted demands of real-world human-agent interaction.
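The user-oriented loop described above can be illustrated with a toy Python sketch. This is not the authors' implementation. The LRM-based user simulator and the tool-using agent are replaced by hypothetical deterministic stubs (`UserSimulator`, `agent_reply`), and the names `Task`, `DialogueState`, and `simulate` are invented for illustration. The sketch only shows the three properties named above: the simulated user reveals requests incrementally and gives feedback each turn, generation can resume from an arbitrary dialogue state, and several tasks are completed within one trajectory.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Task:
    """Hypothetical task, pre-split into the incremental requests a user would make."""
    name: str
    steps: List[str]


@dataclass
class DialogueState:
    """Any point in a conversation: prior (user, agent) turns plus pending tasks.

    Passing a non-empty history lets generation resume mid-dialogue."""
    history: List[Tuple[str, str]] = field(default_factory=list)
    pending: List[Task] = field(default_factory=list)


class UserSimulator:
    """Rule-based stand-in for an LRM user simulator (assumption, not the paper's).

    Rules: reveal one sub-request per turn, and prefix feedback on the
    agent's previous reply whenever there is one."""

    def __init__(self, tasks: List[Task]):
        # Flatten all tasks into one queue so a single trajectory covers them all.
        self.queue = [(t.name, s) for t in tasks for s in t.steps]

    def next_message(self, last_agent_reply: Optional[str]) -> Optional[str]:
        if not self.queue:
            return None  # nothing left to ask: the dialogue ends
        task, step = self.queue.pop(0)
        feedback = "Thanks, that worked. " if last_agent_reply else ""
        return f"{feedback}[{task}] {step}"


def agent_reply(user_msg: str) -> str:
    """Stub agent. A real pipeline would query an LRM that may call
    dynamically generated tools; here we fake a single tool call."""
    request = user_msg.split("] ", 1)[1]
    return f"call_tool({request!r}) -> ok"


def simulate(state: DialogueState) -> DialogueState:
    """Run the user/agent loop from an arbitrary starting state."""
    user = UserSimulator(state.pending)
    last = state.history[-1][1] if state.history else None
    while (msg := user.next_message(last)) is not None:
        last = agent_reply(msg)
        state.history.append((msg, last))
    state.pending = []
    return state


if __name__ == "__main__":
    tasks = [Task("flights", ["find flights to Paris", "only morning departures"]),
             Task("hotel", ["book a hotel near the Louvre"])]
    out = simulate(DialogueState(pending=tasks))
    for user_msg, reply in out.history:
        print("USER: ", user_msg)
        print("AGENT:", reply)
```

Keeping the user simulator separate from the agent is what lets the turn count grow with the number of incremental requests rather than collapsing to a single task-solving exchange. In a real system each stub would be replaced by an LRM call.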