APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

📅 2025-04-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
High-quality, human-annotated multi-turn dialogue data for training interactive AI agents is scarce and prohibitively expensive to collect. Method: a two-phase synthetic framework ("blueprint verification → trajectory generation"): first, an agentic pipeline with a committee of LLM reviewers and iterative feedback loops produces verifiable task blueprints with ground-truth actions; second, simulated human-agent interplay expands each approved blueprint into a complete multi-turn interaction trajectory. The method integrates multi-agent evaluation, instruction tuning, and multi-turn reinforcement learning for alignment. Contribution/Results: the released xLAM-2-fc-r model family (1B–70B) outperforms GPT-4o and Claude 3.5 on the τ-bench and BFCL benchmarks, with the smaller models surpassing larger frontier models, particularly in multi-turn settings, while maintaining superior cross-trial consistency. The blueprint-driven, controllable trajectory synthesis paradigm enables low-cost, high-fidelity construction of multi-turn dialogue data.

📝 Abstract
Training effective AI agents for multi-turn interactions requires high-quality data that captures realistic human-agent dynamics, yet such data is scarce and expensive to collect manually. We introduce APIGen-MT, a two-phase framework that generates verifiable and diverse multi-turn agent data. In the first phase, our agentic pipeline produces detailed task blueprints with ground-truth actions, leveraging a committee of LLM reviewers and iterative feedback loops. These blueprints are then transformed into complete interaction trajectories through simulated human-agent interplay. We train a family of models -- the xLAM-2-fc-r series with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on $\tau$-bench and BFCL benchmarks, with the smaller models surpassing their larger counterparts, particularly in multi-turn settings, while maintaining superior consistency across multiple trials. Comprehensive experiments demonstrate that our verified blueprint-to-details approach yields high-quality training data, enabling the development of more reliable, efficient, and capable agents. We open-source both the synthetic data collected and the trained xLAM-2-fc-r models to advance research in AI agents. Models are available on HuggingFace at https://huggingface.co/collections/Salesforce/xlam-2-67ef5be12949d8dcdae354c4 and the project website is https://apigen-mt.github.io
Problem

Research questions and friction points this paper is trying to address.

Scarcity and high cost of high-quality multi-turn agent interaction data
Difficulty of capturing realistic human-agent dynamics in training data
Unreliable model performance on multi-turn interaction benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-phase framework for multi-turn data generation
Agentic pipeline with LLM reviewers and feedback
Simulated human-agent interplay for interaction trajectories
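The two-phase loop described above can be sketched in Python. This is a minimal illustrative skeleton, not the paper's implementation: the LLM proposer, the reviewer committee, and the human/agent simulators are replaced by hypothetical stub functions, and the blueprint fields and action names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Blueprint:
    """A verifiable task blueprint: user intent plus ground-truth API calls."""
    intent: str
    ground_truth_actions: list


def propose_blueprint(feedback: list) -> Blueprint:
    """Stub for the LLM that drafts a blueprint, revising on reviewer feedback."""
    actions = ["find_user", "search_flights"]
    if "missing final booking action" in feedback:
        actions.append("book_flight")
    return Blueprint("Book me the cheapest flight to SFO next Friday.", actions)


def committee_review(bp: Blueprint) -> list:
    """Stub committee of LLM reviewers; returns objections (empty list = approved)."""
    issues = []
    if "book_flight" not in bp.ground_truth_actions:
        issues.append("missing final booking action")
    return issues


def generate_blueprint(max_rounds: int = 3) -> Blueprint:
    """Phase 1: iterate propose -> review until the committee approves."""
    feedback: list = []
    for _ in range(max_rounds):
        bp = propose_blueprint(feedback)
        feedback = committee_review(bp)
        if not feedback:
            return bp
    raise RuntimeError("no approved blueprint within the revision budget")


def simulate_trajectory(bp: Blueprint) -> list:
    """Phase 2: roll out a multi-turn exchange between a simulated human
    (holding the intent) and an agent executing the ground-truth actions."""
    trajectory = [("human", bp.intent)]
    for action in bp.ground_truth_actions:
        trajectory.append(("agent", f"call:{action}"))
        trajectory.append(("env", f"result:{action}:ok"))
    trajectory.append(("agent", "Done - your booking is confirmed."))
    return trajectory
```

The key property the sketch preserves is that only committee-approved blueprints reach phase 2, so every generated trajectory is grounded in a verified sequence of ground-truth actions.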