AgentSim: A Platform for Verifiable Agent-Trace Simulation

📅 2026-04-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the scarcity of verifiable, document-grounded reasoning traces in existing datasets, a gap that hinders the development of trustworthy LLM agents. To close it, we introduce AgentSim, an open-source platform that simulates retrieval-augmented generation (RAG) agents over arbitrary document collections to produce evidence-based, step-by-step reasoning trajectories. The framework incorporates Corpus-Aware Seeding to increase trace diversity and an Active Validation mechanism that combines multi-model consistency checks with human-in-the-loop annotation to ensure high-quality traces. We present the fully traceable Agent-Trace Corpus (ATC), comprising more than 103,000 reasoning steps across three information retrieval benchmarks, in which every substantive answer is traceable to its source documents (a 100% grounding rate). Our analysis further reveals systematic differences in retrieval behavior among state-of-the-art models.
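To make "document traceability" concrete, the sketch below checks whether each substantive answer in a trace cites documents that actually exist in the corpus. The `TraceStep` schema, the `substantive` flag, and the `grounding_rate` function are hypothetical illustrations, not AgentSim's data format or the paper's exact metric definition.

```python
from dataclasses import dataclass, field


@dataclass
class TraceStep:
    """One step of a hypothetical agent trace (schema assumed, not AgentSim's)."""
    action: str                      # e.g. "search", "read", "answer"
    text: str                        # the query, note, or answer text
    cited_doc_ids: list[str] = field(default_factory=list)
    substantive: bool = True         # False for e.g. "cannot answer" replies


def grounding_rate(steps: list[TraceStep], corpus_ids: set[str]) -> float:
    """Fraction of substantive answer steps that cite at least one document
    and whose every citation resolves to a document in the corpus.

    Returns 0.0 when there are no substantive answers (a design choice;
    the rate is otherwise undefined).
    """
    answers = [s for s in steps if s.action == "answer" and s.substantive]
    if not answers:
        return 0.0
    grounded = sum(
        1 for s in answers
        if s.cited_doc_ids and all(d in corpus_ids for d in s.cited_doc_ids)
    )
    return grounded / len(answers)


# Usage with a toy corpus and trace (all values invented for illustration).
corpus = {"d1", "d2", "d3"}
trace = [
    TraceStep("search", "who founded X"),
    TraceStep("read", "d1 says ...", ["d1"]),
    TraceStep("answer", "Y founded X", ["d1", "d2"]),        # grounded
    TraceStep("answer", "Z", ["d9"]),                        # d9 not in corpus
    TraceStep("answer", "I cannot tell", [], substantive=False),  # excluded
]
print(grounding_rate(trace, corpus))  # 0.5
```

Excluding non-substantive replies mirrors the abstract's qualifier "on substantive answers": a refusal cites nothing, so counting it would lower the rate without indicating a grounding failure.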
📝 Abstract
Training trustworthy agentic LLMs requires data that shows the grounded reasoning process, not just the final answer. Existing datasets fall short: question-answering data is outcome-only, chain-of-thought data is not tied to specific documents, and web-agent datasets track interface actions rather than the core retrieval and synthesis steps of a RAG workflow. We introduce AgentSim, an open-source platform for simulating RAG agents that generates verifiable, stepwise traces of agent reasoning over any document collection. AgentSim uses a policy that ensures the agent explores the document set broadly, and it combines a multi-model validation pipeline with an active human-in-the-loop process that focuses human effort on the difficult steps where models disagree. Using AgentSim, we construct and release the Agent-Trace Corpus (ATC), a large collection of grounded reasoning trajectories spanning three established IR benchmarks. We make three contributions: (1) the AgentSim platform, with two mechanisms, Corpus-Aware Seeding and Active Validation, that improve trace diversity and quality; (2) the Agent-Trace Corpus (ATC), comprising over 103,000 verifiable reasoning steps across three IR benchmarks, with a 100% grounding rate on substantive answers; and (3) a comparative behavioral analysis revealing systematic differences in how state-of-the-art models approach information seeking. The platform, toolkit, and corpus are publicly available.
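The Active Validation idea of spending human effort only where models disagree can be sketched as a simple routing rule. In this minimal sketch, assuming each validator model emits one categorical verdict per step, a step is auto-labelled when the share of validators agreeing with the majority label reaches a threshold, and is otherwise queued for human annotation. The function names, the verdict labels, and the `min_agreement` threshold are hypothetical; the source does not specify how AgentSim measures disagreement.

```python
from collections import Counter
from typing import Optional


def route_step(verdicts: list[str], min_agreement: float = 1.0) -> tuple[str, Optional[str]]:
    """Return ("auto", label) if validators agree enough, else ("human", None).

    `verdicts` holds one label per validator model (e.g. "supported" or
    "unsupported"). `min_agreement` is the fraction of validators that must
    share the majority label; 1.0 means unanimity. Both are assumptions.
    """
    if not verdicts:
        return ("human", None)          # no model signal: ask a person
    label, count = Counter(verdicts).most_common(1)[0]
    if count / len(verdicts) >= min_agreement:
        return ("auto", label)
    return ("human", None)


def split_for_annotation(steps: dict[str, list[str]], min_agreement: float = 1.0):
    """Partition step IDs into auto-resolved labels and a human review queue."""
    auto: dict[str, str] = {}
    human_queue: list[str] = []
    for step_id, verdicts in steps.items():
        decision, label = route_step(verdicts, min_agreement)
        if decision == "auto":
            auto[step_id] = label
        else:
            human_queue.append(step_id)
    return auto, human_queue


# Usage: three hypothetical steps, each judged by three validator models.
steps = {
    "s1": ["supported", "supported", "supported"],
    "s2": ["supported", "unsupported", "supported"],
    "s3": ["unsupported", "unsupported", "unsupported"],
}
auto, queue = split_for_annotation(steps)
print(auto)   # {'s1': 'supported', 's3': 'unsupported'}
print(queue)  # ['s2']
```

Lowering `min_agreement` (for example to 0.6) would auto-resolve two-of-three majorities and shrink the human queue, trading annotation cost for some risk of accepting a wrong majority label.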
Problem

Research questions and friction points this paper is trying to address.

agentic LLMs
grounded reasoning
RAG workflow
verifiable traces
retrieval and synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

AgentSim
RAG agent simulation
verifiable reasoning traces
Corpus-Aware Seeding
Active Validation