🤖 AI Summary
Current LLM-based agents generalize poorly in complex multi-tool environments and depend heavily on real API deployments for training, which drives up cost and limits scalability. To address this, we propose Simia, the first framework to introduce an LLM-based environment-feedback simulation paradigm for end-to-end agent training without access to real environments. Simia comprises two complementary training pathways: Simia-SFT (supervised fine-tuning with simulated feedback) and Simia-RL (reinforcement learning with simulated rewards). It leverages LLMs to synthesize agent-environment interaction trajectories, generate realistic environment feedback, augment seed data, and model environment-agnostic reasoning. Evaluated on multiple benchmarks, including τ²-Bench, Simia substantially boosts the performance of open-source models, surpassing GPT-4o and approaching o4-mini. This work establishes a scalable, environment-decoupled paradigm for training LLM agents.
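The Simia-SFT pathway described above can be pictured as seed-set amplification: an LLM both varies the task and plays the environment for each tool call. The sketch below is illustrative only; `call_llm`, `Trajectory`, and `amplify_seed` are hypothetical names (the summary does not specify the pipeline's API), and the LLM call is stubbed.

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    task: str
    # list of (tool_call, simulated_feedback) pairs
    steps: list = field(default_factory=list)

def call_llm(prompt: str) -> str:
    """Placeholder for a real model API; a real pipeline would query an LLM."""
    return f"SIMULATED<{prompt[:30]}>"

def amplify_seed(seed: Trajectory, n_variants: int = 3) -> list:
    """Synthesize new trajectories by asking the LLM to vary the task and to
    simulate the environment's response to each tool call (no real API needed)."""
    variants = []
    for i in range(n_variants):
        task = call_llm(f"Rewrite task (variant {i}): {seed.task}")
        steps = [(call, call_llm(f"Simulate feedback for: {call}"))
                 for call, _ in seed.steps]
        variants.append(Trajectory(task=task, steps=steps))
    return variants

seed = Trajectory(task="Book a flight", steps=[("search_flights(NYC, SFO)", "ok")])
data = amplify_seed(seed)  # 3 synthetic trajectories from one seed
```

The key design point, per the summary, is that amplification is environment-agnostic: only the seed trajectories and an LLM are required, never the real tool backends.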
📝 Abstract
LLM agents excel in compact environments requiring deep reasoning but remain brittle when operating in broader, more complex contexts that demand robustness across diverse tools and schemas. Building bespoke environments for training is heavy, brittle, and limits progress. In this paper, we demonstrate that LLMs can simulate realistic environment feedback without access to actual testbed data or APIs. Inspired by this capability, we propose two frameworks: Simia-SFT, a pipeline that synthesizes SFT data by amplifying small seed sets into diverse trajectories in an environment-agnostic manner, and Simia-RL, a framework that enables RL training without real environment implementations through LLM-simulated feedback. Fine-tuning open models yields consistent improvements across multiple benchmarks, surpassing GPT-4o and approaching o4-mini on $τ^2$-Bench. Together, Simia-SFT and Simia-RL enable scalable agent training without environment engineering, replacing heavy and brittle implementations with flexible LLM-based simulation.
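The Simia-RL idea, where LLM-simulated feedback stands in for a real environment implementation, can be sketched as an ordinary RL rollout loop whose `step` and reward come from simulation. All names here (`simulate_env`, `judge_reward`, `rollout`) are assumptions for illustration, not the paper's API, and the LLM calls are stubbed.

```python
def simulate_env(action: str) -> str:
    """LLM-simulated environment feedback for a tool call; stubbed here."""
    return f"observation for {action}"

def judge_reward(history: list) -> float:
    """LLM-as-judge episode reward; stubbed: 1.0 if any tool was called."""
    return 1.0 if history else 0.0

def rollout(policy, task: str, max_steps: int = 4):
    """Collect one episode against the simulated environment and score it."""
    history = []
    for _ in range(max_steps):
        action = policy(task, history)
        if action == "STOP":
            break
        history.append((action, simulate_env(action)))
    return history, judge_reward(history)

# Trivial illustrative policy: call one tool, then stop.
def policy(task, history):
    return "lookup(task)" if not history else "STOP"

history, reward = rollout(policy, "Refund order #123")
```

A real Simia-RL trainer would update the policy from `(history, reward)` pairs with a standard RL objective; the point of the sketch is that no environment engineering appears anywhere in the loop.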