🤖 AI Summary
Current LLM-based agents generalize poorly in complex multi-tool environments and depend heavily on real API deployments for training, which drives up cost and limits scalability. To address this, we propose Simia, the first framework to introduce an LLM-based environment-feedback simulation paradigm for end-to-end agent training without access to real environments. Simia comprises two complementary training pathways: Simia-SFT (supervised fine-tuning with simulated feedback) and Simia-RL (reinforcement learning with simulated rewards). It leverages LLMs to synthesize agent-environment interaction trajectories, generate realistic environment feedback, augment seed data, and model environment-agnostic reasoning. Evaluated on multiple benchmarks, including τ²-Bench, Simia substantially boosts the performance of open-source models, surpassing GPT-4o and approaching o4-mini. This work establishes a scalable, environment-decoupled paradigm for training LLM agents.
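The Simia-SFT pathway described above can be pictured as seed-set amplification: an LLM both varies the task and plays the environment for each tool call. The sketch below is illustrative only; `call_llm`, `Trajectory`, and `amplify_seed` are hypothetical names (the summary does not specify the pipeline's API), and the LLM call is stubbed.

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    task: str
    # list of (tool_call, simulated_feedback) pairs
    steps: list = field(default_factory=list)

def call_llm(prompt: str) -> str:
    """Placeholder for a real model API; a real pipeline would query an LLM."""
    return f"SIMULATED<{prompt[:30]}>"

def amplify_seed(seed: Trajectory, n_variants: int = 3) -> list:
    """Synthesize new trajectories by asking the LLM to vary the task and to
    simulate the environment's response to each tool call (no real API needed)."""
    variants = []
    for i in range(n_variants):
        task = call_llm(f"Rewrite task (variant {i}): {seed.task}")
        steps = [(call, call_llm(f"Simulate feedback for: {call}"))
                 for call, _ in seed.steps]
        variants.append(Trajectory(task=task, steps=steps))
    return variants

seed = Trajectory(task="Book a flight", steps=[("search_flights(NYC, SFO)", "ok")])
data = amplify_seed(seed)  # 3 synthetic trajectories from one seed
```

The key design point, per the summary, is that amplification is environment-agnostic: only the seed trajectories and an LLM are required, never the real tool backends.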
📝 Abstract
LLM agents excel in compact environments requiring deep reasoning but remain brittle when operating in broader, more complex contexts that demand robustness across diverse tools and schemas. Building bespoke environments for training is heavy, brittle, and limits progress. In this paper, we demonstrate that LLMs can simulate realistic environment feedback without access to actual testbed data or APIs. Inspired by this capability, we propose two frameworks: Simia-SFT, a pipeline that synthesizes SFT data by amplifying small seed sets into diverse trajectories in an environment-agnostic manner, and Simia-RL, a framework that enables RL training without real environment implementations through LLM-simulated feedback. Fine-tuning open models yields consistent improvements across multiple benchmarks, surpassing GPT-4o and approaching o4-mini on $τ^2$-Bench. Together, Simia-SFT and Simia-RL enable scalable agent training without environment engineering, replacing heavy and brittle implementations with flexible LLM-based simulation.
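The Simia-RL idea, where LLM-simulated feedback stands in for a real environment implementation, can be sketched as an ordinary RL rollout loop whose `step` and reward come from simulation. All names here (`simulate_env`, `judge_reward`, `rollout`) are assumptions for illustration, not the paper's API, and the LLM calls are stubbed.

```python
def simulate_env(action: str) -> str:
    """LLM-simulated environment feedback for a tool call; stubbed here."""
    return f"observation for {action}"

def judge_reward(history: list) -> float:
    """LLM-as-judge episode reward; stubbed: 1.0 if any tool was called."""
    return 1.0 if history else 0.0

def rollout(policy, task: str, max_steps: int = 4):
    """Collect one episode against the simulated environment and score it."""
    history = []
    for _ in range(max_steps):
        action = policy(task, history)
        if action == "STOP":
            break
        history.append((action, simulate_env(action)))
    return history, judge_reward(history)

# Trivial illustrative policy: call one tool, then stop.
def policy(task, history):
    return "lookup(task)" if not history else "STOP"

history, reward = rollout(policy, "Refund order #123")
```

A real Simia-RL trainer would update the policy from `(history, reward)` pairs with a standard RL objective; the point of the sketch is that no environment engineering appears anywhere in the loop.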