🤖 AI Summary
This work addresses the lack of systematic, data-driven analysis of how reinforcement learning (RL) environments have evolved, a gap that has obscured the field's trajectory from physics-based simulation toward general-purpose agents. To close this gap, we propose a novel data-driven, multi-dimensional taxonomy that combines large-scale literature mining, automated semantic analysis, and statistical modeling, and we apply it to over 2,000 core papers. Our analysis reveals an ongoing paradigm shift in which RL environments are bifurcating into two distinct ecosystems: a "Semantic Prior" ecosystem dominated by Large Language Models (LLMs) and a "Domain-Specific Generalization" ecosystem. We characterize the cognitive fingerprints of these ecosystems and identify key underlying mechanisms, including cross-task synergy, multi-domain interference, and zero-shot generalization, thereby offering a design blueprint for next-generation Embodied Semantic Simulators.
📝 Abstract
Progress in reinforcement learning (RL) is closely tied to the environments used to train and evaluate artificial agents. Moving beyond traditional qualitative reviews, this work presents a large-scale, data-driven empirical study of how RL environments have evolved. By programmatically processing a large corpus of academic literature and distilling it to over 2,000 core publications, we propose a quantitative methodology for mapping the transition from isolated physical simulations to generalist, language-driven foundation agents. Using a novel multi-dimensional taxonomy, we systematically analyze benchmarks by application domain and by the cognitive capabilities they require. Our automated semantic and statistical analysis reveals a paradigm shift supported by the data: the field is bifurcating into a "Semantic Prior" ecosystem dominated by Large Language Models (LLMs) and a "Domain-Specific Generalization" ecosystem. We further characterize the "cognitive fingerprints" of these domains to uncover the mechanisms underlying cross-task synergy, multi-domain interference, and zero-shot generalization. Ultimately, this study offers a quantitative roadmap for designing the next generation of Embodied Semantic Simulators, bridging the gap between continuous physical control and high-level logical reasoning.
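
The abstract does not specify how a "cognitive fingerprint" is computed. As a minimal sketch, assuming each paper has already been tagged with one application domain and a list of required cognitive capabilities, a fingerprint could be represented as the normalized share of each capability within a domain. The function name `cognitive_fingerprints`, the record fields, and the toy records below are hypothetical illustrations, not the authors' pipeline or data.

```python
from collections import Counter, defaultdict


def cognitive_fingerprints(papers):
    """Return {domain: {capability: share}}; shares within a domain sum to 1.

    Assumption: each paper is a dict with a "domain" string and a
    "capabilities" list. This schema is hypothetical, not from the source.
    """
    counts = defaultdict(Counter)
    for paper in papers:
        for capability in paper["capabilities"]:
            counts[paper["domain"]][capability] += 1

    fingerprints = {}
    for domain, counter in counts.items():
        total = sum(counter.values())
        # Normalize raw tag counts so domains of different size are comparable.
        fingerprints[domain] = {cap: n / total for cap, n in counter.items()}
    return fingerprints


# Toy, made-up records for illustration only (not data from the study).
papers = [
    {"domain": "robotics", "capabilities": ["motor control", "planning"]},
    {"domain": "robotics", "capabilities": ["motor control"]},
    {"domain": "web agents", "capabilities": ["language understanding", "planning"]},
]

fp = cognitive_fingerprints(papers)
for domain, shares in fp.items():
    print(domain, shares)
```

Normalizing by the total tag count per domain is one simple choice. Under the same assumptions, comparing two domains' share vectors would be one way to look for the overlap or divergence in required capabilities that the abstract relates to synergy and interference.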