Building surrogate models using trajectories of agents trained by Reinforcement Learning

๐Ÿ“… 2025-09-01
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Low sample efficiency in constructing surrogate models for high-dimensional deterministic simulations hinders practical deployment. Method: This paper proposes a reinforcement learningโ€“guided hybrid active sampling framework that integrates stochastic exploration, expert trajectory replay, and maximum-entropy policy optimization to achieve comprehensive and efficient coverage of the state space; it further couples Kriging-based surrogate modeling with an active learning mechanism to dynamically select informative sampling points. Contribution/Results: Evaluated on multiple simulation benchmarks, the method significantly improves surrogate model accuracy and generalization capability. It achieves 30โ€“50% higher sample efficiency compared to conventional approaches such as Latin hypercube sampling. By enabling more effective data utilization in computationally expensive simulation environments, the proposed framework establishes a novel paradigm for surrogate-assisted reinforcement learning in real-world simulation-intensive applications.

Technology Category

Application Category

๐Ÿ“ Abstract
Sample efficiency in the face of computationally expensive simulations is a common concern in surrogate modeling. Current strategies to minimize the number of samples needed are not as effective in simulated environments with wide state spaces. As a response to this challenge, we propose a novel method to efficiently sample simulated deterministic environments by using policies trained by Reinforcement Learning. We provide an extensive analysis of these surrogate-building strategies with respect to Latin-Hypercube sampling or Active Learning and Kriging, cross-validating performances with all sampled datasets. The analysis shows that a mixed dataset that includes samples acquired by random agents, expert agents, and agents trained to explore the regions of maximum entropy of the state transition distribution provides the best scores through all datasets, which is crucial for a meaningful state space representation. We conclude that the proposed method improves the state-of-the-art and clears the path to enable the application of surrogate-aided Reinforcement Learning policy optimization strategies on complex simulators.
Problem

Research questions and friction points this paper is trying to address.

Improving sample efficiency in computationally expensive simulation environments
Addressing ineffective sampling strategies in wide state space simulations
Enabling surrogate-aided Reinforcement Learning policy optimization on complex simulators
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using RL-trained agent trajectories for surrogate modeling
Mixed dataset with random, expert, and maximum entropy agents
Improving sample efficiency in deterministic simulated environments
๐Ÿ”Ž Similar Papers
No similar papers found.