🤖 AI Summary
This paper addresses the optimal execution of large orders, aiming to minimize both implementation shortfall and market impact. The authors propose a data-driven, nonparametric reinforcement learning framework: first, they construct a high-fidelity limit-order-book simulator based on a queue-reactive model that captures transient price impact and dynamic order-flow responses; second, they couple this simulator with a model-free RL algorithm (specifically, Double DQN) whose state features include time, inventory position, asset price, and order-book depth. Unlike conventional parametric approaches, the framework imposes no assumptions on market dynamics, enabling counterfactual policy evaluation and adaptive strategy generation. Empirical results demonstrate that the learned policy exhibits both strategic temporal planning and tactical micro-adjustments, consistently outperforming benchmark methods across diverse market scenarios.
📝 Abstract
We investigate the use of Reinforcement Learning for the optimal execution of meta-orders, where the objective is to execute large orders incrementally while minimizing implementation shortfall and market impact over an extended period of time. Departing from traditional parametric approaches to price dynamics and impact modeling, we adopt a model-free, data-driven framework. Since policy optimization requires counterfactual feedback that historical data cannot provide, we employ the Queue-Reactive Model to generate realistic and tractable limit order book simulations that capture transient price impact as well as nonlinear, dynamic order-flow responses. Methodologically, we train a Double Deep Q-Network agent on a state space comprising time, inventory, price, and depth variables, and evaluate its performance against established benchmarks. Numerical simulation results show that the agent learns a policy that is both strategic and tactical, adapting effectively to order book conditions and outperforming standard approaches across multiple training configurations. These findings provide strong evidence that model-free Reinforcement Learning can yield adaptive and robust solutions to the optimal execution problem.
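To make the methodology concrete, the core of the Double DQN update described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear Q-functions, the action discretization, and all dimensions below are stand-in assumptions, with the state vector following the (time, inventory, price, depth) features named above.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 4   # (time remaining, inventory, price, order-book depth)
N_ACTIONS = 5   # hypothetical discretization of child-order sizes
GAMMA = 1.0     # episodic execution horizon; no discounting assumed

# Linear Q approximators standing in for the online and target networks.
W_online = rng.normal(size=(STATE_DIM, N_ACTIONS))
W_target = rng.normal(size=(STATE_DIM, N_ACTIONS))

def q_values(W, state):
    """Q(s, .) for every action under the given (stand-in) network."""
    return state @ W

def double_dqn_target(next_state, reward, done):
    """Double DQN regression target for one transition."""
    # Select the greedy action with the ONLINE network...
    a_star = int(np.argmax(q_values(W_online, next_state)))
    # ...but evaluate it with the TARGET network, which is what
    # distinguishes Double DQN from vanilla DQN and curbs the
    # overestimation bias of the max operator.
    bootstrap = 0.0 if done else q_values(W_target, next_state)[a_star]
    return reward + GAMMA * bootstrap

# One simulated transition: negative reward as implementation shortfall.
s_next = rng.normal(size=STATE_DIM)
y = double_dqn_target(s_next, reward=-0.3, done=False)
```

In training, `y` would serve as the regression target for the online network's Q-value at the visited state-action pair, with the target weights periodically synchronized from the online weights.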