🤖 AI Summary
This study addresses a common weakness of existing reinforcement learning trading environments: they neglect or oversimplify transaction costs, so the strategies they produce fail in real-world deployment. Building on the Almgren-Chriss framework and the square-root market impact law, the work proposes three open-source, Gymnasium-compatible trading environments (stock trading, margin trading, and portfolio optimization) that incorporate empirically validated nonlinear market impact models. The environments support pluggable cost models, permanent impact tracking with exponential decay, and trade-level logging. Five deep reinforcement learning algorithms (A2C, PPO, DDPG, SAC, TD3) are evaluated on NASDAQ-100 data with Optuna-tuned hyperparameters, and the suite is released as an extension to FinRL-Meta. In one reported example, replacing a fixed 10 bps cost with the Almgren-Chriss model reduces daily trading costs from $200,000 to $8,000 and turnover from 19% to 1%. Hyperparameter optimization cuts costs by up to 82%, and both absolute performance and the relative ranking of algorithms change with the cost model.
📝 Abstract
Reinforcement learning (RL) has shown promise for trading, yet most open-source backtesting environments assume negligible or fixed transaction costs, causing agents to learn trading behaviors that fail under realistic execution. We introduce three Gymnasium-compatible trading environments -- MACE (Market-Adjusted Cost Execution) stock trading, margin trading, and portfolio optimization -- that integrate nonlinear market impact models grounded in the Almgren-Chriss (AC) framework and the empirically validated square-root impact law. Each environment provides pluggable cost models, permanent impact tracking with exponential decay, and comprehensive trade-level logging. We evaluate five deep reinforcement learning (DRL) algorithms (A2C, PPO, DDPG, SAC, TD3) on the NASDAQ-100, comparing a fixed 10 bps baseline against the AC model with Optuna-tuned hyperparameters. Our results show that (i) the cost model materially changes both absolute performance and the relative ranking of algorithms across all three environments; (ii) the AC model produces dramatically different trading behavior, e.g., daily costs dropping from $200k to $8k with turnover falling from 19% to 1%; (iii) hyperparameter optimization is essential for constraining pathological trading, with costs dropping by up to 82%; and (iv) interactions between algorithm and cost model are strongly environment-specific, e.g., DDPG's out-of-sample (OOS) Sharpe ratio jumps from -2.1 to 0.3 under AC in margin trading while SAC's drops from -0.5 to -1.2. We release the full suite as an open-source extension to FinRL-Meta.
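To make the contrast between the two cost treatments concrete, the following is a minimal sketch of a fixed basis-point cost, a square-root impact cost, and an exponentially decaying impact tracker. It is not the MACE implementation: the function and class names, the impact coefficient `y`, the half-life parameterization, and all numeric inputs are assumptions chosen for illustration.

```python
import math


def fixed_bps_cost(notional: float, bps: float = 10.0) -> float:
    """Flat proportional cost, as in a fixed 10 bps baseline."""
    return abs(notional) * bps / 10_000.0


def sqrt_impact_cost(shares: float, price: float, daily_volume: float,
                     daily_vol: float, y: float = 1.0) -> float:
    """Square-root law: relative impact = y * sigma * sqrt(|Q| / V).

    `y` is a dimensionless coefficient (assumed to be 1.0 here),
    `daily_vol` is the daily return volatility sigma, and
    `daily_volume` is the daily traded volume V in shares.
    """
    participation = abs(shares) / daily_volume
    relative_impact = y * daily_vol * math.sqrt(participation)
    return abs(shares) * price * relative_impact


class DecayingImpact:
    """Accumulated price impact that decays exponentially each step.

    The half-life parameterization is an assumption of this sketch.
    """

    def __init__(self, half_life_steps: float = 5.0) -> None:
        self.decay = 0.5 ** (1.0 / half_life_steps)
        self.level = 0.0

    def step(self, new_impact: float = 0.0) -> float:
        # Decay the existing impact, then add this step's contribution.
        self.level = self.level * self.decay + new_impact
        return self.level


if __name__ == "__main__":
    # Hypothetical trade: 10,000 shares at $100, 1M shares daily volume,
    # 2% daily volatility.
    print(fixed_bps_cost(10_000 * 100.0))
    print(sqrt_impact_cost(10_000, 100.0, 1_000_000, 0.02))
```

Under the square-root law the cost of a single trade grows as |Q|^{3/2}, so doubling the order size multiplies its cost by about 2.83 rather than 2. A fixed basis-point cost has no such convexity, so an agent trained under it sees no extra penalty for concentrating volume in large trades.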