🤖 AI Summary
This paper addresses order cancellation errors and inventory imbalance caused by exchange latency (30–100 ms) in automated market making. To bridge the gap between simulation and live trading, we design a batch-matching simulation environment with realistic 500-ms matching intervals. We propose Relaver, a reinforcement learning framework for latency-robust market making. Its key contributions are: (1) a novel augmented state-action space incorporating order holding time; (2) a dynamic programming–guided exploration mechanism to enhance policy stability; and (3) an integrated lightweight temporal trend predictor enabling risk-aware inventory control. Evaluated on four real-world trading datasets, Relaver consistently outperforms existing RL-based market makers across multiple metrics—including profit-and-loss, inventory volatility, and order fill rate—demonstrating effective co-optimization of latency robustness and risk controllability.
📝 Abstract
Latency in market making (MM) is inevitable: it arises from hardware limitations, system processing times, delays in receiving data from exchanges, and the time required for orders to reach the market. Existing reinforcement learning (RL) methods for MM overlook this latency, which can cause unintended order cancellations due to price discrepancies between decision and execution times and can result in undesired inventory accumulation, exposing MM traders to increased market risk. Therefore, these methods cannot be applied in real MM scenarios. To address these issues, we first build a realistic MM environment with random delays of 30-100 milliseconds for order placement and market information reception, and implement a batch matching mechanism that collects the orders arriving within each 500-millisecond window and then matches them all at once, simulating the batch auction mechanisms adopted by some exchanges. Then, we propose Relaver, an RL-based method for MM that tackles the latency and inventory risk issues. The three main contributions of Relaver are: i) we introduce an augmented state-action space that incorporates order hold time alongside price and volume, enabling Relaver to optimize execution strategies under latency constraints and time-priority matching mechanisms; ii) we leverage dynamic programming (DP) to guide exploration during RL training toward better policies; iii) we train a market trend predictor that guides the agent to adjust its inventory intelligently and reduce risk. Extensive experiments and ablation studies on four real-world datasets demonstrate that Relaver significantly improves on the performance of state-of-the-art RL-based MM strategies across multiple metrics.
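The environment described in the abstract (random 30-100 ms delays, 500 ms batch matching, time priority) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Order`, `submit`, `batch_end`, and `match_batch` are hypothetical, the delay is assumed to be uniformly distributed, and the trade-price rule (the earlier-arriving order sets the price) is a simplifying assumption rather than something the abstract specifies.

```python
import random
from dataclasses import dataclass

BATCH_MS = 500          # batch window length (from the abstract)
LATENCY_MS = (30, 100)  # delay range (from the abstract); uniform draw is an assumption


@dataclass
class Order:
    side: str               # "buy" or "sell"
    price: float
    volume: int
    sent_ms: float          # time the agent decided to send the order
    arrive_ms: float = 0.0  # time the order reaches the exchange


def submit(order, rng):
    """Apply a random transmission delay to an order."""
    order.arrive_ms = order.sent_ms + rng.uniform(*LATENCY_MS)
    return order


def batch_end(arrive_ms):
    """Time at which the batch window containing arrive_ms is matched."""
    return (int(arrive_ms // BATCH_MS) + 1) * BATCH_MS


def match_batch(orders):
    """Match one batch with price-time priority; returns (price, volume) fills.

    Remaining volume is tracked by mutating the orders in place.
    """
    buys = sorted((o for o in orders if o.side == "buy"),
                  key=lambda o: (-o.price, o.arrive_ms))
    sells = sorted((o for o in orders if o.side == "sell"),
                   key=lambda o: (o.price, o.arrive_ms))
    fills = []
    i = j = 0
    while i < len(buys) and j < len(sells) and buys[i].price >= sells[j].price:
        buy, sell = buys[i], sells[j]
        qty = min(buy.volume, sell.volume)
        # Assumption: the order that arrived first sets the trade price.
        price = buy.price if buy.arrive_ms <= sell.arrive_ms else sell.price
        fills.append((price, qty))
        buy.volume -= qty
        sell.volume -= qty
        if buy.volume == 0:
            i += 1
        if sell.volume == 0:
            j += 1
    return fills
```

In this sketch, an order sent at 480 ms arrives between 510 and 580 ms, so it misses the match at 500 ms and waits until 1000 ms. This is the kind of gap between decision time and execution time that the abstract identifies as a source of unintended cancellations and inventory accumulation.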