LOB-Bench: Benchmarking Generative AI for Finance - an Application to Limit Order Book Data

📅 2025-02-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Financial generative models lack a unified, quantitative evaluation paradigm, particularly for message-level generation of limit order book (LOB) data. This paper introduces LOB-Bench, a dedicated benchmark for evaluating generative AI in LOB modeling. It enables multidimensional quantitative assessment across distributional statistics, market microstructure properties, and event-driven market impact. The benchmark defines conditional and unconditional statistical consistency metrics for data in the LOBSTER format and adds market-impact measures, including price response functions and event cross-correlations. The framework combines multivariate statistical evaluation, discriminator-based scoring, and event-driven metrics, and is implemented in Python. Empirical results indicate that autoregressive generative models outperform a traditional parametric model and a (C)GAN in both statistical fidelity and realism of market dynamics, giving financial generative modeling a standardized evaluation foundation.
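As an illustration of the price response functions mentioned in the summary, the sketch below uses a definition common in the market microstructure literature: the mean of the event sign times the later mid-price change, R(lag) = mean(sign[t] * (mid[t + lag] - mid[t])). This is an assumption for illustration only; the function name `price_response`, its inputs, and the toy data are hypothetical and are not taken from LOB-Bench, whose exact definition may differ.

```python
from statistics import fmean


def price_response(mid_prices, signs, max_lag):
    """Empirical response R(lag) = mean(sign[t] * (mid[t + lag] - mid[t])).

    mid_prices: mid-price after each message (hypothetical input layout).
    signs: +1 for a buy-side event, -1 for a sell-side event, 0 to skip.
    Returns a dict mapping each lag in 1..max_lag to its response value.
    """
    if len(mid_prices) != len(signs):
        raise ValueError("mid_prices and signs must have the same length")
    response = {}
    for lag in range(1, max_lag + 1):
        # Only events with a non-zero sign contribute to the average.
        terms = [
            signs[t] * (mid_prices[t + lag] - mid_prices[t])
            for t in range(len(mid_prices) - lag)
            if signs[t] != 0
        ]
        response[lag] = fmean(terms) if terms else float("nan")
    return response


# Toy data, made up for this example.
mids = [100.0, 100.5, 100.5, 100.0, 100.5, 101.0]
signs = [1, 0, -1, 1, 1, 0]
resp = price_response(mids, signs, max_lag=2)
```

Computing the same curve on real and on generated message streams and comparing the two is one way such a measure could be used to judge whether a model reproduces market impact.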

📝 Abstract
While financial data presents one of the most challenging and interesting sequence modelling tasks due to high noise, heavy tails, and strategic interactions, progress in this area has been hindered by the lack of consensus on quantitative evaluation paradigms. To address this, we present LOB-Bench, a benchmark, implemented in Python, designed to evaluate the quality and realism of generative message-by-order data for limit order books (LOB) in the LOBSTER format. Our framework measures distributional differences in conditional and unconditional statistics between generated and real LOB data, supporting flexible multivariate statistical evaluation. The benchmark also features commonly used LOB statistics such as spread, order book volumes, order imbalance, and message inter-arrival times, along with scores from a trained discriminator network. Lastly, LOB-Bench contains "market impact metrics", i.e. the cross-correlations and price response functions for specific events in the data. We benchmark generative autoregressive state-space models, a (C)GAN, as well as a parametric LOB model and find that the autoregressive GenAI approach beats traditional model classes.
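To make "distributional differences" between real and generated data concrete, here is a minimal sketch that computes the bid-ask spread from best quotes and compares the two spread distributions with an L1 distance between normalized histograms. The helper names (`spreads`, `histogram`, `l1_distance`), the bin width, and the sample quotes are assumptions for illustration and are not LOB-Bench's API. The choice of L1 as the distance is likewise an assumption. Prices are integers, following the LOBSTER convention of dollar price times 10000.

```python
from collections import Counter


def spreads(best_asks, best_bids):
    """Bid-ask spread per book snapshot, in the same integer price units."""
    return [ask - bid for ask, bid in zip(best_asks, best_bids)]


def histogram(values, bin_width):
    """Normalized histogram: maps bin index (value // bin_width) to frequency."""
    if not values:
        raise ValueError("values must be non-empty")
    counts = Counter(v // bin_width for v in values)
    n = len(values)
    return {b: c / n for b, c in counts.items()}


def l1_distance(real, generated, bin_width):
    """L1 distance between the two normalized histograms, in [0, 2]."""
    p = histogram(real, bin_width)
    q = histogram(generated, bin_width)
    return sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in set(p) | set(q))


# Made-up best quotes for four snapshots of real and generated data.
real_spread = spreads(
    [1000100, 1000200, 1000100, 1000300],
    [1000000, 1000000, 1000000, 1000100],
)
gen_spread = spreads(
    [1000100, 1000100, 1000100, 1000300],
    [1000000, 1000000, 1000000, 1000000],
)
score = l1_distance(real_spread, gen_spread, bin_width=100)
```

A score of 0 means the two histograms coincide and 2 means they share no bins. The same comparison could be repeated for other statistics named in the abstract, such as order imbalance or inter-arrival times.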
Problem

Research questions and friction points this paper is trying to address.

Evaluating generative AI for financial data
Benchmarking limit order book simulations
Assessing realism in LOB generative models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Python-based LOB benchmark
Multivariate statistical evaluation
Autoregressive GenAI approach
Peer Nagy
Oxford-Man Institute of Quantitative Finance, University of Oxford
Sascha Frey
Department of Computer Science, University of Oxford
Kang Li
Department of Statistics, University of Oxford
Bidipta Sarkar
University of Oxford
Multiagent Interaction, Human-AI Interaction, Vision, Graphics
Svitlana Vyetrenko
J. P. Morgan AI Research
AI/ML
Stefan Zohren
University of Oxford
Machine Learning, Finance, Time Series, Quantum Technologies, Mathematical Physics
Ani Calinescu
Department of Computer Science, University of Oxford
Jakob Foerster
Associate Professor, University of Oxford
Artificial Intelligence