LOB-Bench: Benchmarking Generative AI for Finance - an Application to Limit Order Book Data

📅 2025-02-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Generative models for financial data lack a unified, quantitative evaluation paradigm, particularly for message-level limit order book (LOB) generation. This paper introduces LOB-Bench, a benchmark for evaluating generative AI in LOB modeling. It supports quantitative assessment along three axes: distributional statistics, market microstructure properties, and event-driven market impact. The benchmark defines conditional and unconditional statistical consistency metrics for data in the LOBSTER format and adds market-impact measures, namely price response functions and event cross-correlations. The framework combines multivariate statistical evaluation with discriminator-based scoring and is implemented in Python. Empirical results indicate that autoregressive generative models outperform a traditional parametric model and a (C)GAN in both statistical fidelity and market-dynamic realism, providing a standardized evaluation foundation for financial generative modeling.
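To make the market-impact idea concrete, here is a minimal sketch of a lagged price response function in its common textbook form: the average signed mid-price change a fixed number of events after each event. The function name `price_response`, its arguments, and the exact formula are assumptions for illustration; LOB-Bench's own definition and API may differ.

```python
from typing import Sequence


def price_response(mid_prices: Sequence[float],
                   signs: Sequence[int],
                   lag: int) -> float:
    """Average signed mid-price move `lag` events after each event.

    Hypothetical helper, not the LOB-Bench API. Computes
    R(lag) = mean_t( sign_t * (mid_{t+lag} - mid_t) ),
    where sign_t is +1 for a buy-side event and -1 for a sell-side event.
    """
    if len(mid_prices) != len(signs):
        raise ValueError("mid_prices and signs must have the same length")
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = len(mid_prices) - lag  # number of events with a full look-ahead window
    if n <= 0:
        raise ValueError("lag is too large for the series length")
    total = sum(signs[t] * (mid_prices[t + lag] - mid_prices[t])
                for t in range(n))
    return total / n
```

Computing this curve for several lags on both real and generated data, then comparing the two curves, is one way such a metric could be used to check whether a model reproduces how prices react to events.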

📝 Abstract

While financial data presents one of the most challenging and interesting sequence modelling tasks due to high noise, heavy tails, and strategic interactions, progress in this area has been hindered by the lack of consensus on quantitative evaluation paradigms. To address this, we present LOB-Bench, a benchmark, implemented in Python, designed to evaluate the quality and realism of generative message-by-order data for limit order books (LOB) in the LOBSTER format. Our framework measures distributional differences in conditional and unconditional statistics between generated and real LOB data, supporting flexible multivariate statistical evaluation. The benchmark also includes commonly used LOB statistics such as spread, order book volumes, order imbalance, and message inter-arrival times, along with scores from a trained discriminator network. Lastly, LOB-Bench contains "market impact metrics", i.e., the cross-correlations and price response functions for specific events in the data. We benchmark generative autoregressive state-space models, a (C)GAN, as well as a parametric LOB model and find that the autoregressive GenAI approach beats traditional model classes.
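As a rough illustration of the distributional comparison described above, the sketch below computes two of the named statistics (spread and order imbalance) and scores how far a generated sample lies from a real one with a histogram-based distance. All function names are hypothetical, and the choice of a normalized L1 (total variation) distance over shared bins is an assumption made for illustration, not a statement of the metric LOB-Bench uses.

```python
from typing import List, Sequence


def spread(best_ask: float, best_bid: float) -> float:
    """Bid-ask spread, in the same price units as the inputs."""
    return best_ask - best_bid


def order_imbalance(bid_volume: float, ask_volume: float) -> float:
    """Volume imbalance in [-1, 1]; 0.0 when both sides are empty."""
    total = bid_volume + ask_volume
    return (bid_volume - ask_volume) / total if total else 0.0


def l1_histogram_distance(real: Sequence[float],
                          generated: Sequence[float],
                          n_bins: int = 10) -> float:
    """Normalized L1 distance between two empirical histograms.

    Both samples are binned on a shared grid spanning their joint range.
    Returns a value in [0, 1]: 0 for identical histograms, 1 for
    histograms with no overlapping mass.
    """
    lo = min(min(real), min(generated))
    hi = max(max(real), max(generated))
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range

    def histogram(xs: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for x in xs:
            # Clamp so the maximum value falls into the last bin.
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1
        return [c / len(xs) for c in counts]

    p, q = histogram(real), histogram(generated)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

In use, one would compute a statistic such as the spread for every book state in the real and generated data, then pass the two resulting samples to `l1_histogram_distance`. A conditional variant could apply the same distance within groups, for example per time-of-day bucket.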

Problem

Research questions and friction points this paper is trying to address.

Evaluating generative AI for financial data

Benchmarking limit order book simulations

Assessing realism in LOB generative models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Python-based LOB benchmark

Multivariate statistical evaluation

Autoregressive GenAI approach
