🤖 AI Summary
This paper addresses three core challenges in modeling financial limit order book (LOB) message streams: irregular event timing, rapid regime shifts, and high-frequency traders' dynamic responses to visible order flow. To this end, the authors propose LOBERT, presented as the first general-purpose foundation encoder designed for message-level LOB modeling. Adapting the BERT architecture to LOB data, they introduce a multi-dimensional tokenization scheme that encodes each message (price, size, and timestamp) as a single token while preserving temporal and numerical structure through continuous embeddings. Built on a Transformer encoder, the model operates natively on asynchronous event sequences without discretization or fixed-length windows. Evaluated on mid-price movement prediction and next-message classification, it achieves state-of-the-art performance while reducing the required context length by over 50% relative to prior approaches, improving computational efficiency and out-of-distribution generalization.
📝 Abstract
Modeling the dynamics of financial Limit Order Books (LOB) at the message level is challenging due to irregular event timing, rapid regime shifts, and the reactions of high-frequency traders to visible order flow. Previous LOB models require cumbersome data representations and lack adaptability outside their original tasks, leading us to introduce LOBERT, a general-purpose encoder-only foundation model for LOB data suitable for downstream fine-tuning. LOBERT adapts the original BERT architecture for LOB data by using a novel tokenization scheme that treats complete multi-dimensional messages as single tokens while retaining continuous representations of price, volume, and time. With these methods, LOBERT achieves leading performance in tasks such as predicting mid-price movements and next messages, while reducing the required context length compared to previous methods.
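The core tokenization idea, one token per LOB message with continuous embeddings for its numeric fields rather than a discretized vocabulary, can be sketched as follows. This is a minimal illustrative assumption of how such a scheme might look: the field set, log scaling, and additive combination are hypothetical choices, not the paper's exact design, and the projection vectors stand in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # token embedding dimension (illustrative)

# Hypothetical learned parameters: one projection vector per continuous
# field, plus a lookup embedding per discrete event type.
W_price, W_size, W_dt = (rng.normal(size=D) for _ in range(3))
event_types = {"limit": 0, "cancel": 1, "market": 2}
E_type = rng.normal(size=(len(event_types), D))

def tokenize_message(event_type: str, price: float, size: float, dt: float) -> np.ndarray:
    """Map one LOB message to a single D-dimensional token embedding.

    Continuous fields (price, size, inter-arrival time) are kept as real
    values, rescaled, and projected, instead of being bucketed into a
    discrete vocabulary; the event type contributes a lookup embedding.
    """
    z_price = np.log1p(price) * W_price  # log scaling tames heavy-tailed values
    z_size = np.log1p(size) * W_size
    z_dt = np.log1p(dt) * W_dt           # dt: seconds since previous message
    return E_type[event_types[event_type]] + z_price + z_size + z_dt

# One message -> one token, so a sequence of N messages yields an (N, D)
# input for a standard Transformer encoder, with no fixed-length windowing.
token = tokenize_message("limit", price=101.25, size=300.0, dt=0.004)
print(token.shape)
```

Because irregular inter-arrival times enter through a continuous embedding rather than resampling onto a fixed clock, the encoder can consume the raw asynchronous event stream directly.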