HRFT: Mining High-Frequency Risk Factor Collections End-to-End via Transformer

📅 2024-08-02

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses high-frequency stock volatility modeling in quantitative trading by casting the discovery of interpretable, formulaic risk factors as an end-to-end language modeling task. The authors propose a Transformer-based framework, the Intraday Risk Factor Transformer (IRFT), built on a hybrid symbolic-numeric vocabulary: symbolic tokens encode operators and stock features, and numeric tokens encode constants, so the model generates complete formulas without a predefined operator skeleton. The predicted constants are then refined with the BFGS algorithm. On high-frequency data from the CSI 300 (HS300) and S&P 500, IRFT achieves a 30% higher investment return than the ten SRBench baselines, with inference times orders of magnitude faster than existing high-frequency risk factor mining methods. The core contribution is reframing formula discovery as a trainable language generation problem in which symbolic structure and numerical constants are produced together.
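As a rough illustration of how one vocabulary can cover both formula symbols and real-valued constants, the sketch below flattens a prefix-notation factor into tokens. The operator and feature names, and the sign/mantissa/exponent scheme for constants, are assumptions made for this example; the summary does not specify the paper's actual encoding.

```python
import math

# Hypothetical symbolic tokens (illustrative names, not taken from the paper).
OPERATORS = {"add", "sub", "mul", "log", "abs"}
FEATURES = {"high", "low", "close", "volume"}

def encode_constant(value, digits=4):
    """Encode a float as three numeric tokens: sign, mantissa, base-10 exponent."""
    if value == 0:
        return ["+", "N" + "0" * digits, "E0"]
    sign = "+" if value > 0 else "-"
    exponent = math.floor(math.log10(abs(value)))
    mantissa = round(abs(value) / 10 ** exponent * 10 ** (digits - 1))
    if mantissa == 10 ** digits:  # rounding overflow, e.g. 9.9996 -> 10.00
        mantissa //= 10
        exponent += 1
    return [sign, f"N{mantissa:0{digits}d}", f"E{exponent}"]

def decode_constant(tokens, digits=4):
    """Invert encode_constant (up to the precision kept in the mantissa)."""
    sign, mantissa, exponent = tokens
    value = int(mantissa[1:]) * 10.0 ** (int(exponent[1:]) - (digits - 1))
    return value if sign == "+" else -value

def encode_expression(prefix):
    """Flatten a prefix-notation expression into a hybrid token sequence."""
    tokens = []
    for item in prefix:
        if isinstance(item, (int, float)):
            tokens.extend(encode_constant(float(item)))
        elif item in OPERATORS or item in FEATURES:
            tokens.append(item)
        else:
            raise ValueError(f"unknown symbol: {item!r}")
    return tokens

# 0.5 * log(high - low), written in prefix order.
factor = ["mul", 0.5, "log", "sub", "high", "low"]
tokens = encode_expression(factor)
```

Here `tokens` mixes symbolic entries such as `mul` and `high` with the three numeric tokens standing in for `0.5`, which is the kind of sequence a sequence-to-sequence model could be trained to emit directly.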

📝 Abstract

In quantitative trading, transforming historical stock data into interpretable, formulaic risk factors enhances the identification of market volatility and risk. Despite recent advancements in neural networks for extracting latent risk factors, these models remain limited to feature extraction and lack explicit, formulaic risk factor designs. By viewing symbolic mathematics as a language where valid mathematical expressions serve as meaningful "sentences", we propose framing the task of mining formulaic risk factors as a language modeling problem. In this paper, we introduce an end-to-end methodology, Intraday Risk Factor Transformer (IRFT), to directly generate complete formulaic risk factors, including constants. We use a hybrid symbolic-numeric vocabulary where symbolic tokens represent operators and stock features, and numeric tokens represent constants. We train a Transformer model on high-frequency trading (HFT) datasets to generate risk factors without relying on a predefined skeleton of operators. It determines the general form of the stock volatility law, including constants. We refine the predicted constants using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm to mitigate nonlinear issues. Compared to the ten approaches in SRBench, an active benchmark for symbolic regression (SR), IRFT achieves a 30% higher investment return on the HS300 and SP500 datasets, while achieving inference times that are orders of magnitude faster than existing methods in HF risk factor mining tasks.
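The constant-refinement step can be sketched with SciPy's general-purpose BFGS optimizer. This is a minimal sketch under stated assumptions, not the authors' implementation: the factor form `c0 * spread ** c1`, the synthetic data, the mean-squared-error objective, and the starting constants are all hypothetical stand-ins for a model-generated formula and its predicted constants.

```python
import numpy as np
from scipy.optimize import minimize

def factor(constants, spread):
    """Hypothetical generated formula: c0 * spread ** c1 (nonlinear in c1)."""
    c0, c1 = constants
    return c0 * spread ** c1

def mse(constants, spread, target):
    """Mean squared error between the factor and the target series."""
    return np.mean((factor(constants, spread) - target) ** 2)

# Synthetic intraday high-low spread and a noise-free target (assumed data).
rng = np.random.default_rng(0)
spread = rng.uniform(0.1, 2.0, size=200)
target = factor((0.8, 0.5), spread)

# Stand-in for the constants decoded from the model's numeric tokens.
initial = np.array([1.0, 0.4])

# Refine the constants with BFGS while keeping the formula structure fixed.
result = minimize(mse, initial, args=(spread, target), method="BFGS")
refined = result.x
```

Only the constants move during refinement; the symbolic structure produced by the model is left untouched, so the optimizer works in a low-dimensional space starting from an already reasonable guess.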

Problem

Research questions and friction points this paper is trying to address.

Quantitative Trading

Stock Price Volatility

Mathematical Formula

Innovation

Methods, ideas, or system contributions that make the work stand out.

IRFT

Transformer-BFGS Integration

Symbolic Regression in Quantitative Trading

🔎 Similar Papers

PLUTUS: A Well Pre-trained Large Unified Transformer can Unveil Financial Time Series Regularities