🤖 AI Summary
In financial research, the absence of ground-truth behavioral labels in real-market data and the poor interpretability of black-box models hinder trader behavior modeling. To address this, we propose an "interpretable laboratory" framework built on an agent-based market model: it generates high-fidelity synthetic trading data with labeled investor behavior types, enabling a systematic comparison of supervised classification and unsupervised clustering for discriminating behaviors. The methodology combines behavioral sequence feature engineering, supervised learning (SVM, Random Forest), unsupervised clustering (k-means, DBSCAN), and SHAP-based interpretability analysis. Experiments show that supervised classifiers exceed 95% accuracy, whereas unsupervised clustering incurs error rates above 40%, revealing its fundamental limitations for behavioral classification. Key discriminative dimensions, including order-flow persistence and response latency, are identified. This work pioneers interpretable agent-based simulation as a benchmark tool for financial behavioral research, establishing a new paradigm for behavioral finance modeling.
📝 Abstract
The rapid development of sophisticated machine learning methods, together with the increased availability of financial data, has the potential to transform financial research, but it also poses challenges of validation and interpretation. A good case study is the task of classifying financial investors based on their behavioral patterns. Not only do we have access to both classification and clustering tools for high-dimensional data, but data identifying individual investors is also finally available. The problem, however, is that we have no access to ground truth when working with real-world data. This, combined with the often limited interpretability of modern machine learning methods, makes it difficult to fully exploit the available research potential. To deal with this challenge, we propose using a realistic agent-based model to generate synthetic data. This provides access to ground truth, large replicable datasets, and limitless research scenarios. Using this approach, we show that even when classifying trading agents in a supervised manner is relatively easy, the more realistic task of unsupervised clustering may give incorrect or even misleading results. We complement these results by investigating in detail how supervised techniques were able to successfully distinguish between different trading behaviors.
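The gap between supervised and unsupervised results can be illustrated with a toy sketch. This is not the paper's actual pipeline: the agent types, feature names (borrowed from the summary's "order-flow persistence"), and distributions below are invented for illustration. A decision stump trained on ground-truth labels locks onto the informative behavioral feature, while plain k-means, seeing no labels, splits the data along an irrelevant high-variance axis.

```python
# Toy illustration (not the paper's actual pipeline): supervised learning
# with ground-truth labels vs. unsupervised k-means on synthetic agents.
# Feature names and distributions are invented for this sketch.
import random

random.seed(0)
N = 200
# Feature 0: "order-flow persistence" (separates the two agent types).
# Feature 1: raw trading volume (label-independent but high-variance).
data = ([(random.gauss(0.8, 0.1), random.gauss(0.0, 5.0)) for _ in range(N)]
        + [(random.gauss(0.2, 0.1), random.gauss(0.0, 5.0)) for _ in range(N)])
labels = [1] * N + [0] * N

# Supervised: a decision stump scans every feature/threshold pair and,
# thanks to the labels, locks onto the discriminative persistence axis.
best_acc, best_feat = 0.0, 0
for f in range(2):
    for t in sorted({x[f] for x in data}):
        for pol in (True, False):
            acc = sum(((x[f] > t) == pol) == bool(lbl)
                      for x, lbl in zip(data, labels)) / len(data)
            if acc > best_acc:
                best_acc, best_feat = acc, f

# Unsupervised: Lloyd's k-means (k=2) never sees the labels, so it cuts
# variance by splitting along the dominant (volume) axis instead.
def d2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def mean(pts):
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))

c0 = data[0]
c1 = max(data, key=lambda x: d2(x, c0))  # farthest-point initialization
for _ in range(25):
    g0 = [x for x in data if d2(x, c0) <= d2(x, c1)]
    g1 = [x for x in data if d2(x, c0) > d2(x, c1)]
    c0, c1 = mean(g0), mean(g1)
raw = sum((d2(x, c0) > d2(x, c1)) == bool(lbl)
          for x, lbl in zip(data, labels)) / len(data)
km_acc = max(raw, 1.0 - raw)  # clusters are unlabeled: score best permutation

print(f"supervised stump: feature {best_feat}, accuracy {best_acc:.2f}")
print(f"k-means (best label permutation): accuracy {km_acc:.2f}")
```

The point is the information gap rather than the particular learners: the stump succeeds because the labels tell it which dimension matters, while the clustering result looks internally coherent yet says nothing about behavior, which is exactly the sense in which unsupervised results can be misleading without ground truth.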