🤖 AI Summary
Background: Financial bots on Ethereum have been linked to predatory trading and market manipulation and can threaten system integrity, yet existing detection systems are predominantly rule-based and inflexible. Method: We systematize the scientific literature and anecdotal evidence into a taxonomy of financial bots with seven categories and 24 subcategories, and construct a ground-truth dataset of 133 human and 137 bot addresses. We then evaluate both unsupervised clustering and supervised classification algorithms on features derived from on-chain transaction behavior. Results: The best-performing clustering algorithm, a Gaussian Mixture Model (GMM), reaches an average cluster purity of 82.6%, and the best-performing binary bot-versus-human classifier, a Random Forest, reaches 83% accuracy. This machine learning-based approach offers a more flexible alternative to rule-based detection and provides additional insight into the current bot landscape on Ethereum.
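Neither the summary nor the abstract lists the exact features used. The following minimal sketch is illustrative only. It computes a few hypothetical per-address behavioral features of the kind commonly used to separate automated from human activity: timing regularity, failure rate, and recipient diversity. The `Tx` record, its fields, and the feature names are assumptions, not the paper's feature set.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Tx:
    # Hypothetical minimal transaction record; not the paper's schema.
    timestamp: int  # block timestamp in seconds
    success: bool   # whether the transaction executed successfully
    to: str         # recipient address


def address_features(txs: list[Tx]) -> dict[str, float]:
    """Compute hypothetical per-address behavioral features (illustration only)."""
    if len(txs) < 2:
        raise ValueError("need at least two transactions to compute gaps")
    txs = sorted(txs, key=lambda t: t.timestamp)
    # Time between consecutive transactions sent by the address.
    gaps = [b.timestamp - a.timestamp for a, b in zip(txs, txs[1:])]
    mean_gap = float(mean(gaps))
    # Coefficient of variation of the gaps: values near 0 mean near-constant
    # spacing, one plausible (assumed) signal of automated activity.
    cv_gap = pstdev(gaps) / mean_gap if mean_gap > 0 else 0.0
    return {
        "tx_count": float(len(txs)),
        "mean_gap_s": mean_gap,
        "cv_gap": cv_gap,
        "failed_ratio": sum(not t.success for t in txs) / len(txs),
        "unique_recipients_ratio": len({t.to for t in txs}) / len(txs),
    }
```

Feature vectors of this kind could then be passed to a clusterer such as a GMM or a classifier such as a Random Forest. Which features actually separate bots from humans is an empirical question the sketch does not answer.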
📝 Abstract
The integration of bots in Distributed Ledger Technologies (DLTs) fosters efficiency and automation. However, their use is also associated with predatory trading and market manipulation, and can pose threats to system integrity. It is therefore essential to understand the extent of bot deployment in DLTs; despite this, current detection systems are predominantly rule-based and lack flexibility. In this study, we present a novel approach that uses machine learning to detect financial bots on the Ethereum platform. First, we systematize existing scientific literature and collect anecdotal evidence to establish a taxonomy for financial bots, comprising 7 categories and 24 subcategories. Second, we create a ground-truth dataset consisting of 133 human and 137 bot addresses. Third, we employ both unsupervised and supervised machine learning algorithms to detect bots deployed on Ethereum. The highest-performing clustering algorithm is a Gaussian Mixture Model with an average cluster purity of 82.6%, while the highest-performing model for binary classification is a Random Forest with an accuracy of 83%. Our machine learning-based detection mechanism contributes to understanding the Ethereum ecosystem dynamics by providing additional insights into the current bot landscape.
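The abstract does not define "average cluster purity". Under the standard definition, the purity of a clustering is (1/N) * sum over clusters k of max over classes j of |cluster_k ∩ class_j|, that is, the fraction of points that carry their cluster's majority label. Another plausible reading averages the per-cluster majority fractions, and "average" may also refer to averaging over several runs. This minimal sketch computes the first two quantities. It assumes cluster assignments (for example, from a GMM) and ground-truth bot/human labels are already available, and it does not reproduce the paper's evaluation setup.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def cluster_purity(
    cluster_ids: Sequence[Hashable], labels: Sequence[Hashable]
) -> tuple[float, float]:
    """Return (overall purity, mean of per-cluster purities)."""
    if len(cluster_ids) != len(labels) or not labels:
        raise ValueError("cluster_ids and labels must be non-empty and equal length")
    # Group ground-truth labels by the cluster each point was assigned to.
    members: dict[Hashable, list[Hashable]] = defaultdict(list)
    for cid, label in zip(cluster_ids, labels):
        members[cid].append(label)
    # Size of the majority class inside each cluster.
    majority = {cid: Counter(ys).most_common(1)[0][1] for cid, ys in members.items()}
    # Standard purity: share of all points that match their cluster's majority label.
    overall = sum(majority.values()) / len(labels)
    # Alternative: unweighted mean of each cluster's majority fraction.
    per_cluster = [majority[cid] / len(ys) for cid, ys in members.items()]
    return overall, sum(per_cluster) / len(per_cluster)
```

For example, with clusters `[0, 0, 0, 0, 1, 1]` and labels `["bot", "bot", "bot", "human", "human", "human"]`, cluster 0 has purity 3/4 and cluster 1 has purity 1. The overall purity is therefore 5/6 (about 0.833), while the unweighted per-cluster mean is 0.875. This shows that the two readings can differ when cluster sizes are unequal.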