🤖 AI Summary
Existing anti-money laundering (AML) graph datasets often lack realistic node semantics and rely on template-based injection of anomalies, leading to overly optimistic model evaluations. To address this limitation, this work proposes TransXion, a benchmark that introduces entity profiles enriched with demographic and behavioral attributes. By integrating conditional behavior modeling with non-template-based stochastic subgraph synthesis, TransXion generates a payment network comprising 50,000 attributed entities and approximately 3 million transactions. The benchmark simulates "behavioral anomalies" (transactions inconsistent with an entity’s socioeconomic context) while preserving key structural properties of real-world payment networks, such as heavy-tailed activity distributions. Empirical evaluation shows that AML detection models spanning multiple algorithmic paradigms perform substantially worse on TransXion than on widely used benchmarks, indicating a more challenging and realistic testbed for method assessment.
📝 Abstract
Money laundering poses severe risks to global financial systems, driving the widespread adoption of machine learning for transaction monitoring. However, progress remains stifled by the lack of realistic benchmarks. Existing transaction-graph datasets suffer from two pervasive limitations: (i) they provide sparse node-level semantics beyond anonymized identifiers, and (ii) they rely on template-driven anomaly injection, which biases benchmarks toward static structural motifs and yields overly optimistic assessments of model robustness. We propose TransXion, a benchmark ecosystem for Anti-Money Laundering (AML) research that integrates profile-aware simulation of normal activity with stochastic, non-template synthesis of illicit subgraphs. TransXion jointly models persistent entity profiles and conditional transaction behavior, enabling evaluation of "out-of-character" anomalies where observed activity contradicts an entity's socio-economic context. The resulting dataset comprises approximately 3 million transactions among 50,000 entities, each endowed with rich demographic and behavioral attributes. Empirical analyses show that TransXion reproduces key structural properties of payment networks, including heavy-tailed activity distributions and localized subgraph structure. Across a diverse array of detection models spanning multiple algorithmic paradigms, TransXion yields substantially lower detection performance than widely used benchmarks, demonstrating increased difficulty and realism. TransXion provides a more faithful testbed for developing context-aware and robust AML detection methods. The dataset and code are publicly available at https://github.com/chaos-max/TransXion.
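To make the notion of an "out-of-character" anomaly concrete, the following is a minimal illustrative sketch, not the TransXion generator or its data schema. It assumes a hypothetical `EntityProfile` with a `monthly_income` attribute and an assumed log-normal spending model in which a typical transaction is a fixed fraction of income; the names `expected_log_amount`, `out_of_character_score`, `spend_fraction`, and `log_sigma` are all invented for illustration. The point is only that the same transaction amount can be unremarkable for one profile and highly anomalous for another.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityProfile:
    # Hypothetical attributes; the benchmark's actual schema may differ.
    entity_id: str
    occupation: str
    monthly_income: float


def expected_log_amount(profile: EntityProfile, spend_fraction: float = 0.05) -> float:
    """Assumed typical log-amount: a fixed fraction of monthly income."""
    return math.log(profile.monthly_income * spend_fraction)


def out_of_character_score(
    profile: EntityProfile, amount: float, log_sigma: float = 1.0
) -> float:
    """Absolute z-score of log(amount) under the assumed log-normal model."""
    return abs(math.log(amount) - expected_log_amount(profile)) / log_sigma


student = EntityProfile("e1", "student", 1_000.0)
executive = EntityProfile("e2", "executive", 40_000.0)

# The same transfer is scored against each entity's own profile.
amount = 20_000.0
student_score = out_of_character_score(student, amount)
executive_score = out_of_character_score(executive, amount)

print(f"student:   {student_score:.2f}")
print(f"executive: {executive_score:.2f}")
```

Under these assumptions the transfer scores far higher for the student than for the executive, even though the amount and any surrounding graph structure are identical. A detector that sees only anonymized identifiers and topology has no access to this signal, which is the gap that profile-aware benchmarks aim to expose.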