🤖 AI Summary
Problem: Modeling massive transaction records in high-throughput payment networks poses significant challenges due to data scale and heterogeneity.
Method: This paper proposes TREASURE, a general-purpose, multi-task Transformer model for transaction data. It introduces an input module with dedicated sub-modules for static and dynamic attributes, builds a joint representation of consumer behavior and payment network signals (such as response codes and system flags), and adopts an efficient training paradigm for predicting high-cardinality categorical attributes.
Contribution/Results: The model works both as a standalone abnormal behavior detector and as an embedding provider for downstream models. On industry-grade datasets, it improves abnormal behavior detection performance by 111% over production systems, and its embeddings enhance recommendation models by 104%. The paper also reports ablation studies, benchmarks against production models, and case studies.
📝 Abstract
Payment networks form the backbone of modern commerce, generating high volumes of transaction records from daily activities. Properly modeling this data can enable applications such as abnormal behavior detection and consumer-level insights for hyper-personalized experiences, ultimately improving people's lives. In this paper, we present TREASURE, TRansformer Engine As Scalable Universal transaction Representation Encoder, a multipurpose transformer-based foundation model specifically designed for transaction data. The model simultaneously captures both consumer behavior and payment network signals (such as response codes and system flags), providing comprehensive information necessary for applications like accurate recommendation systems and abnormal behavior detection. Verified with industry-grade datasets, TREASURE features three key capabilities: 1) an input module with dedicated sub-modules for static and dynamic attributes, enabling more efficient training and inference; 2) an efficient and effective training paradigm for predicting high-cardinality categorical attributes; and 3) demonstrated effectiveness as both a standalone model that increases abnormal behavior detection performance by 111% over production systems and an embedding provider that enhances recommendation models by 104%. We present key insights from extensive ablation studies, benchmarks against production models, and case studies, highlighting valuable knowledge gained from developing TREASURE.
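The abstract does not spell out how the input module separates static and dynamic attributes, so the following is only a minimal sketch of one plausible reading: attributes that stay fixed across a sequence are embedded once, while attributes that change per transaction are embedded at every step, and the two are summed. The attribute names (`card_type`, `home_region`, `merchant_category`, `response_code`), the vocabulary sizes, the embedding width, and the choice of summation are all hypothetical and are not taken from the paper.

```python
import random

random.seed(0)
DIM = 4  # embedding width (illustrative assumption)


def make_table(num_rows, dim=DIM):
    """Random embedding table: one vector per category id."""
    return [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_rows)]


def add_vectors(*vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(values) for values in zip(*vectors)]


# Hypothetical vocabularies; the paper's real attribute set is not given here.
STATIC_TABLES = {"card_type": make_table(3), "home_region": make_table(5)}
DYNAMIC_TABLES = {"merchant_category": make_table(10), "response_code": make_table(4)}


def encode_static(static_attrs):
    """Embed the attributes that are fixed for the whole sequence."""
    return add_vectors(*(STATIC_TABLES[name][idx] for name, idx in static_attrs.items()))


def encode_dynamic(txn):
    """Embed the attributes that change on every transaction."""
    return add_vectors(*(DYNAMIC_TABLES[name][idx] for name, idx in txn.items()))


def encode_sequence(static_attrs, transactions):
    """Return one input vector per transaction, reusing the static vector."""
    static_vec = encode_static(static_attrs)  # computed once, not per transaction
    return [add_vectors(static_vec, encode_dynamic(txn)) for txn in transactions]


static_attrs = {"card_type": 1, "home_region": 4}
transactions = [
    {"merchant_category": 7, "response_code": 0},
    {"merchant_category": 2, "response_code": 3},
]
tokens = encode_sequence(static_attrs, transactions)
```

In this sketch the static lookup runs once per sequence regardless of how many transactions it contains, which is one way a split input module could reduce work during training and inference. The resulting `tokens` list would then be fed to a Transformer encoder.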