🤖 AI Summary
To address the limitation of single-sequence event modeling—which neglects cross-user contextual dependencies and leads to prediction bias for cold-start users in dynamic domains such as finance—this paper proposes a multi-user representation aggregation framework. Our core method introduces, for the first time, a learnable kernel-based attention mechanism that explicitly models complex inter-user information flow while remaining compatible with efficient fine-tuning of existing event encoders. By integrating differentiable pooling and lightweight fine-tuning strategies, the framework significantly improves predictive performance on financial transaction datasets. Specifically, the kernel attention mechanism yields consistent gains in ROC AUC, while mean pooling provides robust performance improvements across settings. These results empirically validate that aggregating cross-user contextual information is critical for enhancing the quality of user behavioral representations.
📝 Abstract
Representation learning produces models across diverse domains, such as store purchases, client transactions, and general human behaviour. However, such models for sequential data usually process a single sequence, ignoring context from other relevant sequences; this is problematic in domains with rapidly changing external environments, like finance, and can misguide predictions for users with no recent events. We are the first to propose a method that augments a specific user's representation by aggregating information from multiple user representations in a scenario of multiple co-occurring event sequences. Our study considers diverse aggregation approaches, ranging from simple pooling techniques to trainable attention-based methods, in particular Kernel attention aggregation, which can capture more complex information flow between users. The proposed method operates atop an existing encoder and supports its efficient fine-tuning. Across the considered financial-transaction datasets and downstream tasks, Kernel attention improves ROC AUC scores both with and without fine-tuning, while mean pooling yields a smaller but still significant gain.
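To make the aggregation idea concrete, here is a minimal sketch of kernel attention over user embeddings. It assumes a bilinear kernel k(t, c) = tᵀWc with a learnable matrix `W` as the scoring function (the paper's exact kernel choice is not specified here); scores are softmax-normalised and the weighted context sum is mixed into the target user's representation. Note that a zero kernel matrix reduces this to mean pooling, the simpler baseline mentioned above.

```python
import numpy as np

def kernel_attention_aggregate(target, context, W, alpha=0.5):
    """Augment a target user's embedding with context-user embeddings.

    target:  (dim,) embedding of the user of interest
    context: (n_users, dim) embeddings of other co-occurring users
    W:       (dim, dim) kernel matrix (learned in a real model;
             a fixed illustrative stand-in here)
    alpha:   mixing weight for the aggregated context
    """
    scores = context @ W @ target          # bilinear kernel scores, (n_users,)
    scores -= scores.max()                 # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()               # softmax attention weights
    pooled = weights @ context             # weighted sum of context, (dim,)
    return (1 - alpha) * target + alpha * pooled

rng = np.random.default_rng(0)
dim, n_users = 8, 5
target = rng.normal(size=dim)
context = rng.normal(size=(n_users, dim))
W = np.eye(dim)                            # hypothetical kernel parameters
augmented = kernel_attention_aggregate(target, context, W)
```

With `W = 0` every score is equal, the attention weights become uniform, and the aggregation degenerates to mean pooling over the context users, which is why mean pooling serves as a natural baseline for the trainable kernel.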