Encode Me If You Can: Learning Universal User Representations via Event Sequence Autoencoding

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the inefficiency of repetitive feature engineering and model training in multi-task user modeling, this paper proposes a generic user behavior representation framework. It uniformly encodes heterogeneous user interactions (e.g., clicks, purchases) into temporal event sequences and employs a GRU-based autoencoder to learn fixed-dimensional dense vectors, enabling effective sequence compression and reconstruction. Furthermore, it integrates multi-granularity embeddings—including positional, interaction-type, and temporal embeddings—to construct a robust, unified user representation. Crucially, the framework requires no task-specific adaptation and can be directly transferred to downstream tasks such as churn prediction, recommendation, and lifetime value (LTV) estimation. Evaluated on the RecSys Challenge 2025, it achieved second place, demonstrating strong cross-task generalization, computational efficiency, and scalability.
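The summary above can be made concrete with a minimal sketch: a GRU autoencoder that sums interaction-type, temporal, and positional embeddings per event, compresses the sequence into the encoder's final hidden state (the fixed-dimensional user vector), and reconstructs the event types from that vector. All dimensions, names, and the decoder wiring here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class EventSeqAutoencoder(nn.Module):
    """Hypothetical sketch of a GRU sequence autoencoder with
    multi-granularity embeddings (type + time + position)."""

    def __init__(self, n_event_types=8, n_time_buckets=24, max_len=50, d=64):
        super().__init__()
        self.type_emb = nn.Embedding(n_event_types, d)   # interaction-type embedding
        self.time_emb = nn.Embedding(n_time_buckets, d)  # temporal embedding (bucketed)
        self.pos_emb = nn.Embedding(max_len, d)          # positional embedding
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, n_event_types)          # reconstruct event types

    def encode(self, types, times):
        # types, times: (B, T) integer tensors
        pos = torch.arange(types.size(1), device=types.device)
        x = self.type_emb(types) + self.time_emb(times) + self.pos_emb(pos)
        _, h = self.encoder(x)        # h: (1, B, d) -- the fixed-size user vector
        return h.squeeze(0)

    def forward(self, types, times):
        z = self.encode(types, times)
        # Simplified decoding: feed the latent vector at every step and
        # predict the event type; training would minimize cross-entropy
        # between these logits and the original sequence.
        dec_in = z.unsqueeze(1).expand(-1, types.size(1), -1)
        out, _ = self.decoder(dec_in)
        return self.head(out), z
```

After training, only `encode` is needed: its output is the task-independent user representation handed to downstream models.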

📝 Abstract
Building universal user representations that capture the essential aspects of user behavior is a crucial task for modern machine learning systems. In real-world applications, a user's historical interactions often serve as the foundation for solving a wide range of predictive tasks, such as churn prediction, recommendations, or lifetime value estimation. Using a task-independent user representation that is effective across all such tasks can reduce the need for task-specific feature engineering and model retraining, leading to more scalable and efficient machine learning pipelines. The goal of the RecSys Challenge 2025 by Synerise was to develop such Universal Behavioral Profiles from logs of past user behavior, which included various types of events such as product purchases, page views, and search queries. We propose a method that transforms the entire user interaction history into a single chronological sequence and trains a GRU-based autoencoder to reconstruct this sequence from a fixed-size vector. If the model can accurately reconstruct the sequence, the latent vector is expected to capture the key behavioral patterns. In addition to this core model, we explored several alternative methods for generating user embeddings and combined them by concatenating their output vectors into a unified representation. This ensemble strategy further improved generalization across diverse downstream tasks and helped our team, ai_lab_recsys, achieve second place in the RecSys Challenge 2025.
Problem

Research questions and friction points this paper is trying to address.

Develop universal user representations from behavior logs
Create task-independent embeddings for diverse predictive tasks
Reconstruct user event sequences using autoencoder models
Innovation

Methods, ideas, or system contributions that make the work stand out.

GRU-based autoencoder for sequence reconstruction
Fixed-size vector captures key behavioral patterns
Ensemble strategy combines multiple embedding methods
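The ensemble strategy described in the abstract amounts to concatenating the output vectors of several embedding models into one unified representation. A minimal sketch, assuming per-model L2 normalization so that no single model's scale dominates (the paper only states that the vectors are concatenated; the normalization is an added assumption):

```python
import numpy as np

def build_unified_profile(embeddings):
    """Concatenate per-model user embeddings into one unified vector.

    embeddings: list of arrays, each of shape (n_users, d_i).
    Each block is L2-normalized per user before concatenation
    (an illustrative choice, not specified by the paper).
    """
    parts = [
        e / (np.linalg.norm(e, axis=1, keepdims=True) + 1e-8)
        for e in embeddings
    ]
    return np.concatenate(parts, axis=1)  # shape (n_users, sum(d_i))
```

The resulting matrix is the single representation submitted for all downstream tasks, with no task-specific adaptation.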
👥 Authors
Anton Klenitskiy, Sber AI Lab (Machine learning, Deep learning)
Artem Fatkulin, Sber AI Lab, Moscow, Russian Federation; HSE University, Moscow, Russian Federation
Daria Denisova, ML Researcher, Sber AI Lab (ML, RecSys, Generative models)
Anton Pembek, Sber AI Lab, Moscow, Russian Federation; Lomonosov Moscow State University (MSU), Moscow, Russian Federation
Alexey Vasilev, Sber AI Lab; HSE University; MSU (Machine Learning, Data science, Recommender Systems, NLP)