Encode Me If You Can: Learning Universal User Representations via Event Sequence Autoencoding

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the inefficiency of repetitive feature engineering and model training in multi-task user modeling, this paper proposes a generic user behavior representation framework. It uniformly encodes heterogeneous user interactions (e.g., clicks, purchases) into temporal event sequences and employs a GRU-based autoencoder to learn fixed-dimensional dense vectors, enabling effective sequence compression and reconstruction. Furthermore, it integrates multi-granularity embeddings—including positional, interaction-type, and temporal embeddings—to construct a robust, unified user representation. Crucially, the framework requires no task-specific adaptation and can be directly transferred to downstream tasks such as churn prediction, recommendation, and lifetime value (LTV) estimation. Evaluated on the RecSys Challenge 2025, it achieved second place, demonstrating strong cross-task generalization, computational efficiency, and scalability.
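The summary above can be made concrete with a minimal sketch: a GRU autoencoder that sums interaction-type, temporal, and positional embeddings per event, compresses the sequence into the encoder's final hidden state (the fixed-dimensional user vector), and reconstructs the event types from that vector. All dimensions, names, and the decoder wiring here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class EventSeqAutoencoder(nn.Module):
    """Hypothetical sketch of a GRU sequence autoencoder with
    multi-granularity embeddings (type + time + position)."""

    def __init__(self, n_event_types=8, n_time_buckets=24, max_len=50, d=64):
        super().__init__()
        self.type_emb = nn.Embedding(n_event_types, d)   # interaction-type embedding
        self.time_emb = nn.Embedding(n_time_buckets, d)  # temporal embedding (bucketed)
        self.pos_emb = nn.Embedding(max_len, d)          # positional embedding
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, n_event_types)          # reconstruct event types

    def encode(self, types, times):
        # types, times: (B, T) integer tensors
        pos = torch.arange(types.size(1), device=types.device)
        x = self.type_emb(types) + self.time_emb(times) + self.pos_emb(pos)
        _, h = self.encoder(x)        # h: (1, B, d) -- the fixed-size user vector
        return h.squeeze(0)

    def forward(self, types, times):
        z = self.encode(types, times)
        # Simplified decoding: feed the latent vector at every step and
        # predict the event type; training would minimize cross-entropy
        # between these logits and the original sequence.
        dec_in = z.unsqueeze(1).expand(-1, types.size(1), -1)
        out, _ = self.decoder(dec_in)
        return self.head(out), z
```

After training, only `encode` is needed: its output is the task-independent user representation handed to downstream models.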

📝 Abstract
Building universal user representations that capture the essential aspects of user behavior is a crucial task for modern machine learning systems. In real-world applications, a user's historical interactions often serve as the foundation for solving a wide range of predictive tasks, such as churn prediction, recommendations, or lifetime value estimation. Using a task-independent user representation that is effective across all such tasks can reduce the need for task-specific feature engineering and model retraining, leading to more scalable and efficient machine learning pipelines. The goal of the RecSys Challenge 2025 by Synerise was to develop such Universal Behavioral Profiles from logs of past user behavior, which included various types of events such as product purchases, page views, and search queries. We propose a method that transforms the entire user interaction history into a single chronological sequence and trains a GRU-based autoencoder to reconstruct this sequence from a fixed-size vector. If the model can accurately reconstruct the sequence, the latent vector is expected to capture the key behavioral patterns. In addition to this core model, we explored several alternative methods for generating user embeddings and combined them by concatenating their output vectors into a unified representation. This ensemble strategy further improved generalization across diverse downstream tasks and helped our team, ai_lab_recsys, achieve second place in the RecSys Challenge 2025.
Problem

Research questions and friction points this paper is trying to address.

Develop universal user representations from behavior logs
Create task-independent embeddings for diverse predictive tasks
Reconstruct user event sequences using autoencoder models
Innovation

Methods, ideas, or system contributions that make the work stand out.

GRU-based autoencoder for sequence reconstruction
Fixed-size vector captures key behavioral patterns
Ensemble strategy combines multiple embedding methods
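The ensemble strategy described in the abstract amounts to concatenating the output vectors of several embedding models into one unified representation. A minimal sketch, assuming per-model L2 normalization so that no single model's scale dominates (the paper only states that the vectors are concatenated; the normalization is an added assumption):

```python
import numpy as np

def build_unified_profile(embeddings):
    """Concatenate per-model user embeddings into one unified vector.

    embeddings: list of arrays, each of shape (n_users, d_i).
    Each block is L2-normalized per user before concatenation
    (an illustrative choice, not specified by the paper).
    """
    parts = [
        e / (np.linalg.norm(e, axis=1, keepdims=True) + 1e-8)
        for e in embeddings
    ]
    return np.concatenate(parts, axis=1)  # shape (n_users, sum(d_i))
```

The resulting matrix is the single representation submitted for all downstream tasks, with no task-specific adaptation.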
👥 Authors
Anton Klenitskiy, Sber AI Lab (Machine learning, Deep learning)
Artem Fatkulin, Sber AI Lab, Moscow, Russian Federation; HSE University, Moscow, Russian Federation
Daria Denisova, ML Researcher, Sber AI Lab (ML, RecSys, Generative models)
Anton Pembek, Sber AI Lab, Moscow, Russian Federation; Lomonosov Moscow State University (MSU), Moscow, Russian Federation
Alexey Vasilev, Sber AI Lab; HSE University; MSU (Machine Learning, Data science, Recommender Systems, NLP)