🤖 AI Summary
Existing multi-source user representation learning faces three key challenges: (1) the absence of a unified representation framework, (2) non-scalable compression and storage of heterogeneous data, and (3) weak cross-task generalization. To address these, the authors propose U^2QT (Unified User Quantized Tokenizers), a framework built on a two-stage architecture: (1) a causal Q-Former that projects domain-specific features into a shared causal representation space, enabling cross-domain knowledge transfer; and (2) a multi-view Residual Quantized Variational Autoencoder (RQ-VAE) that performs early fusion of heterogeneous sources and discrete compression via shared and source-specific codebooks, yielding semantically consistent, storage-efficient user tokens. The framework integrates seamlessly with large language models and outperforms task-specific baselines on future behavior prediction and recommendation tasks while substantially reducing computational and storage overhead; empirical evaluation further confirms its industrial-scale scalability.
📝 Abstract
Multi-source user representation learning plays a critical role in enabling personalized services on web platforms (e.g., Alipay). While prior works have adopted late-fusion strategies to combine heterogeneous data sources, they suffer from three key limitations: lack of unified representation frameworks, scalability and storage issues in data compression, and inflexible cross-task generalization. To address these challenges, we propose U^2QT (Unified User Quantized Tokenizers), a novel framework that integrates cross-domain knowledge transfer with early fusion of heterogeneous domains. Our framework employs a two-stage architecture: first, a causal Q-Former projects domain-specific features into a shared causal representation space to preserve inter-modality dependencies; second, a multi-view RQ-VAE discretizes causal embeddings into compact tokens through shared and source-specific codebooks, enabling efficient storage while maintaining semantic coherence. Experimental results showcase U^2QT's advantages across diverse downstream tasks, outperforming task-specific baselines in future behavior prediction and recommendation tasks while achieving efficiency gains in storage and computation. The unified tokenization framework enables seamless integration with language models and supports industrial-scale applications.
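The second stage's core mechanism, residual quantization against a shared codebook followed by a source-specific codebook, can be sketched as below. This is a minimal illustrative sketch: the dimensions, codebook sizes, and all function names are assumptions for exposition, not details taken from the paper.

```python
import math
import random

random.seed(0)

DIM, CODES = 8, 16  # hypothetical embedding dim and codebook size

def rand_vec():
    return [random.gauss(0, 1) for _ in range(DIM)]

# Stage-1 codebook shared across all data sources; stage-2 codebook
# specific to one source (the paper keeps one such codebook per source).
shared_codebook = [rand_vec() for _ in range(CODES)]
source_codebook = [rand_vec() for _ in range(CODES)]

def nearest(codebook, x):
    """Index and entry of the codebook vector closest to x (Euclidean)."""
    idx = min(range(len(codebook)), key=lambda i: math.dist(codebook[i], x))
    return idx, codebook[idx]

def residual_quantize(embedding):
    """Quantize with the shared codebook, then quantize the residual
    with the source-specific codebook; return token ids + reconstruction."""
    i1, c1 = nearest(shared_codebook, embedding)
    residual = [e - c for e, c in zip(embedding, c1)]
    i2, c2 = nearest(source_codebook, residual)
    reconstruction = [a + b for a, b in zip(c1, c2)]
    return (i1, i2), reconstruction

# A user embedding compresses to two small integer tokens, which is what
# makes storage cheap relative to keeping the dense vector.
tokens, recon = residual_quantize(rand_vec())
```

Each added residual stage refines the approximation while costing only one extra integer id per user, which is the storage-efficiency argument behind discretizing the causal embeddings.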