Causal Representation Learning for Generalisable Recommendation

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses distribution shift in recommender systems, where logged training data is confounded by the deployed policy, past user behaviour, and platform-side filtering, so that offline metrics are unreliable predictors of online performance. The authors propose an information-theoretic disentanglement criterion, motivated by causal representation learning (CRL), and prove that its optimum depends only on the causal components of the input; the aim is better out-of-distribution generalisation, not full identification of all latent causal factors. A tractable variational lower bound makes the criterion optimisable from finite observational logs alone, and the method applies to any standard supervised model with no added inference-time cost. In an A/B test with millions of Spotify users, applied to a production ranker for personalised playlist generation, a capacity-matched CRL variant performed on par with the baseline offline but delivered substantial online gains in listener engagement. The public KuaiRand dataset and a synthetic benchmark with known causal structure show the same pattern: offline parity with the baseline and gains under distribution shift.

📝 Abstract

Predictive models trained on observational data often fail to generalise to the distributions they encounter when deployed, especially when the training data is a product of the system being optimised. Recommender systems are a canonical example: they are trained on interaction logs confounded by the deployed policy, past user behaviour, and platform filtering. As a result, the training distribution differs substantially from the candidate distribution scored at serving time, a gap that makes offline metrics unreliable predictors of online performance. We address the distribution shift problem with a method motivated by causal representation learning (CRL). We propose an information-theoretic disentanglement criterion and prove that its optimum depends only on the causal components of the input. We then derive a tractable variational lower bound that makes the criterion optimisable from finite observational data alone. The scope of our method is narrower than that of much of the CRL literature, in that we target better generalisation under distribution shift, not full identification of all latent causal factors. This narrower target is what makes the method practical, requiring only the existing confounded logs, applying to any standard supervised model, and adding no inference-time cost. Our headline evaluation is an A/B test with millions of users on Spotify, applied to a production ranker for personalised playlist generation. A capacity-matched CRL variant performed on par offline but delivered substantial online gains in listener engagement. Complementary evidence on the public KuaiRand recommendation dataset and a synthetic benchmark with known causal structure shows the same pattern: offline parity with baseline, gains under distribution shift. Across all three settings, adding our causal disentanglement objective yields meaningfully better out-of-distribution generalisation.

Problem

Research questions and friction points this paper is trying to address.

distribution shift

recommendation systems

causal representation learning

out-of-distribution generalization

confounded data

Innovation

Methods, ideas, or system contributions that make the work stand out.

causal representation learning

distribution shift

disentanglement