AI Summary
Confounding bias induced by latent variables severely undermines causal inference in recommender systems, while conventional instrumental variable (IV) methods suffer from labor-intensive manual construction and uncertain validity. This paper proposes CIV4Rec, the first fully automated, data-driven framework for learning conditional instrumental variables (CIVs). It jointly discovers effective CIVs and their conditioning sets directly from user behavioral data, without domain expertise or prior knowledge. The method integrates a variational autoencoder (VAE) with conditional independence testing, enabling end-to-end optimization of both CIV discovery and causal effect estimation via two-stage least squares (2SLS). Extensive experiments on Movielens-10M and Douban-Movie demonstrate significant improvements: click-through rate prediction accuracy increases substantially, and NDCG@10 improves by an average of 12.7%, validating CIV4Rec's effectiveness and generalizability in mitigating the confounding bias inherent in interactive recommendation data.
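The summary does not say which conditional independence test CIV4Rec uses. As an illustrative assumption, the sketch below applies the Fisher z test on partial correlations, a common choice under linear-Gaussian assumptions, to one CIV requirement that can be checked from observed data: a candidate instrument Z should remain dependent on the treatment T once the conditioning set W is given. The helper names (`partial_corr`, `fisher_z_pvalue`) and the simulated variables are hypothetical, not part of CIV4Rec.

```python
import math
import numpy as np


def partial_corr(x, y, cond):
    """Correlation of x and y after linearly regressing out cond (shape (n, k))."""
    n = len(x)
    c = np.hstack([np.ones((n, 1)), cond])  # intercept plus conditioning columns
    rx = x - c @ np.linalg.lstsq(c, x, rcond=None)[0]  # residual of x given cond
    ry = y - c @ np.linalg.lstsq(c, y, rcond=None)[0]  # residual of y given cond
    return float(np.corrcoef(rx, ry)[0, 1])


def fisher_z_pvalue(x, y, cond):
    """Two-sided p-value for H0: x is independent of y given cond (linear-Gaussian)."""
    n, k = cond.shape
    r = partial_corr(x, y, cond)
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    z_stat = math.sqrt(n - k - 3) * math.atanh(r)
    return math.erfc(abs(z_stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Hypothetical simulation: W drives Z, T and V; Z also drives T.
rng = np.random.default_rng(1)
n = 20_000
w = rng.normal(size=(n, 1))                        # conditioning set W
z = 0.8 * w[:, 0] + rng.normal(size=n)             # candidate instrument Z
t = 1.0 * z + 0.5 * w[:, 0] + rng.normal(size=n)   # treatment T
v = 1.0 * w[:, 0] + rng.normal(size=n)             # related to Z only through W

p_relevance = fisher_z_pvalue(z, t, w)  # Z vs T given W: dependence expected
r_null = partial_corr(z, v, w)          # Z vs V given W: near-zero expected
print(p_relevance, r_null)
```

Here the p-value for Z and T given W is vanishingly small, while the partial correlation of Z and V given W is close to zero because V is linked to Z only through W. The other CIV requirement, that Z influences the outcome only through T and shares no latent confounder with it once W is given, involves unobserved variables and cannot in general be confirmed by such a test alone.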
Abstract
In recommender systems, latent variables can cause user-item interaction data to deviate from true user preferences. This biased data is then used to train recommendation models, further amplifying the bias and ultimately compromising both recommendation accuracy and user satisfaction. Instrumental variable (IV) methods are effective tools for addressing the confounding bias introduced by latent variables; however, identifying a valid IV is often challenging. To overcome this issue, we propose CIV4Rec, a novel data-driven conditional IV (CIV) debiasing method for recommender systems. CIV4Rec automatically generates valid CIVs and their corresponding conditioning sets directly from interaction data, significantly reducing the complexity of IV selection while effectively mitigating the confounding bias caused by latent variables. Specifically, CIV4Rec leverages a variational autoencoder (VAE) to generate representations of the CIV and its conditioning set from interaction data, and then applies least squares to derive causal representations for click prediction. Extensive experiments on two real-world datasets, Movielens-10M and Douban-Movie, demonstrate that CIV4Rec successfully identifies valid CIVs, effectively reduces bias, and consequently improves recommendation accuracy.
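A minimal sketch of the least-squares step, assuming it takes the standard conditional two-stage least squares (2SLS) form: regress the treatment T on the instrument Z and conditioning set W, then regress the outcome Y on the fitted treatment and W. It uses hand-simulated scalar variables rather than VAE-learned representations, and the function names (`two_stage_least_squares`, `ols_effect`) and coefficients are hypothetical. Z is a valid instrument here only conditionally, because W also affects Y directly; the latent confounder U plays the role of the bias source described above.

```python
import numpy as np


def two_stage_least_squares(y, t, z, w):
    """Conditional 2SLS point estimate of the effect of t on y, instrument z, controls w."""
    n = len(y)
    ones = np.ones((n, 1))
    # Stage 1: regress the treatment on the instrument plus the conditioning set.
    x1 = np.hstack([ones, z.reshape(-1, 1), w])
    gamma, *_ = np.linalg.lstsq(x1, t, rcond=None)
    t_hat = x1 @ gamma
    # Stage 2: regress the outcome on the fitted treatment plus the conditioning set.
    x2 = np.hstack([ones, t_hat.reshape(-1, 1), w])
    beta, *_ = np.linalg.lstsq(x2, y, rcond=None)
    return float(beta[1])  # coefficient on the fitted treatment


def ols_effect(y, t, w):
    """Naive least-squares estimate that ignores the latent confounder."""
    n = len(y)
    x = np.hstack([np.ones((n, 1)), t.reshape(-1, 1), w])
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return float(beta[1])


# Hypothetical data-generating process; the true effect of t on y is 2.0.
rng = np.random.default_rng(0)
n = 50_000
w = rng.normal(size=(n, 1))                                  # observed conditioning set W
u = rng.normal(size=n)                                       # latent confounder U
z = 0.8 * w[:, 0] + rng.normal(size=n)                       # CIV: depends on W, not on U
t = 1.0 * z + 1.0 * u + 0.5 * w[:, 0] + rng.normal(size=n)   # treatment T
y = 2.0 * t + 3.0 * u + 1.0 * w[:, 0] + rng.normal(size=n)   # outcome Y

naive = ols_effect(y, t, w)
iv = two_stage_least_squares(y, t, z, w)
print(naive, iv)
```

Because U raises both T and Y, the naive regression overstates the effect (about 3.0 in this setup), while 2SLS conditioned on W lands close to the true 2.0. Running the second stage by hand gives a correct point estimate but not correct standard errors, which a dedicated IV routine would adjust.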