🤖 AI Summary
This paper addresses semi-supervised matrix completion for recommender systems. It jointly leverages abundant implicit feedback (unlabeled) and scarce, noisy explicit ratings (labeled) to estimate both the low-rank ground truth rating matrix $R$ and the low-rank sampling probability matrix $P$. Methodologically, it introduces a “shared low-rank subspace” assumption that couples the structures of $R$ and $P$, so that the sampling bias in user behavior is modeled explicitly. Theoretically, it derives a generalization error bound that is the sum of two terms, one controlled by the amount of implicit data and one by the amount of explicit data, which characterizes how the two sources contribute to generalization. Algorithmically, it builds on low-rank subspace recovery: the sampling distribution is estimated from the implicit data, and the ratings are then estimated from the explicit data. In experiments on the real-world datasets Douban and MovieLens with most explicit ratings removed, the method can outperform baselines that rely solely on explicit feedback.
📝 Abstract
We study a matrix completion problem where both the ground truth matrix $R$ and the unknown sampling distribution $P$ over observed entries are low-rank matrices, and \textit{share a common subspace}. We assume that a large amount $M$ of \textit{unlabeled} data drawn from the sampling distribution $P$ is available, together with a small amount $N$ of labeled data consisting of entries drawn from the same distribution and noisy estimates of the corresponding ground truth entries. This setting is inspired by recommender systems scenarios where the unlabeled data corresponds to `implicit feedback' (consisting of interactions such as purchases and clicks) and the labeled data corresponds to `explicit feedback', consisting of interactions where the user has given an explicit rating to the item. Leveraging powerful results from the theory of low-rank subspace recovery, together with classic generalization bounds for matrix completion models, we show error bounds consisting of a sum of two error terms scaling as $\widetilde{O}\left(\sqrt{\frac{nd}{M}}\right)$ and $\widetilde{O}\left(\sqrt{\frac{dr}{N}}\right)$ respectively, where $d$ is the rank of $P$ and $r$ is the rank of $R$. In synthetic experiments, we confirm that the true generalization error naturally splits into independent error terms corresponding to the estimation of $P$ and of the ground truth matrix $R$ respectively. In real-life experiments on Douban and MovieLens with most explicit ratings removed, we show that the method can outperform baselines relying only on the explicit ratings, which suggests that our assumptions provide a valid toy theoretical setting to study the interaction between explicit and implicit feedback in recommender systems.
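The two-stage idea described in the abstract can be illustrated with a small synthetic sketch. This is not the authors' algorithm, only one plausible reading of the setting, under the following assumptions: the "shared subspace" is modeled as $P \propto XY^\top$ and $R = XCY^\top$ with the same factors $X$ and $Y$; the subspaces are estimated by a truncated SVD of the implicit-feedback count matrix; and a small $d \times d$ core matrix is then fitted to the labeled entries by least squares. The function name `shared_subspace_estimate` and all sizes and noise levels are hypothetical.

```python
import numpy as np

def shared_subspace_estimate(counts, rows, cols, ratings, d):
    """Hypothetical two-stage estimator (a sketch, not the paper's method)."""
    # Stage 1: top-d singular subspaces of the implicit-feedback count matrix.
    U, _, Vt = np.linalg.svd(counts, full_matrices=False)
    U_hat, V_hat = U[:, :d], Vt[:d, :].T
    # Stage 2: least squares for a d x d core C such that R ~ U_hat @ C @ V_hat.T.
    # Labeled entry (i, j) contributes the feature vector outer(U_hat[i], V_hat[j]).
    feats = np.einsum("ka,kb->kab", U_hat[rows], V_hat[cols])
    feats = feats.reshape(len(rows), d * d)
    c_hat, *_ = np.linalg.lstsq(feats, ratings, rcond=None)
    return U_hat @ c_hat.reshape(d, d) @ V_hat.T

# Synthetic data under the assumed model (all values are illustrative).
rng = np.random.default_rng(0)
n, d, r = 60, 3, 2            # matrix size, rank of P, rank of R
M, N = 2_000_000, 2_000       # unlabeled and labeled sample counts
X = rng.uniform(size=(n, d))
Y = rng.uniform(size=(n, d))
P = X @ Y.T
P /= P.sum()                  # sampling distribution over entries, rank d
C = rng.normal(size=(d, r)) @ rng.normal(size=(r, d))
R = X @ C @ Y.T               # ground truth, rank r, subspaces shared with P

# Implicit feedback: M entry draws from P, summarized as a count matrix.
counts = rng.multinomial(M, P.ravel()).reshape(n, n).astype(float)
# Explicit feedback: N entries drawn from P with noisy ratings.
idx = rng.choice(n * n, size=N, p=P.ravel())
rows, cols = np.unravel_index(idx, (n, n))
ratings = R[rows, cols] + 0.1 * rng.normal(size=N)

R_hat = shared_subspace_estimate(counts, rows, cols, ratings, d)
rel_err = np.linalg.norm(R_hat - R) / np.linalg.norm(R)
print(f"relative Frobenius error: {rel_err:.3f}")
```

In this sketch the labeled data only has to determine $d^2$ coefficients once the subspaces are fixed, which gives some intuition for why the labeled-data term in the bound depends on $d$, $r$ and $N$ but not on $n$, while the unlabeled-data term carries the dependence on $n$ and $M$.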