On the Role of Weight Decay in Collaborative Filtering: A Popularity Perspective

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work reveals that weight decay in collaborative filtering (CF) serves not merely as a regularizer but implicitly encodes item popularity into the magnitudes of embedding norms. Building on this insight, the authors propose PRISM (Popularity-awaRe Initialization Strategy for embedding Magnitudes), which explicitly injects popularity priors, derived from item interaction frequencies, into the embedding initialization, eliminating the need for weight decay entirely. PRISM requires no modifications to the loss function, optimizer, or model architecture, and its initialization strength can be parameterized to offer interpretable, cost-effective control over popularity bias. Theoretical analysis establishes its soundness, while experiments across multiple benchmarks demonstrate up to 4.77% higher recommendation accuracy, 38.48% faster training, and reduced hyperparameter tuning cost. PRISM shifts bias modeling in CF from implicit regularization toward explicit, interpretable prior injection.

📝 Abstract
Collaborative filtering (CF) enables large-scale recommendation systems by encoding information from historical user-item interactions into dense ID-embedding tables. However, as embedding tables grow, closed-form solutions become impractical, often necessitating the use of mini-batch gradient descent for training. Despite extensive work on designing loss functions to train CF models, we argue that one core component of these pipelines is heavily overlooked: weight decay. Attaining high-performing models typically requires careful tuning of weight decay, regardless of loss, yet its necessity is not well understood. In this work, we question why weight decay is crucial in CF pipelines and how it impacts training. Through theoretical and empirical analysis, we surprisingly uncover that weight decay's primary function is to encode popularity information into the magnitudes of the embedding vectors. Moreover, we find that tuning weight decay acts as a coarse, non-linear knob to influence preference towards popular or unpopular items. Based on these findings, we propose PRISM (Popularity-awaRe Initialization Strategy for embedding Magnitudes), a straightforward yet effective solution to simplify the training of high-performing CF models. PRISM pre-encodes the popularity information typically learned through weight decay, eliminating its necessity. Our experiments show that PRISM improves performance by up to 4.77% and reduces training times by 38.48%, compared to state-of-the-art training strategies. Additionally, we parameterize PRISM to modulate the initialization strength, offering a cost-effective and meaningful strategy to mitigate popularity bias.
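The core idea in the abstract, pre-encoding popularity into embedding magnitudes at initialization instead of letting weight decay learn it, can be illustrated with a minimal sketch. The function name `prism_init` and the strength parameter `alpha` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def prism_init(interaction_counts, dim, alpha=1.0, seed=None):
    """Hedged sketch of a popularity-aware embedding initialization.

    Each item gets a random unit direction, scaled by a magnitude
    derived from its interaction count. `alpha` (an assumed knob,
    analogous to the paper's parameterized initialization strength)
    modulates how strongly popularity is pre-encoded.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(interaction_counts, dtype=float)

    # Random directions, normalized to unit length per item.
    vecs = rng.standard_normal((len(counts), dim))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

    # Popularity-derived magnitudes, normalized to mean 1 so the
    # overall embedding scale stays comparable to a standard init.
    mags = (counts / counts.mean()) ** alpha
    return vecs * mags[:, None]

# Popular items (higher counts) receive larger embedding norms.
emb = prism_init([100, 10, 1], dim=8, alpha=0.5, seed=0)
```

Because the popularity signal is injected up front, training can proceed without weight decay, which is what the paper reports removes the associated tuning burden.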
Problem

Research questions and friction points this paper is trying to address.

Understanding weight decay's role in collaborative filtering training
Exploring how weight decay encodes popularity in embedding vectors
Proposing PRISM to simplify training by pre-encoding popularity information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shows that weight decay's primary role in CF is encoding popularity into embedding magnitudes
PRISM pre-encodes this popularity information at initialization, eliminating the need for weight decay
Improves accuracy by up to 4.77% and reduces training time by 38.48% versus state-of-the-art training strategies