🤖 AI Summary
To improve recommendations for cold-start (CS) items on visual discovery platforms such as Pinterest, this paper proposes four lightweight, complementary techniques designed for industrial deployment. CS items suffer from under-weighted non-historical features, sparse labels, and systematically lower prediction scores than non-CS items. The techniques address these issues in turn: a cost-efficient design that collectively adds only about 5% to the total parameter count, a residual connection for non-historical (content or attribute) features, a score regularization term that counteracts the low scores assigned to CS items, and manifold mixup to mitigate label sparsity. Deployed together, the methods increased fresh content engagement at Pinterest by 10% without negatively impacting overall engagement or cost, and they now serve over 570 million users.
📝 Abstract
Pinterest is a leading visual discovery platform where recommender systems (RecSys) are key to delivering relevant, engaging, and fresh content to our users. In this paper, we study the problem of improving RecSys model predictions for cold-start (CS) items, which appear infrequently in the training data. Although this problem is well-studied in academia, few studies have addressed its root causes effectively at the scale of a platform like Pinterest. By investigating live traffic data, we identified several challenges of the CS problem and developed a corresponding solution for each: First, industrial-scale RecSys models must operate under tight computational constraints. Since CS items are a minority, any related improvements must be highly cost-efficient. To address this, our solutions were designed to be lightweight, collectively increasing the total parameters by only 5%. Second, CS items are represented only by non-historical (e.g., content or attribute) features, which models often treat as less important. To elevate their significance, we introduce a residual connection for the non-historical features. Third, CS items tend to receive lower prediction scores compared to non-CS items, reducing their likelihood of being surfaced. We mitigate this by incorporating a score regularization term into the model. Fourth, the labels associated with CS items are sparse, making it difficult for the model to learn from them. We apply the manifold mixup technique to address this data sparsity. Implemented together, our methods increased fresh content engagement at Pinterest by 10% without negatively impacting overall engagement and cost, and have been deployed to serve over 570 million users on Pinterest.
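The three modeling ideas named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the paper's implementation. The function names, the `alpha` parameter, and the list-based tensors are hypothetical. The score penalty in particular is an assumed form (a squared gap between mean non-CS and mean CS scores), since the abstract does not specify the actual regularization term.

```python
import random


def residual_combine(trunk_out, content_proj):
    """Residual connection: add a projection of the non-historical
    (content/attribute) features back onto the trunk output so their
    signal is not washed out by historical features."""
    return [t + c for t, c in zip(trunk_out, content_proj)]


def score_gap_penalty(scores, is_cold_start):
    """Assumed regularizer: penalize the batch only when cold-start items
    score lower on average than the remaining items."""
    cs = [s for s, c in zip(scores, is_cold_start) if c]
    warm = [s for s, c in zip(scores, is_cold_start) if not c]
    if not cs or not warm:
        return 0.0  # nothing to compare in this batch
    gap = sum(warm) / len(warm) - sum(cs) / len(cs)
    return max(gap, 0.0) ** 2


def manifold_mixup(hidden, labels, alpha=0.2, rng=None):
    """Manifold mixup: interpolate hidden-layer activations and labels
    of randomly paired examples with a Beta(alpha, alpha) coefficient."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    partner = list(range(len(hidden)))
    rng.shuffle(partner)  # partner[i] is the example mixed into example i
    mixed_hidden = [
        [lam * a + (1.0 - lam) * b for a, b in zip(hidden[i], hidden[j])]
        for i, j in enumerate(partner)
    ]
    mixed_labels = [
        lam * labels[i] + (1.0 - lam) * labels[j]
        for i, j in enumerate(partner)
    ]
    return mixed_hidden, mixed_labels, lam
```

In a training loop, one plausible arrangement would apply `manifold_mixup` to an intermediate layer's activations before the remaining layers, and add a weighted `score_gap_penalty` to the main loss. Both the placement and the weighting are assumptions here.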