🤖 AI Summary
To address sparsity in user-video interactions, insufficient exploitation of multimodal signals, and oversmoothing in hypergraph modeling for micro-video cold-start recommendation, this paper proposes a multi-view hypergraph collaborative self-supervised framework. Methodologically, it constructs multi-granularity hyperedges by fusing textual descriptions, cover images, and dynamic video features to capture high-order user-video interactions; designs a multi-view feature encoder and hypergraph neural network (HGNN), incorporating node-, view-, and graph-level contrastive losses alongside cross-modal alignment self-supervision to explicitly enforce semantic consistency among implicit interactions. Innovatively, it is the first work to deeply integrate multi-view hypergraph modeling with cross-modal contrastive learning, effectively mitigating oversmoothing. Experiments on two real-world datasets demonstrate up to 23.6% improvement in cold-start user recommendation accuracy over state-of-the-art methods. The code is publicly available.
📝 Abstract
With the widespread use of mobile devices and the rapid growth of micro-video platforms such as TikTok and Kwai, the demand for personalized micro-video recommendation systems has significantly increased. Micro-videos typically contain diverse information, such as textual metadata, visual cues (e.g., cover images), and dynamic video content, significantly affecting user interaction and engagement patterns. However, most existing approaches often suffer from the problem of over-smoothing, which limits their ability to capture comprehensive interaction information effectively. Additionally, cold-start scenarios present ongoing challenges due to sparse interaction data and the underutilization of available interaction signals. To address these issues, we propose a Multi-view Hypergraph-based Contrastive learning model for cold-start micro-video Recommendation (MHCR). MHCR introduces a multi-view multimodal feature extraction layer to capture interaction signals from various perspectives and incorporates multi-view self-supervised learning tasks to provide additional supervisory signals. Through extensive experiments on two real-world datasets, we show that MHCR significantly outperforms existing video recommendation models and effectively mitigates cold-start challenges. Our code is available at https://github.com/sisuolv/MHCR.