🤖 AI Summary
This work addresses the limitations of existing negative sampling strategies in implicit collaborative filtering, which often rely on textual information or task-specific fine-tuning and struggle to identify false negatives or mine high-quality hard negatives. To overcome these challenges, we propose DTL-NS, a text-free and fine-tuning-free negative sampling method built on a dual-tree architecture enhanced by large language models, marking the first attempt to apply such models to negative sampling without any textual input. By encoding collaborative structures and latent semantics into structured IDs through hierarchical index trees, DTL-NS integrates user preferences with item-level hierarchical similarity to enable offline false-negative identification and multi-perspective hard-negative sampling. Experiments on the Amazon-Sports dataset demonstrate significant improvements, with Recall@20 and NDCG@20 increasing by 10.64% and 19.12%, respectively, and the approach consistently enhances the performance of various implicit CF models.
📝 Abstract
Negative sampling is a pivotal technique in implicit collaborative filtering (CF) recommendation, enabling efficient and effective training by contrasting observed interactions with sampled unobserved ones. Recently, large language models (LLMs) have shown promise in recommender systems; however, LLM-empowered negative sampling remains underexplored. Existing methods heavily rely on textual information and task-specific fine-tuning, limiting their practical applicability. To address this limitation, we propose a text-free and fine-tuning-free Dual-Tree LLM-enhanced Negative Sampling method (DTL-NS). It consists of two modules: (i) an offline false negative identification module that leverages hierarchical index trees to transform collaborative structural and latent semantic information into structured item-ID encodings for LLM inference, enabling accurate identification of false negatives; and (ii) a multi-view hard negative sampling module that combines user-item preference scores with item-item hierarchical similarities derived from these encodings to mine high-quality hard negatives, thus improving models' discriminative ability. Extensive experiments demonstrate the effectiveness of DTL-NS. For example, on the Amazon-Sports dataset, DTL-NS outperforms the strongest baseline by 10.64% and 19.12% in Recall@20 and NDCG@20, respectively. Moreover, DTL-NS can be integrated into various implicit CF models and negative sampling methods, consistently enhancing their performance.
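To make the multi-view hard negative sampling idea concrete, the following is a minimal sketch of how structured item IDs from a hierarchical index tree could be combined with model preference scores. All names (`hier_similarity`, `sample_hard_negatives`, the prefix-based similarity, and the blending weight `alpha`) are illustrative assumptions, not the paper's actual implementation; the abstract only states that user-item preference scores are combined with item-item hierarchical similarities and that identified false negatives are excluded.

```python
def hier_similarity(id_a, id_b):
    """Illustrative hierarchical similarity between two structured item IDs:
    length of the shared prefix of tree codes, normalized by tree depth.
    (Assumed metric; the paper's exact similarity is not specified here.)"""
    depth = min(len(id_a), len(id_b))
    shared = 0
    for a, b in zip(id_a, id_b):
        if a != b:
            break
        shared += 1
    return shared / depth

def sample_hard_negatives(pref_scores, item_ids, pos_id, false_negs, k=5, alpha=0.5):
    """Rank unobserved items by a blend (hypothetical weight `alpha`) of the
    CF model's preference score and hierarchical similarity to the positive
    item, skipping items flagged as false negatives offline."""
    candidates = []
    for i, iid in enumerate(item_ids):
        if i in false_negs:  # drop LLM-identified false negatives
            continue
        score = alpha * pref_scores[i] + (1 - alpha) * hier_similarity(iid, pos_id)
        candidates.append((score, i))
    candidates.sort(reverse=True)
    return [i for _, i in candidates[:k]]
```

The blend makes a candidate "hard" along two views: the model already scores it highly (high `pref_scores[i]`), and it sits near the positive item in the index tree (high `hier_similarity`), while the offline false-negative set keeps likely-positive items out of the negative pool.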