🤖 AI Summary
This work addresses the limited discriminative power of existing large language model (LLM)-based recommender systems, which rely on offline-generated, sequence-level negative samples and struggle with the large negative item spaces of recommendation tasks. To overcome this, the authors propose ILRec, a novel framework that dynamically generates self-hard negative tokens from intermediate LLM layers as fine-grained negative supervision signals. ILRec employs a two-stage training strategy that enhances model discrimination through cross-layer preference optimization and knowledge distillation. Additionally, a lightweight collaborative filtering module is integrated to mitigate the over-penalization of false negatives that overly aggressive negative sampling can cause. Extensive experiments on three benchmark datasets demonstrate significant improvements in recommendation performance, validating the effectiveness of the proposed approach in enriching the informativeness of negative samples and boosting LLM-based recommendation accuracy.
📝 Abstract
Large language models (LLMs) have shown great promise in recommender systems, where supervised fine-tuning (SFT) is commonly used for adaptation. Subsequent studies further introduce preference learning to incorporate negative samples into the training process. However, existing methods rely on sequence-level, offline-generated negatives, making them less discriminative and informative when adapting LLMs to recommendation tasks with large negative item spaces. To address these challenges, we propose ILRec, a novel preference fine-tuning framework for LLM-based recommendation, leveraging self-hard negative signals extracted from intermediate layers to improve preference learning. Specifically, we identify self-hard negative tokens from intermediate layers as fine-grained negative supervision that dynamically reflects the model's preference learning process. To effectively integrate these signals into training, we design a two-stage framework comprising cross-layer preference optimization and cross-layer preference distillation, enabling the model to jointly discriminate informative negatives and enhance the quality of negative signals from intermediate layers. In addition, we introduce a lightweight collaborative filtering model to assign token-level rewards for negative signals, mitigating the risk of over-penalizing false negatives. Extensive experiments on three datasets demonstrate ILRec's effectiveness in enhancing the performance of LLM-based recommender systems.
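The abstract's core idea can be illustrated with a toy sketch: at each intermediate layer, treat the model's highest-scoring *non-target* token as a "self-hard negative," then apply a DPO-style token-level preference loss whose strength is scaled by an external reward (here standing in for the collaborative filtering signal that down-weights likely false negatives). This is an interpretive sketch, not the paper's implementation; the function names, the toy per-layer logits, and the scalar `reward` are illustrative assumptions.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def self_hard_negatives(layer_logits, target_id):
    """For each intermediate layer's (toy) vocabulary logits, pick the
    highest-scoring token that is NOT the ground-truth target.
    These serve as dynamic, fine-grained negative supervision signals."""
    negatives = []
    for logits in layer_logits:
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        negatives.append(next(i for i in ranked if i != target_id))
    return negatives

def token_preference_loss(final_logits, target_id, negative_id, reward=1.0):
    """DPO-style token-level loss: push the target's log-prob above the
    self-hard negative's. `reward` (hypothetically from a CF model)
    down-weights penalties on suspected false negatives."""
    probs = softmax(final_logits)
    margin = math.log(probs[target_id]) - math.log(probs[negative_id])
    return -reward * math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# Toy example: three intermediate layers over a 5-token vocabulary; target is token 3.
layer_logits = [
    [0.1, 2.0, 0.3, 1.5, 0.0],  # layer A: token 1 strongest non-target
    [0.2, 1.0, 2.5, 0.1, 0.4],  # layer B: token 2 strongest non-target
    [0.0, 0.5, 0.1, 3.0, 0.2],  # layer C: target already strongest
]
print(self_hard_negatives(layer_logits, target_id=3))  # → [1, 2, 1]
```

Even when an intermediate layer already ranks the target first (layer C), the next-best token is still a useful hard negative, which is what makes these signals dynamic: they track the model's own evolving preferences rather than a fixed offline sample set.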