AI Summary
This work addresses the challenge that existing recommender systems struggle to accurately model users' genuine negative preferences, often suffering from contextual bias caused by the sparsity of negative feedback and its overshadowing by positive signals. To this end, the paper introduces large language models (LLMs) into negative-preference recommendation for the first time, proposing semantic ID representations and a context discrimination module to enhance the understanding of negative preferences. It further designs an item-level alignment task and a progressive GRPO training strategy. Notably, to overcome the misalignment between conventional negative sampling objectives and true negative preferences, the authors construct a reward function and evaluation metrics based on multi-day future negative feedback. Experiments demonstrate that the proposed approach effectively mitigates contextual bias, significantly improves the modeling of users' negative preferences, and validates the efficacy of the new metrics and training mechanism.
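The semantic ID representation mentioned above is commonly built by quantizing an item's content embedding into a short tuple of discrete codebook indices (e.g., via residual quantization). A minimal illustrative sketch under that assumption — the function name, the toy codebooks, and the greedy nearest-codeword scheme are ours, not the paper's:

```python
def semantic_id(embedding, codebooks):
    """Greedy residual quantization: at each level pick the nearest codeword,
    subtract it from the residual, and continue; the tuple of chosen indices
    is the item's discrete semantic ID (an illustrative assumption, not the
    paper's exact construction)."""
    residual = list(embedding)
    codes = []
    for book in codebooks:
        # Nearest codeword by squared Euclidean distance to the residual.
        idx = min(range(len(book)),
                  key=lambda i: sum((r - w) ** 2 for r, w in zip(residual, book[i])))
        codes.append(idx)
        residual = [r - w for r, w in zip(residual, book[idx])]
    return tuple(codes)

# Two toy codebook levels over 2-d embeddings.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # level-0 codewords (coarse)
    [[0.1, 0.0], [0.0, 0.1]],  # level-1 codewords (refine the residual)
]
print(semantic_id([1.1, 1.0], codebooks))  # → (1, 0)
```

The resulting short code tuple can be fed to an LLM as a few special tokens per item, which is the usual motivation for replacing long text descriptions with semantic IDs.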
Abstract
Understanding what users like is relatively straightforward; understanding what users dislike, however, remains a challenging and underexplored problem. Research into users' negative preferences has gained increasing importance in modern recommendation systems. Numerous platforms have introduced explicit negative feedback mechanisms and leverage such signals to refine their recommendation models. Beyond traditional business metrics, user-experience-driven metrics, such as negative feedback rates, have become critical indicators for evaluating system performance. However, most existing approaches primarily use negative feedback as an auxiliary signal to enhance positive recommendations, paying little attention to directly modeling negative interests, which can be highly valuable in offline applications. Moreover, due to the inherent sparsity of negative feedback data, models often suffer from context-understanding biases induced by the dominance of positive feedback. To address these challenges, we propose the first large language model framework for negative feedback modeling, equipped with specially designed context-discerning modules. We use semantic ID representations in place of text-based item descriptions and introduce an item-level alignment task that enhances the LLM's understanding of the semantic context behind negative feedback. Furthermore, we design a progressive GRPO training paradigm that enables the model to dynamically balance its use of positive and negative behavioral context. Our investigation further reveals a fundamental misalignment between the conventional next-negative-item prediction objective and users' true negative preferences, as the former is heavily influenced by the system's recommendation order. To mitigate this, we propose a novel reward function and evaluation metric grounded in multi-day future negative feedback and its collaborative signals.
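To make the last two ideas concrete, here is a hedged sketch of how a reward grounded in multi-day future negative feedback could feed a GRPO-style group-relative advantage. All names, the exponential per-day decay, and the overlap-based reward are illustrative assumptions, not the paper's actual formulation:

```python
def reward_future_negatives(predicted, future_negatives_by_day, decay=0.8):
    """Score a predicted negative-item list by its overlap with negative
    feedback observed over several future days, discounting later days.
    (Illustrative reward shape, not the paper's exact function.)"""
    reward = 0.0
    for day, negatives in enumerate(future_negatives_by_day):
        hits = len(set(predicted) & set(negatives))
        reward += (decay ** day) * hits / max(len(predicted), 1)
    return reward

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled rollout's reward by
    the group mean and standard deviation, the critic-free core of GRPO."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Example: three sampled candidate lists for one user, scored against
# negative feedback from two future days (all item IDs are made up).
future = [["item_9", "item_4"], ["item_7"]]
group = [["item_9", "item_1"], ["item_2", "item_3"], ["item_4", "item_7"]]
rewards = [reward_future_negatives(p, future) for p in group]
advs = grpo_advantages(rewards)  # rollouts above the group mean get advs > 0
```

Because the reward aggregates several future days of feedback rather than the single next negative item, it is less sensitive to the order in which the system happened to surface items, which is the misalignment the abstract points out.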