AI Summary
This work addresses the limitations of the conventional Softmax loss in implicit-feedback recommendation, where a global temperature and uniform negative sampling fail to account for the varying competition intensity among samples, leading to unstable training. To overcome this, we propose Dual-Scale Softmax Loss (DSL), which introduces, for the first time, competition-aware negative reweighting and instance-level temperature adaptation within the Softmax framework. DSL dynamically reweights negatives by integrating their difficulty and item similarity, and estimates a per-sample temperature from its local competitive set, thereby reshaping the negative distribution without compromising the geometric structure of the loss. Theoretically grounded in distributionally robust optimization, DSL achieves an average performance gain of 6.22% across multiple benchmarks and backbone models, with improvements reaching 9.31% under out-of-distribution popularity shifts, significantly outperforming existing methods.
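Schematically, assuming the standard sampled Softmax Loss with score $s(u,i)$, positive item $i^{+}$, global temperature $\tau$, and sampled negatives $\mathcal{N}_u$, the two branches amount to replacing the global temperature and uniform negatives with per-instance quantities (the weights $w_{uj}$ and temperature $\tau_u$ below are placeholders for DSL's quantities, not its exact formulation):

$$
\mathcal{L}_{\mathrm{SL}} = -\log \frac{\exp\big(s(u,i^{+})/\tau\big)}{\exp\big(s(u,i^{+})/\tau\big) + \sum_{j\in\mathcal{N}_u}\exp\big(s(u,j)/\tau\big)}
\;\longrightarrow\;
\mathcal{L}_{\mathrm{DSL}} = -\log \frac{\exp\big(s(u,i^{+})/\tau_u\big)}{\exp\big(s(u,i^{+})/\tau_u\big) + \sum_{j\in\mathcal{N}_u} w_{uj}\,\exp\big(s(u,j)/\tau_u\big)}
$$

where $w_{uj}$ grows with the hardness and item similarity of negative $j$, and $\tau_u$ is set from the competition intensity over instance $u$'s sampled slate.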
Abstract
Softmax Loss (SL) is increasingly adopted in recommender systems (RS), as it has demonstrated better performance, robustness, and fairness. Yet under implicit feedback, a single global temperature and equal treatment of uniformly sampled negatives can lead to brittle training, because sampled sets may contain varying degrees of relevant or informative competitors. The loss sharpness that is optimal for a user-item pair with a particular set of negatives can be suboptimal or destabilising for another pair with different negatives. We introduce Dual-scale Softmax Loss (DSL), which infers effective sharpness from the sampled competition itself. DSL adds two complementary branches to the log-sum-exp backbone: first, it reweights negatives within each training instance using hardness and item--item similarity; second, it adapts a per-example temperature from the competition intensity over a constructed competitor slate. Together, these components preserve the geometry of SL while reshaping the competition distribution across negatives and across examples. Over several representative benchmarks and backbones, DSL yields substantial gains over strong baselines, with improvements over SL exceeding $10\%$ in several settings and averaging $6.22\%$ across datasets, metrics, and backbones. Under out-of-distribution (OOD) popularity shift, the gains are larger, averaging $9.31\%$ improvement over SL. We further provide a theoretical analysis based on distributionally robust optimisation (DRO), which shows how DSL reshapes the robust payoff and the KL deviation for ambiguous instances, helping to explain the empirically observed improvements in accuracy and robustness.
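As a concrete illustration, below is a minimal PyTorch sketch of a loss with this dual-scale structure. The abstract does not specify the weighting or temperature rules, so the hardness/similarity weighting, the competition statistic, and the hyperparameters `gamma` and `alpha` are illustrative assumptions, not the authors' formulation.

```python
# Minimal sketch of a dual-scale softmax loss, in the spirit of DSL.
# The specific weighting rule, temperature mapping, and hyperparameters
# below are assumptions for illustration, not the paper's exact method.
import torch


def dual_scale_softmax_loss(
    pos_scores: torch.Tensor,   # (B,)   score of each user's positive item
    neg_scores: torch.Tensor,   # (B, N) scores of N sampled negatives
    item_sim: torch.Tensor,     # (B, N) item-item similarity of each negative to the positive
    base_tau: float = 0.1,      # global temperature that the per-example one modulates
    gamma: float = 1.0,         # assumed strength of the hardness/similarity reweighting
    alpha: float = 0.5,         # assumed sensitivity of the temperature to competition
) -> torch.Tensor:
    # Branch 1: competition-aware reweighting of negatives. Harder negatives
    # (higher score) and negatives more similar to the positive item get
    # larger weights; softmax keeps the weights normalised per instance.
    weights = torch.softmax(gamma * (neg_scores + item_sim), dim=1)      # (B, N)

    # Branch 2: per-example temperature from competition intensity,
    # summarised here as the (detached) weighted mean hardness of the slate.
    # More intense competition -> larger temperature -> flatter loss.
    competition = (weights * neg_scores).sum(dim=1).detach()             # (B,)
    tau = base_tau * (1.0 + alpha * torch.sigmoid(competition))          # (B,)

    # Log-sum-exp backbone of SL with weighted negatives and the
    # instance-level temperature. Scaling weights by N makes the uniform
    # case (all weights 1/N) reduce to the standard sampled softmax loss.
    n = neg_scores.size(1)
    pos_logit = pos_scores / tau                                         # (B,)
    neg_logits = neg_scores / tau.unsqueeze(1)                           # (B, N)
    logits = torch.cat(
        [pos_logit.unsqueeze(1), neg_logits + torch.log(n * weights + 1e-12)],
        dim=1,
    )                                                                    # (B, N+1)
    return (torch.logsumexp(logits, dim=1) - pos_logit).mean()


if __name__ == "__main__":
    B, N = 32, 64
    pos = torch.randn(B, requires_grad=True)
    neg = torch.randn(B, N, requires_grad=True)
    sim = torch.rand(B, N)  # e.g. cosine similarity of negatives to the positive
    loss = dual_scale_softmax_loss(pos, neg, sim)
    loss.backward()
    print(float(loss))
```

Detaching the competition statistic is one plausible design choice: it keeps the temperature branch from feeding gradients back through the negatives, so the per-example temperature acts as a reweighting device rather than a new learning signal.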