🤖 AI Summary
Softmax loss suffers from high computational cost and poor scalability in large-scale similarity ranking tasks. To address this, we propose NDCG-consistent approximate loss functions: two Ranking-Generalizable losses, RG² and RG×, which unify the sampling-based and non-sampling-based paradigms for the first time and uncover the intrinsic mechanism of weighted squared losses. Both losses are derived from a second-order Taylor expansion of the Softmax loss and optimized via Alternating Least Squares (ALS), with theoretical guarantees on convergence rate and generalization error; the formulation also supports efficient distributed training. Experiments on real-world datasets demonstrate that our approach matches or surpasses Softmax on ranking metrics such as NDCG while converging several times faster.
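To make the "second-order Taylor expansion of Softmax" claim concrete, here is a minimal numerical sketch (not the paper's exact RG² derivation): expanding the log-sum-exp term of the listwise Softmax loss around zero scores yields a quadratic surrogate in the scores, which is why a weighted squared loss can approximate it. The expansion point and score scale are illustrative assumptions.

```python
import numpy as np

def softmax_loss(scores, y):
    # listwise Softmax (cross-entropy) loss: -s_y + log sum_j exp(s_j)
    return -scores[y] + np.log(np.sum(np.exp(scores)))

def taylor2_loss(scores, y):
    # Second-order Taylor expansion of log-sum-exp around scores = 0:
    # value = log n, gradient = uniform 1/n, Hessian = diag(p) - p p^T (p = 1/n),
    # so the expansion is log n + mean(s) + 0.5 * (mean(s^2) - mean(s)^2),
    # i.e. a quadratic (squared-loss-like) function of the scores.
    n = len(scores)
    m = scores.mean()
    quad = 0.5 * (np.mean(scores ** 2) - m ** 2)
    return -scores[y] + np.log(n) + m + quad

rng = np.random.default_rng(0)
s = 0.1 * rng.standard_normal(8)  # small scores: expansion is accurate here
exact = softmax_loss(s, 0)
approx = taylor2_loss(s, 0)
```

For small score magnitudes the two values agree closely; the gap grows as O(‖s‖³), which is the regime where the paper's tighter analysis matters.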
📝 Abstract
Ranking tasks constitute fundamental components of extreme similarity learning frameworks, where extremely large corpora of objects are modeled through relative similarity relationships adhering to predefined ordinal structures. Among various ranking surrogates, Softmax (SM) Loss has been widely adopted due to its natural capability to handle listwise ranking via global negative comparisons, along with its flexibility across diverse application scenarios. However, despite its effectiveness, SM Loss often suffers from significant computational overhead and scalability limitations when applied to large-scale object spaces. To address this challenge, we propose novel loss formulations that align directly with ranking metrics: the Ranking-Generalizable \textbf{squared} (RG$^2$) Loss and the Ranking-Generalizable interactive (RG$^\times$) Loss, both derived through Taylor expansions of the SM Loss. Notably, RG$^2$ reveals the intrinsic mechanisms underlying weighted squared losses (WSL) in ranking methods and uncovers fundamental connections between sampling-based and non-sampling-based loss paradigms. Furthermore, we integrate the proposed RG losses with the highly efficient Alternating Least Squares (ALS) optimization method, providing both generalization guarantees and convergence rate analyses. Empirical evaluations on real-world datasets demonstrate that our approach achieves comparable or superior ranking performance relative to SM Loss, while significantly accelerating convergence. This framework offers the similarity learning community both theoretical insights and practically efficient tools, with methodologies applicable to a broad range of tasks where balancing ranking quality and computational efficiency is essential.
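The ALS optimization mentioned above can be illustrated with a minimal sketch for a generic weighted squared loss over a factorized score matrix. This is a hypothetical toy setup, not the paper's exact RG$^2$/RG$^\times$ objective: it minimizes $\sum_{u,i} w_{ui}(r_{ui} - p_u^\top q_i)^2$ by solving a closed-form least-squares problem for each factor block in turn, which is the structural property that makes ALS fast and distributable.

```python
import numpy as np

def als_step(R, W, P, Q, reg=0.1):
    # Update each row factor p_u in closed form, holding Q fixed:
    # p_u = (Q^T diag(w_u) Q + reg*I)^{-1} Q^T diag(w_u) r_u
    k = P.shape[1]
    for u in range(R.shape[0]):
        Wu = np.diag(W[u])
        A = Q.T @ Wu @ Q + reg * np.eye(k)
        b = Q.T @ Wu @ R[u]
        P[u] = np.linalg.solve(A, b)
    return P

rng = np.random.default_rng(1)
R = rng.random((5, 7))                    # toy target similarities
W = np.ones_like(R)                       # per-entry weights (here uniform)
P = rng.standard_normal((5, 3))           # row factors
Q = rng.standard_normal((7, 3))           # column factors

loss0 = np.sum(W * (R - P @ Q.T) ** 2)    # loss at initialization
for _ in range(10):
    P = als_step(R, W, P, Q)              # row-factor step
    Q = als_step(R.T, W.T, Q, P)          # column-factor step (symmetric)
loss = np.sum(W * (R - P @ Q.T) ** 2)
```

Each alternating step is an exact least-squares solve, so the regularized objective decreases monotonically; the per-row updates are independent, which is what enables distributed training.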