NDCG-Consistent Softmax Approximation with Accelerated Convergence

📅 2025-06-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Softmax loss suffers from high computational cost and poor scalability in large-scale similarity ranking tasks. To address this, we propose NDCG-consistent approximate loss functions: two Ranking-Generalizable losses, RG² and RG×, which unify the sampling-based and non-sampling-based paradigms for the first time and uncover the intrinsic mechanism of weighted squared losses. The losses are derived from a second-order Taylor expansion of the Softmax loss and optimized via alternating least squares (ALS), with theoretical guarantees on convergence and generalization error; the formulation also supports efficient distributed training. Experiments on real-world datasets show that our approach matches or surpasses Softmax on ranking metrics such as NDCG while converging several times faster and significantly improving training efficiency.
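To make the Taylor-expansion step concrete, here is a generic sketch in standard notation (not the paper's exact RG² / RG× derivation). The Softmax loss for a positive item $i$ with score vector $s = (s_1, \dots, s_n)$ is $\mathcal{L}_{\mathrm{SM}}(s) = -s_i + \log\sum_{j=1}^{n} e^{s_j}$, and a second-order expansion of the log-sum-exp term around a reference point $\bar{s}$ gives

$$\log\sum_{j} e^{s_j} \;\approx\; \log\sum_{j} e^{\bar{s}_j} \;+\; \bar{p}^{\top}(s-\bar{s}) \;+\; \tfrac{1}{2}\,(s-\bar{s})^{\top}\bigl(\operatorname{diag}(\bar{p}) - \bar{p}\bar{p}^{\top}\bigr)(s-\bar{s}),$$

where $\bar{p}_j = e^{\bar{s}_j} / \sum_k e^{\bar{s}_k}$ are the softmax probabilities at the expansion point. Substituting this back makes the surrogate quadratic in the scores, i.e., a weighted squared loss whose weights come from softmax probabilities, which is what opens the door to closed-form ALS-style updates.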

📝 Abstract
Ranking tasks constitute fundamental components of extreme similarity learning frameworks, where extremely large corpora of objects are modeled through relative similarity relationships adhering to predefined ordinal structures. Among various ranking surrogates, Softmax (SM) Loss has been widely adopted due to its natural capability to handle listwise ranking via global negative comparisons, along with its flexibility across diverse application scenarios. However, despite its effectiveness, SM Loss often suffers from significant computational overhead and scalability limitations when applied to large-scale object spaces. To address this challenge, we propose novel loss formulations that align directly with ranking metrics: the Ranking-Generalizable squared (RG$^2$) Loss and the Ranking-Generalizable interactive (RG$^\times$) Loss, both derived through Taylor expansions of the SM Loss. Notably, RG$^2$ reveals the intrinsic mechanisms underlying weighted squared losses (WSL) in ranking methods and uncovers fundamental connections between sampling-based and non-sampling-based loss paradigms. Furthermore, we integrate the proposed RG losses with the highly efficient Alternating Least Squares (ALS) optimization method, providing both generalization guarantees and convergence rate analyses. Empirical evaluations on real-world datasets demonstrate that our approach achieves comparable or superior ranking performance relative to SM Loss, while significantly accelerating convergence. This framework offers the similarity learning community both theoretical insights and practically efficient tools, with methodologies applicable to a broad range of tasks where balancing ranking quality and computational efficiency is essential.
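For reference, NDCG@k (the ranking metric the proposed losses are aligned with) can be computed as in the sketch below. The function name, graded-relevance gain 2^rel − 1, and log2 discount are standard conventions chosen here for illustration, not code from the paper.

```python
import numpy as np

def ndcg_at_k(relevances, scores, k=10):
    """NDCG@k for a single query.
    relevances: graded relevance labels per item; scores: model-predicted scores."""
    relevances = np.asarray(relevances, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # DCG of the predicted ranking: gain 2^rel - 1, discount 1 / log2(rank + 1).
    order = np.argsort(-scores)[:k]
    discounts = 1.0 / np.log2(np.arange(2, len(order) + 2))
    dcg = np.sum((2.0 ** relevances[order] - 1.0) * discounts)
    # DCG of the ideal (label-sorted) ranking, used for normalization.
    ideal = np.sort(relevances)[::-1][:k]
    idcg = np.sum((2.0 ** ideal - 1.0) / np.log2(np.arange(2, len(ideal) + 2)))
    return float(dcg / idcg) if idcg > 0 else 0.0
```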
Problem

Research questions and friction points this paper is trying to address.

Address computational overhead in large-scale ranking tasks
Propose NDCG-consistent loss formulations for ranking
Integrate efficient optimization to accelerate convergence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel RG² (squared) and RG× (interactive) loss formulations
Derivation via Taylor expansions of the Softmax Loss
ALS optimization for efficient convergence (a generic sketch follows this list)
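To illustrate why ALS on a weighted squared surrogate converges quickly, here is a textbook-style alternating-least-squares half-step for a matrix-factorization setting. The naming and the plain weighted squared objective are assumptions for illustration; this is not the paper's RG² / RG× update rule.

```python
import numpy as np

def als_user_step(V, R, W, lam=0.1):
    """One ALS half-step for the weighted squared loss
        sum_{u,i} W[u,i] * (R[u,i] - p_u @ v_i)**2 + lam * ||P||_F^2,
    solving each user's ridge-regression problem in closed form.
    (Generic illustration; not the paper's RG^2 / RG^x objective.)"""
    n_users, d = R.shape[0], V.shape[1]
    P = np.empty((n_users, d))
    for u in range(n_users):
        Cu = np.diag(W[u])                      # per-item confidence weights for user u
        A = V.T @ Cu @ V + lam * np.eye(d)      # d x d normal-equation matrix
        b = V.T @ Cu @ R[u]
        P[u] = np.linalg.solve(A, b)            # exact closed-form update
    return P

# Full ALS alternates: update user factors P with item factors V fixed, then V with P fixed.
# Each half-step is solved exactly, which is why convergence typically takes far fewer
# iterations than stochastic-gradient optimization of the full Softmax loss.
```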
Yuanhao Pu
University of Science and Technology of China
Recommender System · Machine Learning · Learning Theory
Defu Lian
School of Computer Science & Technology, University of Science and Technology of China
Xiaolong Chen
School of Artificial Intelligence & Data Science, University of Science and Technology of China
Xu Huang
School of Computer Science & Technology, University of Science and Technology of China
Jin Chen
School of Business & Management, Hong Kong University of Science and Technology
Enhong Chen
University of Science and Technology of China
data mining · recommender system · machine learning