Differentiable Fast Top-K Selection for Large-Scale Recommendation

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Top-K selection is inherently non-differentiable, impeding end-to-end training of recommendation systems; existing differentiable approximations suffer from objective inconsistency, gradient conflict, and high computational complexity (e.g., O(n log n)). To address this, we propose DFTopK, the first theoretically optimal, O(n)-time differentiable Top-K operator, and the first to bring strictly linear-complexity differentiable Top-K selection to recommendation systems. Its core innovation is relaxing the normalization constraint to construct a differentiable approximation that admits a closed-form analytical solution, bypassing soft permutation matrices and explicit sorting. Evaluated on public benchmarks and in industrial deployment, DFTopK significantly improves training efficiency; online A/B tests show a +1.77% revenue gain under identical computational overhead, validating its practical efficacy and scalability.

📝 Abstract
Cascade ranking is a widely adopted paradigm in large-scale information retrieval systems for Top-K item selection. However, the Top-K operator is non-differentiable, hindering end-to-end training. Existing methods include Learning-to-Rank approaches (e.g., LambdaLoss), which optimize ranking metrics like NDCG and suffer from objective misalignment, and differentiable sorting-based methods (e.g., ARF, LCRON), which relax permutation matrices for direct Top-K optimization but introduce gradient conflicts through matrix aggregation. A promising alternative is to directly construct a differentiable approximation of the Top-K selection operator, bypassing the use of soft permutation matrices. However, even state-of-the-art differentiable Top-K operators (e.g., LapSum) require $O(n \log n)$ complexity due to their dependence on sorting for solving the threshold. Thus, we propose DFTopK, a novel differentiable Top-K operator achieving optimal $O(n)$ time complexity. By relaxing normalization constraints, DFTopK admits a closed-form solution and avoids sorting. DFTopK also avoids the gradient conflicts inherent in differentiable sorting-based methods. We evaluate DFTopK on both the public benchmark RecFlow and an industrial system. Experimental results show that DFTopK significantly improves training efficiency while achieving superior performance, which enables us to scale up training samples more efficiently. In the online A/B test, DFTopK yielded a +1.77% revenue lift with the same computational budget compared to the baseline. To the best of our knowledge, this work is the first to introduce differentiable Top-K operators into recommendation systems and the first to achieve theoretically optimal linear-time complexity for Top-K selection. We have open-sourced our implementation to facilitate future research in both academia and industry.
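To make the abstract's framing concrete, the sketch below shows the constrained baseline it alludes to: a soft Top-K relaxation where each item gets a sigmoid membership and a threshold is solved so that memberships sum to k. Solving that threshold is exactly the step that costs sorting or iteration in prior operators; DFTopK's contribution (per the abstract) is relaxing this normalization so the threshold has a closed form in O(n). The function name, temperature, and bisection solver here are illustrative assumptions, not the paper's actual operator.

```python
import math

def soft_topk(scores, k, temp=0.1, iters=50):
    """Illustrative soft Top-K relaxation (NOT the DFTopK operator).

    Each score x_i receives a soft membership s_i = sigmoid((x_i - tau) / temp).
    The threshold tau is found by bisection so that sum(s_i) == k; this
    normalization step is what DFTopK relaxes to obtain a closed-form tau.
    """
    lo, hi = min(scores) - 1.0, max(scores) + 1.0
    for _ in range(iters):
        tau = (lo + hi) / 2.0
        total = sum(1.0 / (1.0 + math.exp(-(x - tau) / temp)) for x in scores)
        if total > k:
            lo = tau  # too many items selected: raise the threshold
        else:
            hi = tau  # too few items selected: lower the threshold
    return [1.0 / (1.0 + math.exp(-(x - tau) / temp)) for x in scores]
```

With a small temperature the memberships approach hard Top-K indicators (near 1 for the k largest scores, near 0 otherwise) while remaining smooth in the scores, which is what permits gradient flow through the selection step.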
Problem

Research questions and friction points this paper is trying to address.

Differentiable Top-K operator for end-to-end training in recommendation systems
Addresses gradient conflicts and high complexity in existing ranking methods
Achieves linear-time complexity for efficient large-scale Top-K selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentiable Top-K operator with linear complexity
Closed-form solution avoids sorting for efficiency
Eliminates gradient conflicts in ranking optimization
Yanjie Zhu
Xi’an Jiaotong University, Xi’an, China
Zhen Zhang
Kuaishou Technology, Beijing, China
Yunli Wang
National Research Council Canada
Zhiqiang Wang
Kuaishou Technology, Beijing, China
Yu Li
Kuaishou Technology, Beijing, China
Rufan Zhou
Kuaishou Technology, Beijing, China
Shiyang Wen
Alibaba Group
Peng Jiang
Kuaishou Technology, Beijing, China
Chenhao Lin
Xi’an Jiaotong University, Xi’an, China
Jian Yang
M-A-P, Beijing, China