🤖 AI Summary
This paper addresses the novel problem of *efficiently identifying potentially trending items* in recommender systems and market analytics. We formally define item popularity scores based on the cardinality of reverse k-Maximum Inner Product Search (reverse k-MIPS) results and introduce the Top-N trending item mining task. To circumvent the prohibitive cost of computing all user-item inner products, we propose an adaptive upper-bound estimation framework coupled with candidate pruning, enabling both efficient filtering and precise ranking. Our method integrates reverse maximum inner product search, dynamic upper-bound optimization, and scalable vector retrieval techniques. Extensive experiments on multiple real-world datasets demonstrate that our algorithm achieves 10×–100× speedup over state-of-the-art baselines while maintaining 100% accuracy, and scales to real-time mining over million-scale users and items.
📝 Abstract
The $k$-MIPS ($k$ Maximum Inner Product Search) problem has been employed in many fields. Recently, its reverse version, the reverse $k$-MIPS problem, has been proposed: given an item vector (i.e., a query), it retrieves all user vectors whose $k$-MIPS results contain that item vector. Consider the cardinality of a reverse $k$-MIPS result. A large cardinality means that the item is potentially popular, because it appears in the $k$-MIPS results of many users. Mining such items is important in recommender systems, market analysis, and new item development. Motivated by this, we formulate a new problem in which the score of each item is defined as the cardinality of its reverse $k$-MIPS result, and the $N$ items with the highest scores are retrieved. A straightforward approach is to compute the scores of all items, but this is prohibitively expensive for large numbers of users and items. We remove this inefficiency and propose a fast algorithm for this problem. Because the main bottleneck is computing the score of each item, we devise a new upper-bounding technique that is specific to our problem and filters out unnecessary score computations. We conduct extensive experiments on real datasets and show the superiority of our algorithm over competitors.
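To make the problem definition concrete, the following is a minimal sketch of the straightforward approach that the abstract describes as prohibitive: compute every user's $k$-MIPS result, count how often each item appears (its score), and return the $N$ highest-scoring items. It is not the proposed algorithm and contains no upper-bound filtering. The function names, the toy vectors, and the tie-breaking rule (smaller item index first) are assumptions made for illustration only.

```python
def inner(u, p):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, p))


def k_mips(user, items, k):
    """Indices of the k items with the largest inner product with `user`.

    Ties are broken by the smaller item index (an illustrative assumption).
    """
    ranked = sorted(range(len(items)), key=lambda j: (-inner(user, items[j]), j))
    return ranked[:k]


def reverse_kmips_scores(users, items, k):
    """Score of item j = number of users whose k-MIPS result contains j,
    i.e. the cardinality of the reverse k-MIPS result of item j."""
    scores = [0] * len(items)
    for u in users:
        for j in k_mips(u, items, k):
            scores[j] += 1
    return scores


def top_n_items(users, items, k, n):
    """Brute force: score every item, then keep the n highest scores.

    Every user-item inner product is evaluated, which is exactly the cost
    that becomes prohibitive for large user and item sets.
    """
    scores = reverse_kmips_scores(users, items, k)
    order = sorted(range(len(items)), key=lambda j: (-scores[j], j))
    return [(j, scores[j]) for j in order[:n]]


# Toy data (hypothetical): 3 users and 4 items in 2 dimensions.
users = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.5)]
items = [(2.0, 0.0), (0.0, 2.0), (1.5, 1.5), (0.1, 0.1)]

result = top_n_items(users, items, k=2, n=2)
print(result)  # [(2, 3), (0, 2)]
```

In this toy instance, item 2 appears in the top-2 list of all three users, so its score is 3, while item 0 appears in two lists. With $|U|$ users, $|P|$ items, and $d$ dimensions, this baseline performs on the order of $|U| \cdot |P| \cdot d$ multiplications, which is the cost an upper-bounding filter aims to avoid.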