AI Summary
This work addresses the limitations of large language models in million-scale candidate ranking, where performance is constrained by context length and computational cost. The authors propose LRanker, a framework that uses K-means clustering to model global candidate information and a graph-based test-time scaling mechanism that partitions candidates into subsets, generates multiple query embeddings, and combines them through an ensemble procedure. Aggregating several embeddings rather than relying on a single representation improves ranking accuracy and robustness. Evaluated on the RBench benchmark, the method achieves gains of over 30% in the small-scale setting, 3-9% improvements in mean reciprocal rank (MRR) in the million-scale setting, and 20-30% improvements in the ultra-large-scale setting with more than 6.8 million candidates, demonstrating its effectiveness and scalability.
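For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how mean reciprocal rank is usually computed. It assumes exactly one relevant item per query, which the summary does not state; the function name and the toy data are illustrative and not taken from the paper.

```python
def mean_reciprocal_rank(ranked_lists, relevant_items):
    """Average of 1/rank of the relevant item over all queries.

    Assumption: each query has exactly one relevant item. A query whose
    relevant item is missing from its ranking contributes 0.
    """
    total = 0.0
    for ranking, target in zip(ranked_lists, relevant_items):
        reciprocal_rank = 0.0
        for position, item in enumerate(ranking, start=1):
            if item == target:
                reciprocal_rank = 1.0 / position
                break
        total += reciprocal_rank
    return total / len(ranked_lists)


# Toy example: the relevant item "a" appears at ranks 1, 2, and 3.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "a"]
mrr = mean_reciprocal_rank(rankings, targets)  # (1 + 1/2 + 1/3) / 3
```

Because each query contributes the reciprocal of a rank, moving the correct candidate from rank 2 to rank 1 raises that query's score from 0.5 to 1.0, so MRR is most sensitive to changes near the top of the ranking.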
Abstract
Large language models (LLMs) have recently shown strong potential for ranking by capturing semantic relevance and adapting across diverse domains, yet existing methods remain constrained by limited context length and high computational costs, restricting their applicability to real-world scenarios where candidate pools often scale to millions. To address this challenge, we propose LRanker, a framework tailored for large-candidate ranking. LRanker incorporates a candidate aggregation encoder that leverages K-means clustering to explicitly model global candidate information, and a graph-based test-time scaling mechanism that partitions candidates into subsets, generates multiple query embeddings, and integrates them through an ensemble procedure. By aggregating diverse embeddings instead of relying on a single representation, this mechanism enhances robustness and expressiveness, leading to more accurate ranking over massive candidate pools. We evaluate LRanker on seven tasks across three scenarios in RBench with different candidate scales. Experimental results show that LRanker achieves over 30% gains in the RBench-Small scenario, improves by 3-9% in MRR in the RBench-Large scenario, and sustains scalability with 20-30% improvements in the RBench-Ultra scenario with more than 6.8M candidates. Ablation studies further verify the effectiveness of its key components. Together, these findings demonstrate the robustness, scalability, and effectiveness of LRanker for massive-candidate ranking.
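The overall flow described in the abstract (summarize the candidate pool with K-means, split candidates into subsets, produce one query embedding per subset, then ensemble the embeddings before scoring) can be illustrated with a toy NumPy sketch. This is not the authors' implementation. Several parts are assumptions made only for illustration: `toy_query_encoder` is a hypothetical stand-in for the LLM-based encoder, a random split replaces the graph-based partitioning, and the ensemble is a plain average, since the abstract does not specify any of these details.

```python
import numpy as np


def l2_normalize(v):
    """Scale vectors to unit length along the last axis."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-12)


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dists.argmin(axis=1)


def toy_query_encoder(query, context):
    """Hypothetical stand-in for the LLM: nudges the query toward a
    summary of the context it is shown (an assumption, not the paper's model)."""
    return l2_normalize(query + 0.1 * context.mean(axis=0))


def rank_candidates(query, candidates, n_clusters=4, n_subsets=3, seed=0):
    rng = np.random.default_rng(seed)
    # Global view of the pool: K-means centroids.
    centroids, _ = kmeans(candidates, n_clusters, seed=seed)
    # Assumption: a random split stands in for graph-based partitioning.
    subsets = np.array_split(rng.permutation(len(candidates)), n_subsets)
    # One query embedding per subset, each seeing centroids plus that subset.
    query_embs = [
        toy_query_encoder(query, np.vstack([centroids, candidates[idx]]))
        for idx in subsets
    ]
    # Assumption: the ensemble is a simple mean of the query embeddings.
    ensemble = l2_normalize(np.mean(query_embs, axis=0))
    scores = candidates @ ensemble
    return np.argsort(-scores), ensemble


rng = np.random.default_rng(42)
candidates = l2_normalize(rng.normal(size=(200, 16)))
query = candidates[7] + 0.01 * rng.normal(size=16)
order, ensemble = rank_candidates(query, candidates)
```

In this sketch the final score is a single dot product per candidate, so the per-query cost of the ensemble step grows with the number of subsets rather than with the size of the pool. Whether LRanker shares that property depends on details the abstract does not give.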