Beyond Pairwise Learning-To-Rank At Airbnb

📅 2025-05-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Learning-to-rank (LTR) systems face a fundamental trade-off among scalability, ranking accuracy, and total-order consistency, particularly in large-scale search settings. Method: this paper introduces the first scalable all-pairwise LTR framework, which jointly models both superiority and similarity relationships among *all* items on a search results page in a single inference pass, departing from conventional pairwise approaches that score items independently. Preferences are quantified along two dimensions (how strongly searchers prefer an item over the rest, and how similar it is to the rest), and the design addresses the logical inconsistencies that the paper's SAT theorem for ranking shows are unavoidable. Engineering measures, including structured sampling and incremental model updates, keep the approach feasible in production at scale. Contribution/Results: deployed in Airbnb's production search system, the framework achieves significant gains in NDCG@10, alongside measurable improvements in search conversion rate and user session duration, demonstrating a practical balance of accuracy, consistency, and scalability.
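The summary above describes scoring every item against all others along two signals, superiority and similarity, in a single pass. A minimal sketch of how such signals might be combined into per-item scores (the function, matrices, and `alpha` weighting are assumptions for illustration, not the paper's actual model):

```python
import numpy as np

def all_pairwise_scores(sup, sim, alpha=0.5):
    """Combine superiority and similarity signals into per-item scores.

    sup[i, j]: hypothetical strength of searcher preference for item i over item j.
    sim[i, j]: hypothetical similarity between items i and j.
    alpha: illustrative weight trading off the two signals (assumed).
    """
    n = sup.shape[0]
    off_diag = ~np.eye(n, dtype=bool)  # ignore self-comparisons
    # Average each item's preference strength over all other items,
    # and discount items that are largely redundant with the rest of the page.
    superiority = sup[off_diag].reshape(n, n - 1).mean(axis=1)
    redundancy = sim[off_diag].reshape(n, n - 1).mean(axis=1)
    return superiority - alpha * redundancy
```

Because every item is compared against the full result set at once, the score of an item depends on what else is on the page, which is exactly the cross-item interaction that independent per-item scoring misses.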

📝 Abstract
There are three fundamental asks from a ranking algorithm: it should scale to handle a large number of items, sort items accurately by their utility, and impose a total order on the items for logical consistency. But here's the catch: no algorithm can achieve all three at the same time. We call this limitation the SAT theorem for ranking algorithms. Given the dilemma, how can we design a practical system that meets user needs? Our current work at Airbnb provides an answer, with a working solution deployed at scale. We start with pairwise learning-to-rank (LTR) models, the bedrock of search ranking tech stacks today. They scale linearly with the number of items ranked and perform strongly on metrics like NDCG by learning from pairwise comparisons. They are at a sweet spot of performance vs. cost, making them an ideal choice for several industrial applications. However, they have a drawback: by ignoring interactions between items, they compromise on accuracy. To improve accuracy, we create a "true" pairwise LTR model, one that captures interactions between items during pairwise comparisons. But accuracy comes at the expense of scalability and total order, and we discuss strategies to counter these challenges. For greater accuracy, we take each item in the search result and compare it against the rest of the items along two dimensions: (1) Superiority: how strongly do searchers prefer the given item over the remaining ones? (2) Similarity: how similar is the given item to all the other items? This forms the basis of our "all-pairwise" LTR framework, which factors in interactions across all items at once. Looking at items on the search result page all together, superiority and similarity combined, gives us a deeper understanding of what searchers truly want. We quantify the resulting improvements in searcher experience through offline and online experiments at Airbnb.
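The abstract calls conventional pairwise LTR the bedrock of today's ranking stacks: each item gets an independent score, and the model learns from comparisons between preferred and rejected items. A generic RankNet-style sketch of that pairwise loss (a textbook formulation, not Airbnb's production model):

```python
import numpy as np

def pairwise_logistic_loss(score_pos, score_neg):
    """RankNet-style pairwise loss: penalizes the model when the
    preferred item (e.g. the one the searcher booked) is not scored
    above the rejected one. Scores come from any per-item model that
    scores each listing independently.
    """
    margin = score_pos - score_neg
    # log(1 + exp(-margin)): near 0 when the preferred item wins by a
    # wide margin, and growing linearly as the ordering is inverted.
    return np.log1p(np.exp(-margin))
```

Because each score is computed per item, inference cost scales linearly with the number of items ranked, which is the scalability sweet spot the abstract describes; the accuracy limitation is that the two scores are produced without ever looking at the items side by side.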
Problem

Research questions and friction points this paper is trying to address.

Balancing scalability, accuracy, and total order in ranking algorithms
Improving pairwise LTR models by capturing item interactions
Enhancing searcher experience with an all-pairwise LTR framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed a "true" pairwise LTR model that captures item interactions
Quantified item superiority and similarity across all pairs
Scaled the all-pairwise LTR framework for production deployment
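The "true" pairwise idea above, per the abstract, is a model that captures interactions between the two items during a comparison, rather than scoring each independently. A hypothetical sketch of such a comparator (the joint features and weight vector are assumptions for illustration):

```python
import numpy as np

def true_pairwise_compare(x_i, x_j, w):
    """Hypothetical 'true' pairwise comparator. Unlike conventional
    pairwise LTR, which scores each item on its own, this model sees
    both items' features at once, so cross-item interactions (e.g. one
    listing's price relative to the alternative) can shape the output.

    x_i, x_j: feature vectors of the two items being compared.
    w: weights over the joint feature representation (assumed here).
    Returns an estimated probability that item i is preferred over item j.
    """
    # Simple joint representation: both items plus their difference.
    joint = np.concatenate([x_i, x_j, x_i - x_j])
    return 1.0 / (1.0 + np.exp(-joint @ w))
```

A comparator like this need not be transitive across items, which illustrates why, as the abstract notes, the accuracy gained from modeling interactions comes at the expense of a guaranteed total order.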