🤖 AI Summary
This work addresses the challenge of efficiently implementing the ORDER BY operator with large language models (LLMs). We propose the first LLM-specific sorting logic abstraction together with a unified evaluation framework. Methodologically, we introduce an agreement-based batch-size policy for value-based sorting, a majority-voting mechanism for pairwise comparisons, and a two-way external merge sort tailored to LLM inference characteristics, all evaluated on models including GPT-4o. Experiments across multiple datasets and models demonstrate significant improvements in both sorting accuracy and efficiency. Moreover, we uncover, for the first time, a log-linear trade-off between computational cost and sorting quality, enabling a principled cost–quality balance. Our framework establishes foundational principles for scalable, accurate, and efficient sorting in LLM-driven database systems.
📝 Abstract
We present the LLM ORDER BY operator as a logical abstraction and study its physical implementations within a unified evaluation framework. Our experiments show that no single approach is universally optimal; effectiveness depends on query characteristics and data. We introduce three new designs: an agreement-based batch-size policy, a majority voting mechanism for pairwise sorting, and a two-way external merge sort adapted for LLMs. Extensive experiments show that the agreement-based procedure is effective at determining batch size for value-based methods, that majority voting consistently strengthens pairwise comparisons on GPT-4o, and that external merge sort achieves strong accuracy–efficiency trade-offs across datasets and models. We further observe log-linear scaling between compute cost and ordering quality, a first step toward principled cost models for LLM-powered data systems.
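The two pairwise designs above can be sketched together. The following is a minimal illustration, not the paper's implementation: `llm_compare` is a hypothetical stand-in for a single LLM comparison call, simulated here by a noisy oracle, and the merge sort is shown in-memory rather than external for brevity.

```python
import random
from functools import cmp_to_key

def llm_compare(a, b, flip_prob=0.2):
    """Hypothetical single LLM judgment: -1 if a should precede b, else 1.
    Simulated by a ground-truth comparison flipped with probability flip_prob,
    mimicking an unreliable model response."""
    correct = -1 if a < b else 1
    return correct if random.random() >= flip_prob else -correct

def majority_compare(a, b, votes=5, flip_prob=0.2):
    """Majority-voting pairwise comparison: issue an odd number of
    independent judgments and return the majority direction."""
    tally = sum(llm_compare(a, b, flip_prob) for _ in range(votes))
    return -1 if tally < 0 else 1

def merge_sort(items, cmp):
    """Two-way merge sort driven by an arbitrary comparator; each
    merge step consumes one pairwise comparison at a time."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], cmp)
    right = merge_sort(items[mid:], cmp)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if cmp(left[i], right[j]) <= 0:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

if __name__ == "__main__":
    random.seed(0)
    items = [3, 1, 4, 1, 5, 9, 2, 6]
    print(merge_sort(items, majority_compare))
```

With per-call error rate 0.2 and 5 votes, a majority verdict is wrong only when at least 3 of 5 calls err, so voting sharply reduces the chance that any single comparison misorders a pair; the external variant would stream sorted runs from disk and merge them two at a time with the same comparator.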