Personalized Top-k Set Queries Over Predicted Scores

📅 2025-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of efficiently answering personalized top-k queries over multimodal data while minimizing costly invocations of external oracles, such as large language models (LLMs), without compromising result accuracy. The authors propose a generic framework that supports arbitrary set-based scoring functions, provided each function can be decomposed into constructs whose partial scores an oracle can predict. A principled probabilistic model quantifies the likelihood that a candidate set is the true top-k and guides the selection of the next construct to send to the oracle. Experiments on three large-scale datasets show an order-of-magnitude reduction in LLM calls over baseline methods while preserving result accuracy, and scalability experiments indicate the framework is practical for large-scale applications.

📝 Abstract
This work studies the applicability of expensive external oracles, such as large language models, in answering top-k queries over predicted scores. Such scores arise from user-defined functions that answer personalized queries over multi-modal data. We propose a generic computational framework that handles arbitrary set-based scoring functions, as long as the functions can be decomposed into constructs, each of which is sent to an oracle (in our case an LLM) to predict a partial score. At a given point in time, the framework holds a set of responses and their partial predicted scores, and it maintains a collection of possible sets that are likely to be the true top-k. Since calling oracles is costly, our framework judiciously identifies the next construct, i.e., the next best question to ask the oracle, so as to maximize the likelihood of identifying the true top-k. We present a principled probabilistic model that quantifies this likelihood. We study efficiency opportunities in designing the algorithms. We run an evaluation with three large-scale datasets, scoring functions, and baselines. Experiments indicate the efficacy of our framework, as it achieves an order-of-magnitude reduction over baselines in the number of LLM calls required while ensuring result accuracy. Scalability experiments further indicate that our framework can be used in large-scale applications.
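The construct-by-construct strategy described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes a candidate set's score is the sum of per-construct scores in [0, 1], bounds each unresolved construct by that interval, and greedily resolves a construct from the highest-upper-bound candidate (a crude stand-in for the paper's probabilistic selection). The function name `top_k_lazy` and the dictionary oracle are hypothetical.

```python
# Minimal sketch: top-k over predicted scores with lazy oracle calls.
# Assumptions (not the paper's exact method): a set's score is the SUM of
# per-construct scores in [0, 1]; unresolved constructs are bounded by [0, 1];
# the next construct to resolve comes from the most promising candidate.

def top_k_lazy(candidates, oracle, k):
    """candidates: {name: [construct, ...]}; oracle(construct) -> score in [0, 1]."""
    cache = {}  # construct -> oracle score, shared across candidates
    calls = 0

    def bounds(constructs):
        lo = sum(cache.get(c, 0.0) for c in constructs)
        hi = sum(cache.get(c, 1.0) for c in constructs)
        return lo, hi

    while True:
        b = {name: bounds(cs) for name, cs in candidates.items()}
        ranked = sorted(b, key=lambda n: b[n][1], reverse=True)  # by upper bound
        top, rest = ranked[:k], ranked[k:]
        # Terminate when every top candidate's lower bound beats every other
        # candidate's upper bound: the top-k is then certain.
        if all(b[t][0] >= b[r][1] for t in top for r in rest):
            return top, calls
        # Resolve one unknown construct from the highest-upper-bound candidate.
        for name in ranked:
            unknown = [c for c in candidates[name] if c not in cache]
            if unknown:
                cache[unknown[0]] = oracle(unknown[0])
                calls += 1
                break

# Toy demo with a dictionary standing in for the LLM oracle.
scores = {"a1": 0.9, "a2": 0.8, "b1": 0.2, "b2": 0.1, "c1": 0.7, "c2": 0.6}
cands = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]}
top, calls = top_k_lazy(cands, scores.get, k=2)  # resolves fewer than all 6 constructs
```

Because the early-termination test uses only bounds, the loop can stop before every construct has been scored, which is where the savings in oracle calls come from.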
Problem

Research questions and friction points this paper is trying to address.

Top-k queries over predicted scores
Personalized queries on multi-modal data
Efficient use of expensive oracles
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based partial score prediction
Probabilistic top-k identification model
Efficient LLM call optimization
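The "probabilistic top-k identification model" listed above can be illustrated with a Monte Carlo estimate of each candidate's chance of belonging to the true top-k. This is a sketch under an assumed uncertainty model, not the paper's: unresolved construct scores are treated as independent Uniform(0, 1) draws, and `prob_in_top_k` is a hypothetical name.

```python
import random

# Monte Carlo illustration of probabilistic top-k membership.
# Assumption (not the paper's model): each unresolved construct's score is an
# independent Uniform(0, 1) draw; a candidate's total is its known partial sum
# plus the unresolved draws.

def prob_in_top_k(partial, k, trials=20000, seed=0):
    """partial: {name: (known_score_sum, num_unresolved_constructs)}.
    Returns {name: estimated probability of ranking in the top k}."""
    rng = random.Random(seed)
    hits = dict.fromkeys(partial, 0)
    for _ in range(trials):
        totals = {n: s + sum(rng.random() for _ in range(u))
                  for n, (s, u) in partial.items()}
        for n in sorted(totals, key=totals.get, reverse=True)[:k]:
            hits[n] += 1
    return {n: hits[n] / trials for n in partial}

# Candidate A is fully resolved; B and C each have one unresolved construct.
p = prob_in_top_k({"A": (1.7, 0), "B": (0.2, 1), "C": (0.7, 1)}, k=2)
```

Estimates like these can drive the choice of the next oracle question: the construct whose resolution most sharpens the membership probabilities is the most informative one to ask about.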