Query Answering under Volume-Based Diversity Functions

📅 2025-09-15
🤖 AI Summary
Problem: When a query returns too many tuples, a natural remedy is to return a diverse subset, but this is hard to do well. Existing diversity measures that aggregate pairwise distances can behave counterintuitively, finding a maximally diverse subset of fixed size is intractable in general, and approximation guarantees are known only for a few specific distance-aggregator pairs. Method: This paper proposes a volume-based diversity framework that quantifies the dissimilarity of a set of tuples by volume instead of by pairwise distances, and connects evaluation under volume-based diversity with ranked enumeration. Contribution/Results: We prove that, under any such volume-based diversity function, the greedy algorithm achieves a (1-1/e)-approximation of the optimal solution. Although answering conjunctive queries in this setting remains intractable in general, the paper identifies general conditions, stated in terms of ranked enumeration, under which this approximation can be computed in polynomial time in combined complexity.

📝 Abstract
When query evaluation produces too many tuples, a new approach in query answering is to retrieve a diverse subset of them. The standard approach for measuring the diversity of a set of tuples is to use a distance function between tuples, which measures the dissimilarity between them, to then aggregate the pairwise distances of the set into a score (e.g., by using sum or min aggregation). However, as we will point out in this work, the resulting diversity measures may display some unintuitive behavior. Moreover, even in very simple settings, finding a maximally diverse subset of the answers of fixed size is, in general, intractable and little is known about approximations apart from some hand-picked distance-aggregator pairs. In this work, we introduce a novel approach for computing the diversity of tuples based on volume instead of distance. We present a framework for defining volume-based diversity functions and provide several examples of these measures applied to relational data. Although query answering of conjunctive queries (CQ) under this setting is intractable in general, we show that one can always compute a (1-1/e)-approximation for any volume-based diversity function. Furthermore, in terms of combined complexity, we connect the evaluation of CQs under volume-based diversity functions with the ranked enumeration of solutions, finding general conditions under which a (1-1/e)-approximation can be computed in polynomial time.

Problem

Research questions and friction points this paper is trying to address.

Addressing unintuitive behavior of distance-based diversity measures
Introducing volume-based diversity functions for query result diversification
Providing approximation algorithms for tractable diverse query answering
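To make the distance-based baseline concrete, the sketch below scores a set of tuples by aggregating pairwise distances with sum or min, as the abstract describes. The Hamming distance on attribute positions, the function names, and the sample tuples are assumptions chosen for illustration, not definitions taken from the paper, and the example does not try to reproduce the unintuitive behavior the paper reports.

```python
from itertools import combinations


def hamming(t1, t2):
    """Illustrative distance: number of attribute positions where two tuples differ."""
    return sum(a != b for a, b in zip(t1, t2))


def diversity(tuples, distance=hamming, aggregate=sum):
    """Aggregate the pairwise distances of a set of tuples into a single score."""
    pairs = [distance(a, b) for a, b in combinations(tuples, 2)]
    # A set with fewer than two tuples has no pairs; score it as 0 by convention.
    return aggregate(pairs) if pairs else 0


# Hypothetical query answers over three attributes.
answers = [("a", 1, "x"), ("a", 1, "y"), ("b", 2, "z")]

sum_score = diversity(answers, aggregate=sum)  # sum aggregation
min_score = diversity(answers, aggregate=min)  # min aggregation
```

The two aggregators rank sets differently: sum rewards many large distances overall, while min is determined entirely by the closest pair in the set.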

Innovation

Methods, ideas, or system contributions that make the work stand out.

Volume-based diversity functions replace distance metrics
Framework defines volume measures for relational data
(1-1/e)-approximation for any volume-based diversity function, computable in polynomial time under general conditions tied to ranked enumeration
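As a rough illustration of greedy selection under a volume-style score, the sketch below assumes each tuple is mapped to an interval on the real line and takes the diversity of a set to be the total length covered by the union of its intervals. This mapping and the names `union_length` and `greedy_select` are hypothetical and are not the paper's definitions. Union length is a monotone submodular function, which is the class for which greedy selection is classically known to give a (1-1/e)-approximation.

```python
def union_length(intervals):
    """Total length covered by a collection of closed intervals (1-D volume)."""
    total, current_end = 0.0, float("-inf")
    for lo, hi in sorted(intervals):
        if hi <= current_end:
            continue  # interval is already fully covered
        # Count only the part that extends past what is covered so far.
        total += hi - max(lo, current_end)
        current_end = hi
    return total


def greedy_select(candidates, k, score=union_length):
    """Pick up to k candidates, each time adding the one that maximizes the score."""
    chosen = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda c: score(chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical intervals standing in for query answers.
candidates = [(0, 4), (3, 5), (6, 9), (1, 2)]
picked = greedy_select(candidates, 2)
```

This version re-evaluates the score for every remaining candidate in each round, which is only practical when the full answer set is materialized. The paper's polynomial-time results in combined complexity rely on ranked enumeration instead, which this sketch does not attempt.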