Color Distance Oracles and Snippets: Separation Between Exact and Approximate Solutions

📅 2025-07-06
🤖 AI Summary
This paper studies the gap between exact and approximate solutions for the Color Distance Oracle (CDO) problem and the snippets problem (finding the closest occurrences of two patterns in a text). The previously known upper and lower bound tradeoff curves for these problems are not tight. Method: The authors give new conditionally optimal algorithms for the $(1+\varepsilon)$-approximate versions of both problems, building on recent advances in set-disjointness data structures and applying fast matrix multiplication, with preprocessing time $\tilde{O}(n^a)$ and query time $\tilde{O}(n^b)$. Contribution/Results: Assuming $\omega = 2$, approximate CDO on $n$ points in an array achieves the tradeoff $a + 2b = 2$ for $0 \leq b \leq \frac{1}{3}$ and $2a + b = 3$ for $\frac{1}{3} \leq b \leq 1$. Moreover, assuming the strong APSP hypothesis holds for randomized algorithms, the existing algorithm of Kopelowitz and Krauthgamer for exact CDO on points in an array is essentially optimal. Together, these results show that exact CDO is strictly harder than approximate CDO.

📝 Abstract
In the snippets problem, the goal is to preprocess a text $T$ so that, given two patterns $P_1$ and $P_2$, one can locate the occurrences of the two patterns in $T$ that are closest to each other, or report their distance. Kopelowitz and Krauthgamer [CPM2016] showed upper bound tradeoffs and conditional lower bound tradeoffs for the snippets problem by utilizing connections between the snippets problem and the problem of constructing a color distance oracle (CDO), which is a data structure that preprocesses a set of points with associated colors so that, given two colors $c$ and $c'$, one can quickly find the (distance between the) closest pair of points with colors $c$ and $c'$. However, the existing upper bound and lower bound curves are not tight. Inspired by recent advances by Kopelowitz and Vassilevska-Williams [ICALP2020] regarding set-disjointness data structures, we introduce new conditionally optimal algorithms for $(1+\varepsilon)$ approximation versions of the snippets problem and the CDO problem, by applying fast matrix multiplication. For example, for CDO on $n$ points in an array with preprocessing time $\tilde{O}(n^a)$ and query time $\tilde{O}(n^b)$, assuming that $\omega=2$ (where $\omega$ is the exponent of $n$ in the runtime of the fastest matrix multiplication algorithm on two square matrices of size $n \times n$), we show that approximate CDO can be solved with the following tradeoff: $$ a + 2b = 2 \text{ if } 0 \leq b \leq \frac{1}{3} $$ $$ 2a + b = 3 \text{ if } \frac{1}{3} \leq b \leq 1. $$ Moreover, we prove that for exact CDO on points in an array, the algorithm of Kopelowitz and Krauthgamer [CPM2016] is essentially optimal, assuming that the strong APSP hypothesis holds for randomized algorithms. Thus, the exact version of CDO is strictly harder than the approximate version.
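
To make the tradeoff stated in the abstract concrete, the following minimal sketch evaluates the piecewise curve and returns the preprocessing exponent $a$ for a chosen query exponent $b$, under the abstract's assumption that $\omega = 2$. The function name `preprocessing_exponent` is hypothetical and not taken from the paper; the code only restates the two equations and is not an implementation of the algorithm itself.

```python
from fractions import Fraction


def preprocessing_exponent(b):
    """Return a such that (a, b) lies on the approximate-CDO tradeoff curve.

    Assumes omega = 2, as in the abstract. Preprocessing time is
    ~O(n^a) and query time is ~O(n^b).
    """
    b = Fraction(b)  # exact arithmetic avoids rounding issues at b = 1/3
    if not 0 <= b <= 1:
        raise ValueError("query exponent b must lie in [0, 1]")
    if b <= Fraction(1, 3):
        return 2 - 2 * b      # first branch: a + 2b = 2
    return (3 - b) / 2        # second branch: 2a + b = 3


if __name__ == "__main__":
    for b in (0, Fraction(1, 3), Fraction(1, 2), 1):
        print(f"b = {b}: a = {preprocessing_exponent(b)}")
```

The two branches agree at $b = \frac{1}{3}$, where both give $a = \frac{4}{3}$, so the curve is continuous at the breakpoint.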
Problem

Research questions and friction points this paper is trying to address.

Develop efficient algorithms for the approximate snippets problem
Speed up approximate color distance oracles using fast matrix multiplication
Prove exact CDO is harder than approximate CDO
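
As an illustration of what a color distance oracle answers, here is a naive exact baseline for points in an array: it stores the sorted positions of each color and answers a query by merging the two position lists. This is only a sketch of the problem definition, not the algorithm of the paper or of Kopelowitz and Krauthgamer; the class name `NaiveColorDistanceOracle` is hypothetical, and a query costs time linear in the number of points with the two colors.

```python
from collections import defaultdict


class NaiveColorDistanceOracle:
    """Exact baseline: colors[i] is the color of the point at array position i."""

    def __init__(self, colors):
        # Positions are appended in increasing order, so every list is sorted.
        self.positions = defaultdict(list)
        for i, color in enumerate(colors):
            self.positions[color].append(i)

    def query(self, c1, c2):
        """Return the smallest |i - j| with colors[i] == c1 and colors[j] == c2.

        Returns None if either color does not occur.
        """
        xs = self.positions.get(c1, [])
        ys = self.positions.get(c2, [])
        best = None
        i = j = 0
        # Two-pointer merge: always advance the pointer at the smaller position.
        while i < len(xs) and j < len(ys):
            d = abs(xs[i] - ys[j])
            if best is None or d < best:
                best = d
            if xs[i] < ys[j]:
                i += 1
            else:
                j += 1
        return best


if __name__ == "__main__":
    oracle = NaiveColorDistanceOracle(list("abcaxxb"))
    print(oracle.query("a", "b"))
    print(oracle.query("c", "x"))
```

The baseline does essentially no preprocessing beyond bucketing and pays for it at query time; the tradeoffs discussed in the abstract concern how much of that work can be moved into preprocessing.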
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses fast matrix multiplication for approximation
Introduces conditionally optimal algorithms for snippets
Proves exact CDO is harder than approximate CDO
Noam Horowicz
Bar Ilan University, Israel
Tsvi Kopelowitz
Bar Ilan University