🤖 AI Summary
This paper investigates the complexity separation between exact and approximate solutions for Color Distance Oracle (CDO) and nearest-pattern queries over text fragments. Prior work suffers from loose upper and lower bounds.
Method: We propose the first conditionally optimal approximation framework, leveraging recent advances in fast matrix multiplication and set-disjointness data structures to achieve a preprocessing–query time trade-off of $O(n^a)$ and $O(n^b)$, respectively.
Contribution/Results: Under the $omega = 2$ hypothesis, our framework attains tight trade-offs $a + 2b = 2$ or $2a + b = 3$. Moreover, under the Strong APSP Hypothesis, we prove that exact CDO is strictly harder than its approximation—establishing a conditional tight lower bound for exact solutions. This work provides the first rigorous separation demonstrating a fundamental dichotomy in approximability and tractability between two classical distance query problems.
📝 Abstract
In the snippets problem, the goal is to preprocess text $T$ so that given two patterns $P_1$ and $P_2$, one can locate the occurrences of the two patterns in $T$ that are closest to each other, or report their distance. Kopelowitz and Krauthgamer [CPM2016] showed upper bound tradeoffs and conditional lower bounds tradeoffs for the snippets problem, by utilizing connections between the snippets problem and the problem of constructing a color distance oracle (CDO), which is a data structure that preprocess a set of points with associated colors so that given two colors $c$ and $c'$ one can quickly find the (distance between the) closest pair of points with colors $c$ and $c'$. However, the existing upper bound and lower bound curves are not tight.
Inspired by recent advances by Kopelowitz and Vassilevska-Williams [ICALP2020] regarding Set-disjointness data structures, we introduce new conditionally optimal algorithms for $(1+varepsilon)$ approximation versions of the snippets problem and the CDO problem, by applying fast matrix multiplication. For example, for CDO on $n$ points in an array with preprocessing time $ ilde{O}(n^a)$ and query time $ ilde{O}(n^b)$, assuming that $ω=2$ (where $ω$ is the exponent of $n$ in the runtime of the fastest matrix multiplication algorithm on two squared matrices of size $n imes n$), we show that approximate CDO can be solved with the following tradeoff
$$ a + 2b = 2 ext{ if } 0 leq b leq frac1 3$$ $$ 2a + b = 3 ext{ if } frac13leq b leq 1.$$
Moreover, we prove that for exact CDO on points in an array, the algorithm of Kopelowitz and Krauthgamer [CPM2016], is essentially optimal assuming that the strong APSP hypothesis holds for randomized algorithms. Thus, the exact version of CDO is strictly harder than the approximate version.