🤖 AI Summary
This paper studies pattern matching with weighted edits (PMWED): given a pattern $P$ of length $m$, a text $T$ of length $n$, a threshold $k$, and a weight function assigning character-dependent costs of at least $1$ to insertions, deletions, and substitutions, the goal is to report all starting positions in $T$ of fragments obtainable from $P$ via edits of total cost at most $k$. PMWED generalizes unit-cost pattern matching under edit distance (PMED) and models real-world string similarity more accurately. The paper presents three algorithms: (1) a conceptually simple $\tilde{O}(nk)$-time algorithm; (2) a more involved $\tilde{O}(n + k^{3.5} \cdot W^4 \cdot n/m)$-time algorithm for weight functions that are metrics with integer values bounded by $W$, which nearly matches the state of the art for PMED when the weights are small integers; and (3) an $\tilde{O}(n + k^4 \cdot n/m)$-time algorithm for arbitrary weights. All three access the weight function only through oracle queries; the second additionally relies on the metric assumption.
📝 Abstract
In Pattern Matching with Weighted Edits (PMWED), we are given a pattern $P$ of length $m$, a text $T$ of length $n$, a positive threshold $k$, and oracle access to a weight function that specifies the costs of edits (depending on the involved characters, and normalized so that the cost of each edit is at least $1$). The goal is to compute the starting positions of all fragments of $T$ that can be obtained from $P$ with edits of total cost at most $k$. PMWED captures typical real-world applications more accurately than its unweighted variant (PMED), where all edits have unit costs.
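To make the problem statement concrete, the following is a minimal sketch of a naive $O(nm)$-time dynamic-programming baseline for PMWED. It is not one of the algorithms from this paper; it only illustrates the input and output of the problem. The function names and the oracle interface `w(a, b)`, with `None` standing for the empty string (so `w(a, None)` is a deletion and `w(None, b)` an insertion), are assumptions made for this illustration.

```python
from typing import Callable, List, Optional

# Hypothetical oracle interface (an assumption, not taken from the paper):
# w(a, None) = cost of deleting a, w(None, b) = cost of inserting b,
# w(a, b)    = cost of substituting a with b (only queried when a != b).
Weight = Callable[[Optional[str], Optional[str]], float]


def unit_weight(a: Optional[str], b: Optional[str]) -> float:
    """Unit costs: this choice turns PMWED into the unweighted problem PMED."""
    return 1.0


def pmwed_naive(P: str, T: str, k: float, w: Weight) -> List[int]:
    """Return all j such that some fragment of T starting at position j
    can be obtained from P with edits of total cost at most k."""
    m, n = len(P), len(T)
    # nxt[j] = cheapest cost of turning P[i+1:] into a fragment starting at j.
    # For i + 1 == m the pattern suffix is empty and matches the empty fragment.
    nxt = [0.0] * (n + 1)
    for i in range(m - 1, -1, -1):
        cur = [0.0] * (n + 1)
        # At the end of the text, the only option is to delete P[i].
        cur[n] = w(P[i], None) + nxt[n]
        for j in range(n - 1, -1, -1):
            sub = 0.0 if P[i] == T[j] else w(P[i], T[j])
            cur[j] = min(
                w(P[i], None) + nxt[j],      # delete P[i]
                sub + nxt[j + 1],            # match or substitute P[i] -> T[j]
                w(None, T[j]) + cur[j + 1],  # insert T[j]
            )
        nxt = cur
    # Position n is included; it can only qualify via the empty fragment.
    return [j for j in range(n + 1) if nxt[j] <= k]
```

Calling `pmwed_naive(P, T, k, unit_weight)` solves the unweighted variant; passing a different oracle changes which positions are reported without changing the code.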
We obtain three main results:
(a) a conceptually simple $\tilde{O}(nk)$-time algorithm for PMWED, very different from that of Landau and Vishkin for PMED;
(b) a significantly more complicated $\tilde{O}(n+k^{3.5} \cdot W^4 \cdot n/m)$-time algorithm for PMWED under the assumption that the weight function is a metric with integer values between $0$ and $W$; and
(c) an $\tilde{O}(n+k^4 \cdot n/m)$-time algorithm for PMWED for the case of arbitrary weights.
In the setting of metrics with small integer values, we nearly match the state of the art for PMED where $W=1$.
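As a quick check of this comparison, substituting $W=1$ into the bound from (b) removes the dependence on the weights:

$$\tilde{O}\left(n + k^{3.5} \cdot W^4 \cdot n/m\right)\Big|_{W=1} = \tilde{O}\left(n + k^{3.5} \cdot n/m\right).$$

More generally, whenever $W = O(1)$ the factor $W^4$ is a constant and is absorbed by the $\tilde{O}$ notation, so the bound from (b) takes this same form for all metrics with small integer values.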