Pattern Matching under Weighted Edit Distance

📅 2025-10-20
🤖 AI Summary
This paper studies Pattern Matching with Weighted Edits (PMWED): given a pattern $P$ of length $m$, a text $T$ of length $n$, a positive threshold $k$, and oracle access to a character-dependent weight function that assigns a cost of at least $1$ to every insertion, deletion, and substitution, the goal is to report the starting positions of all fragments of $T$ that can be obtained from $P$ with edits of total cost at most $k$. PMWED generalizes unit-cost pattern matching (PMED) and models typical real-world applications more accurately. The paper gives three algorithms: (1) a conceptually simple $\tilde{O}(nk)$-time algorithm; (2) a significantly more complicated $\tilde{O}(n + k^{3.5} \cdot W^4 \cdot n/m)$-time algorithm for weight functions that are metrics with integer values between $0$ and $W$; and (3) an $\tilde{O}(n + k^4 \cdot n/m)$-time algorithm for arbitrary weights. For metrics with small integer values, result (2) nearly matches the state of the art for PMED, where $W = 1$.

📝 Abstract
In Pattern Matching with Weighted Edits (PMWED), we are given a pattern $P$ of length $m$, a text $T$ of length $n$, a positive threshold $k$, and oracle access to a weight function that specifies the costs of edits (depending on the involved characters, and normalized so that the cost of each edit is at least $1$). The goal is to compute the starting positions of all fragments of $T$ that can be obtained from $P$ with edits of total cost at most $k$. PMWED captures typical real-world applications more accurately than its unweighted variant (PMED), where all edits have unit costs. We obtain three main results: (a) a conceptually simple $\tilde{O}(nk)$-time algorithm for PMWED, very different from that of Landau and Vishkin for PMED; (b) a significantly more complicated $\tilde{O}(n+k^{3.5} \cdot W^4 \cdot n/m)$-time algorithm for PMWED under the assumption that the weight function is a metric with integer values between $0$ and $W$; and (c) an $\tilde{O}(n+k^4 \cdot n/m)$-time algorithm for PMWED for the case of arbitrary weights. In the setting of metrics with small integer values, we nearly match the state of the art for PMED where $W=1$.
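
To make the problem statement concrete, here is a minimal Python sketch of a naive $O(nm)$-time baseline. It is not one of the paper's three algorithms: it is the textbook column-by-column dynamic program (in the style of Sellers), run on the reversed strings so that it reports starting positions rather than ending positions. The function name `pmwed_naive`, the oracle signature `w(a, b)` with `None` standing for the empty string, and the two example weight functions are all assumptions made for illustration.

```python
from typing import Callable, List, Optional

# Hypothetical oracle: w(a, b) is the cost of editing a into b, where
# None stands for the empty string (w(a, None) = delete a, w(None, b) = insert b).
Weight = Callable[[Optional[str], Optional[str]], float]


def pmwed_naive(pattern: str, text: str, k: float, w: Weight) -> List[int]:
    """Naive O(nm) baseline: all i such that some fragment text[i:j]
    can be obtained from pattern with edits of total cost at most k."""
    # Reversing both strings turns "which positions can a match start at"
    # into "which positions can a match end at", which the DP reports directly.
    p, t = pattern[::-1], text[::-1]
    m, n = len(p), len(t)

    # prev[i] = cheapest way to edit p[:i] into a fragment of t ending at column j.
    prev = [0.0] * (m + 1)
    for i in range(1, m + 1):
        prev[i] = prev[i - 1] + w(p[i - 1], None)  # delete every pattern character
    ends = [0] if prev[m] <= k else []

    for j in range(1, n + 1):
        cur = [0.0] * (m + 1)  # cur[0] = 0: a fragment may begin anywhere
        for i in range(1, m + 1):
            # Matching identical characters is free; every edit costs w(...).
            sub = 0.0 if p[i - 1] == t[j - 1] else w(p[i - 1], t[j - 1])
            cur[i] = min(
                prev[i - 1] + sub,               # match or substitute
                cur[i - 1] + w(p[i - 1], None),  # delete a pattern character
                prev[i] + w(None, t[j - 1]),     # insert a text character
            )
        if cur[m] <= k:
            ends.append(j)
        prev = cur

    # An end at column j of the reversed text is a start at n - j in the original.
    return sorted(n - j for j in ends)


def unit_weight(a: Optional[str], b: Optional[str]) -> float:
    """Unit costs: this is the unweighted problem (PMED)."""
    return 1.0


def ab_cheap(a: Optional[str], b: Optional[str]) -> float:
    """Illustrative weights: swapping 'a' and 'b' costs 1, any other edit costs 3."""
    return 1.0 if {a, b} == {"a", "b"} else 3.0


print(pmwed_naive("aa", "xabx", 1, unit_weight))
print(pmwed_naive("aa", "xabx", 1, ab_cheap))
```

With unit weights, both `xa` (starting at 0) and `ab` (starting at 1) are within cost 1 of `aa`; under `ab_cheap`, only the fragment `ab` remains within budget, which illustrates how character-dependent costs change the set of reported positions.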
Problem

Research questions and friction points this paper is trying to address.

Computing substring matches with weighted edit cost constraints
Developing efficient algorithms for pattern matching with weighted edits
Improving accuracy over unweighted edit distance in real applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

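Conceptually simple algorithm running in $\tilde{O}(nk)$ time
Algorithm for metric weights with integer values between $0$ and $W$, running in $\tilde{O}(n+k^{3.5} \cdot W^4 \cdot n/m)$ time
Algorithm for arbitrary weights, running in $\tilde{O}(n+k^4 \cdot n/m)$ time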
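
As a rough reading of these three bounds (a back-of-the-envelope comparison, not a statement taken from the paper), one can ask when the metric bound is no larger than the simple $\tilde{O}(nk)$ bound. Ignoring the polylogarithmic factors hidden by $\tilde{O}$, and noting that $n \le nk$ whenever $k \ge 1$, it suffices to compare the second terms:

$$k^{3.5} \cdot W^4 \cdot \frac{n}{m} \le nk \iff k^{2.5} \cdot W^4 \le m.$$

The same calculation for the arbitrary-weight bound gives $k^4 \cdot n/m \le nk \iff k^3 \le m$. Setting $W = 1$ in the metric bound yields $\tilde{O}(n + k^{3.5} \cdot n/m)$, which is the regime in which the abstract compares against PMED.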