Hardness of Regular Expression Matching with Extensions

📅 2026-01-06

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This study investigates the computational complexity of matching regular expressions extended with backreference, intersection, or complement; the complement case is also known as extended regular expression (ERE) matching. Under the Orthogonal Vectors Conjecture, the authors show that matching with any one of the three extensions cannot be solved in O(n^{2-ε} poly(m)) time for any constant ε > 0, in contrast to lookaround, which can be handled in O(nm) time. For ERE matching, they establish conditional lower bounds that rule out O(n^{ω-ε} poly(m))-time algorithms under the k-Clique Hypothesis, where ω is the matrix multiplication exponent, and O(n^{3-ε} poly(m))-time combinatorial algorithms under the Combinatorial k-Clique Hypothesis. The best-known algorithm, which runs in O(n^ω m) time, is therefore already optimal in a sense, which helps explain why the time complexity of this fundamental problem has not improved in more than 45 years.
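
For readers unfamiliar with the Orthogonal Vectors problem behind the conjecture, the following is a minimal sketch of the quadratic brute-force baseline. Given two sets of n Boolean vectors of dimension d, it asks whether some pair has inner product zero. The conjecture states that no algorithm solves this in O(n^{2-ε} poly(d)) time for any constant ε > 0. The function name `has_orthogonal_pair` and the tuple encoding are assumptions made for illustration and do not come from the paper.

```python
from itertools import product


def has_orthogonal_pair(A, B):
    """Return True iff some a in A and b in B satisfy a . b == 0.

    A and B are lists of equal-length 0/1 tuples (illustrative encoding).
    This is the naive O(n^2 * d) check over all pairs.
    """
    for a, b in product(A, B):
        # Two 0/1 vectors are orthogonal when no coordinate is 1 in both.
        if all(x * y == 0 for x, y in zip(a, b)):
            return True
    return False


if __name__ == "__main__":
    A = [(1, 0, 1), (0, 1, 1)]
    B = [(0, 1, 0), (1, 1, 0)]
    print(has_orthogonal_pair(A, B))
```

A lower bound "under the Orthogonal Vectors Conjecture" means that a sufficiently fast matching algorithm would beat this pairwise baseline by a polynomial factor, which the conjecture forbids.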

📝 Abstract

The regular expression matching problem asks whether a given regular expression of length $m$ matches a given string of length $n$. As is well known, the problem can be solved in $O(nm)$ time using Thompson's algorithm. Moreover, recent studies have shown that the matching problem for regular expressions extended with a practical extension called lookaround can be solved in the same time complexity. In this work, we consider three well-known extensions to regular expressions called backreference, intersection and complement, and we show that, unlike in the case of lookaround, the matching problem for regular expressions extended with any of the three (for backreference, even when restricted to one capturing group) cannot be solved in $O(n^{2-\varepsilon} \mathrm{poly}(m))$ time for any constant $\varepsilon>0$ under the Orthogonal Vectors Conjecture. Moreover, we study the matching problem for regular expressions extended with complement in more detail, which is also known as extended regular expression (ERE) matching. We show that there is no ERE matching algorithm that runs in $O(n^{\omega-\varepsilon} \mathrm{poly}(m))$ time ($2 \le \omega<2.3716$ is the exponent of square matrix multiplication) for any constant $\varepsilon>0$ under the $k$-Clique Hypothesis, and there is no combinatorial ERE matching algorithm that runs in $O(n^{3-\varepsilon} \mathrm{poly}(m))$ time for any constant $\varepsilon>0$ under the Combinatorial $k$-Clique Hypothesis. This shows that the $O(n^3 m)$-time algorithm introduced by Hopcroft and Ullman in 1979 and recently improved by Bille et al. to run in $O(n^\omega m)$ time using fast matrix multiplication was already optimal in a sense, and sheds light on why the theoretical computer science community has struggled to improve the time complexity of ERE matching with respect to $n$ and $m$ for more than 45 years.
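
To make the cubic upper bound concrete, here is a minimal sketch of a substring dynamic program for regular expressions with complement and intersection. It is written in the spirit of the classical O(n^3 m) approach, not taken from the paper or from Hopcroft and Ullman's text. For each subexpression it fills a table `T` where `T[i][j]` records whether the subexpression matches `s[i:j]`. The tuple-based syntax tree and the names `match_table` and `ere_match` are assumptions made for illustration.

```python
def match_table(r, s):
    """Return T with T[i][j] == True iff i <= j and r matches s[i:j].

    r is a nested tuple (hypothetical encoding): ("eps",), ("chr", c),
    ("cat", r1, r2), ("alt", r1, r2), ("and", r1, r2), ("star", r1),
    ("not", r1).
    """
    n = len(s)
    T = [[False] * (n + 1) for _ in range(n + 1)]
    op = r[0]
    if op == "eps":
        for i in range(n + 1):
            T[i][i] = True
    elif op == "chr":
        for i in range(n):
            T[i][i + 1] = (s[i] == r[1])
    elif op == "not":
        # Complement flips every entry on or above the diagonal.
        A = match_table(r[1], s)
        for i in range(n + 1):
            for j in range(i, n + 1):
                T[i][j] = not A[i][j]
    elif op in ("alt", "and"):
        A, B = match_table(r[1], s), match_table(r[2], s)
        for i in range(n + 1):
            for j in range(i, n + 1):
                if op == "alt":
                    T[i][j] = A[i][j] or B[i][j]
                else:
                    T[i][j] = A[i][j] and B[i][j]
    elif op == "cat":
        # Concatenation is a Boolean matrix product over split points k.
        A, B = match_table(r[1], s), match_table(r[2], s)
        for i in range(n + 1):
            for j in range(i, n + 1):
                T[i][j] = any(A[i][k] and B[k][j] for k in range(i, j + 1))
    elif op == "star":
        # Reflexive-transitive closure, filled from the end of s backwards
        # so that T[k][j] for k > i is already known.
        A = match_table(r[1], s)
        for i in range(n, -1, -1):
            T[i][i] = True
            for j in range(i + 1, n + 1):
                T[i][j] = any(A[i][k] and T[k][j] for k in range(i + 1, j + 1))
    else:
        raise ValueError(f"unknown operator: {op!r}")
    return T


def ere_match(r, s):
    """Return True iff r matches the whole string s."""
    return match_table(r, s)[0][len(s)]


if __name__ == "__main__":
    a, b = ("chr", "a"), ("chr", "b")
    # Strings over {a, b} that are not made only of a's.
    r = ("and", ("star", ("alt", a, b)), ("not", ("star", a)))
    print(ere_match(r, "aab"), ere_match(r, "aaa"))
```

Each operator costs at most O(n^3) time on a string of length n, so an expression with m operators costs O(n^3 m) in this sketch. The concatenation and star cases are the only cubic ones, and both are Boolean matrix product or closure computations, which is plausibly where fast matrix multiplication enters to give the O(n^ω m) bound mentioned above.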

Problem

Research questions and friction points this paper is trying to address.

regular expression matching

backreference

intersection

complement

extended regular expression

Innovation

Methods, ideas, or system contributions that make the work stand out.

conditional lower bounds

extended regular expressions

backreference