🤖 AI Summary
This work addresses the security risks posed by regular expressions with backreferences, which can trigger super-linear backtracking and lead to denial-of-service (DoS) vulnerabilities. Existing detection tools miss many of these cases because such regexes can have linear (sink) ambiguity, a regime in which current detectors report no vulnerability. The study presents the first systematic investigation of this issue, introducing a two-phase memory finite automaton (2PMFA) model that formally characterizes the semantics of such regexes and is used to derive necessary conditions for backreference-induced super-linear backtracking. Building on these conditions, the authors identify three vulnerability patterns and develop algorithms for vulnerability detection and attack generation. Applied to the Snort rule set, the approach finds 45 previously unknown vulnerabilities with quadratic or worse runtime. The authors also demonstrate practical attacks on Snort: one slows rule evaluation by 0.6–1.2 seconds, and another evades alerts by triggering PCRE's match limit. These are vulnerabilities that current detectors do not report.
📝 Abstract
This paper presents the first systematic study of denial-of-service vulnerabilities in Regular Expressions with Backreferences (REwB). We introduce the Two-Phase Memory Automaton (2PMFA), an automaton model that precisely captures REwB semantics. Using this model, we derive necessary conditions under which backreferences induce super-linear backtracking runtime, even when sink ambiguity is linear, a regime in which existing detectors report no vulnerability. Based on these conditions, we identify three vulnerability patterns, develop detection and attack-construction algorithms, and validate them in practice. Using the Snort intrusion detection ruleset, our evaluation identifies 45 previously unknown REwB vulnerabilities with quadratic or worse runtime. We further demonstrate practical exploits against Snort, including slowing rule evaluation by 0.6-1.2 seconds and bypassing alerts by triggering PCRE's matching limit.
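The basic cost mechanism behind these results can be illustrated with a toy model. On every backtracking attempt, a backreference re-compares its captured text character by character, so a linear number of capture choices can cost quadratic time. The sketch below is a minimal, hypothetical backtracking matcher. It is not the paper's 2PMFA, and it is not PCRE's or Snort's engine. It supports only literals, a greedy star over one character, capture groups, and backreferences, and it counts character tests as "steps". It compares `^(a*)\1b` with `^(a*)b` on strings of `a`s, and it loosely models a PCRE-style match limit as a step budget. The names `ToyBacktracker`, `rule_fires`, the step metric, and the `limit` values are illustrative assumptions. The example pattern was chosen for simplicity and is not one of the paper's three vulnerability patterns.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Lit:          # a single literal character
    ch: str


@dataclass
class Star:         # greedy Kleene star over one literal character
    ch: str


@dataclass
class Group:        # capturing group number `idx` around a sub-pattern
    idx: int
    body: list


@dataclass
class Backref:      # backreference \idx
    idx: int


Node = Union[Lit, Star, Group, Backref]
Caps = Dict[int, Tuple[int, int]]
Cont = Callable[[int, Caps], bool]


class MatchLimitExceeded(Exception):
    """Raised when the step budget runs out (a loose stand-in for PCRE's match limit)."""


class ToyBacktracker:
    def __init__(self, pattern: List[Node], limit: int = 10**9):
        self.pattern = pattern
        self.limit = limit
        self.steps = 0

    def _eq(self, s: str, pos: int, ch: str) -> bool:
        # One "step" = one character test; abort once the budget is exhausted.
        self.steps += 1
        if self.steps > self.limit:
            raise MatchLimitExceeded(self.steps)
        return pos < len(s) and s[pos] == ch

    def _seq(self, nodes: List[Node], i: int, s: str, caps: Caps, k: Cont) -> bool:
        # Match `nodes` starting at position i, then hand off to continuation k.
        if not nodes:
            return k(i, caps)
        head, rest = nodes[0], nodes[1:]
        return self._node(head, i, s, caps, lambda j, c: self._seq(rest, j, s, c, k))

    def _node(self, node: Node, i: int, s: str, caps: Caps, k: Cont) -> bool:
        if isinstance(node, Lit):
            return self._eq(s, i, node.ch) and k(i + 1, caps)
        if isinstance(node, Star):
            j = i
            while self._eq(s, j, node.ch):  # consume the longest run first
                j += 1
            # Backtrack: give characters back one at a time, longest first.
            return any(k(end, caps) for end in range(j, i - 1, -1))
        if isinstance(node, Group):
            def close(j: int, c: Caps) -> bool:
                return k(j, {**c, node.idx: (i, j)})  # record the capture span
            return self._seq(node.body, i, s, caps, close)
        if isinstance(node, Backref):
            if node.idx not in caps:
                return False
            start, end = caps[node.idx]
            # Re-compare the captured text: cost grows with the capture length.
            for t in range(end - start):
                if not self._eq(s, i + t, s[start + t]):
                    return False
            return k(i + (end - start), caps)
        raise TypeError(node)

    def match(self, s: str) -> bool:
        """Anchored at position 0, like Python's re.match."""
        self.steps = 0
        return self._seq(self.pattern, 0, s, {}, lambda j, c: True)


def count_steps(pattern: List[Node], s: str) -> int:
    m = ToyBacktracker(pattern)
    m.match(s)
    return m.steps


def rule_fires(pattern: List[Node], payload: str, limit: int) -> bool:
    # Hypothetical rule: alert only on a successful match; a limit abort counts as "no match".
    try:
        return ToyBacktracker(pattern, limit).match(payload)
    except MatchLimitExceeded:
        return False


WITH_BACKREF = [Group(1, [Star("a")]), Backref(1), Lit("b")]  # ^(a*)\1b
WITHOUT_BACKREF = [Group(1, [Star("a")]), Lit("b")]           # ^(a*)b

for n in (100, 200, 400):
    print(n, count_steps(WITHOUT_BACKREF, "a" * n), count_steps(WITH_BACKREF, "a" * n))
# prints: 100 202 2702 / 200 402 10402 / 400 802 40802
```

Without `\1`, each of the n+1 candidate capture lengths is rejected by a single test, so the count grows linearly. With `\1`, each candidate first re-reads up to half of the input, which adds about n²/4 tests, so doubling n roughly quadruples the count. In the toy version of the alert bypass, the payload `"a" * 200 + "b"` does match `^(a*)\1b`. Under a budget of 1000 steps, however, `rule_fires` returns `False`, while a short matching payload still fires.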