🤖 AI Summary
This paper investigates the structural sensitivity of the Compact Directed Acyclic Word Graph (CDAWG) to single-character left-end edits (insertion, deletion, substitution) and its implications for efficient long-text processing.
Method: Viewing the CDAWG as the suffix tree with its isomorphic subtrees merged, the authors combine combinatorial analysis with explicit lower-bound string constructions to analyze worst-case edge growth under left-end edits.
Contribution/Results: The work establishes an upper bound on edge growth: a left-end edit adds at most as many new edges as the original CDAWG contains. A matching lower bound shows that this is tight for insertions, and almost matching lower bounds are given for deletions and substitutions. Furthermore, by generalizing the insertion lower-bound instance, it proves that leftward (right-to-left) online CDAWG construction requires Ω(n²) time for some string of length n, which rules out linear-time leftward online construction in the worst case. Together, these results characterize the CDAWG’s worst-case behavior under left-end edits and inform the design of dynamic text indexes.
📝 Abstract
Compact directed acyclic word graphs (CDAWGs) [Blumer et al. 1987] are a fundamental data structure on strings with applications in text pattern searching, data compression, and pattern discovery. Intuitively, the CDAWG of a string $T$ is obtained by merging isomorphic subtrees of the suffix tree [Weiner 1973] of the same string $T$, thus CDAWGs are a compact indexing structure. In this paper, we investigate the sensitivity of CDAWGs when a single character edit operation (insertion, deletion, or substitution) is performed at the left-end of the input string $T$, namely, we are interested in the worst-case increase in the size of the CDAWG after a left-end edit operation. We prove that if $e$ is the number of edges of the CDAWG for string $T$, then the number of new edges added to the CDAWG after a left-end edit operation on $T$ does not exceed $e$. Further, we present a matching lower bound on the sensitivity of CDAWGs for left-end insertions, and almost matching lower bounds for left-end deletions and substitutions. We then generalize our lower-bound instance for left-end insertions to leftward online construction of the CDAWG, and show that it requires $\Omega(n^2)$ time for some string of length $n$.
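To make the "merge isomorphic subtrees of the suffix tree" intuition concrete, the following is a minimal Python sketch, not the paper's construction. It builds a naive suffix trie, compacts unary paths into single edges, and merges subtrees with identical labeled structure, then counts the resulting edges. The function name `cdawg_edge_count` and the use of a `$` terminator are assumptions made for illustration; the paper's exact conventions (for example, whether a terminal symbol is appended) may differ, and the quadratic-space trie is only suitable for short strings.

```python
def cdawg_edge_count(text, terminator="$"):
    """Count edges of the CDAWG of text + terminator (illustrative sketch)."""
    s = text + terminator  # assumption: terminator does not occur in text

    # Naive suffix trie as nested dicts: quadratic space, fine for short strings.
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})

    # Maps the signature of a branching node to its class id; id 0 is the sink.
    classes = {}

    def class_id(node):
        if not node:  # leaf: all leaves merge into the single sink
            return 0
        signature = []
        for ch in sorted(node):
            label, child = ch, node[ch]
            while len(child) == 1:  # compact a unary path into one edge label
                nxt, child = next(iter(child.items()))
                label += nxt
            signature.append((label, class_id(child)))
        key = tuple(signature)
        # Isomorphic subtrees yield the same key and so share one class id.
        return classes.setdefault(key, len(classes) + 1)

    class_id(root)
    # Each merged node contributes its out-edges exactly once.
    return sum(len(key) for key in classes)


if __name__ == "__main__":
    # Compare the edge count before and after a left-end insertion.
    for t in ("abab", "babab"):
        print(t, cdawg_edge_count(t))
```

Under these assumptions, running the function on a string $T$ and on $cT$ for a character $c$ lets one observe how the edge count changes after a left-end insertion on small inputs; it does not reproduce the paper's worst-case instances.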