AI Summary
This paper addresses the problem of counting the distinct substrings of a string $w$ of length $n$ relative to a position: for each position $k$, compute $\mathcal{C}(w,k)$, the number of distinct substrings that have an occurrence containing $k$, and $\mathcal{N}(w,k)$, the number that have an occurrence not containing $k$. We propose linear-time algorithms that compute both quantities for all positions. Specifically, we compute all $\mathcal{C}(w,k)$ exactly in $O(n)$ total time over general ordered alphabets; for $\mathcal{N}(w,k)$, we achieve $O(n)$ total time under the assumption that the alphabet is linearly sortable (e.g., integer alphabets). Our approach integrates suffix arrays, LCP arrays, and sweep-line techniques, with data structure operations optimized according to alphabet properties. This improves on the $O(n^2)$ time required to apply the known $O(n)$-per-position solution to every position, a factor-$n$ speedup that enables efficient position-sensitive substring analysis in large-scale string processing.
Abstract
Let $w$ be a string of length $n$. The problem of counting factors crossing a position (Problem 64 in the textbook "125 Problems in Text Algorithms" [Crochemore, Lecroq, and Rytter, 2021]) asks for the number $\mathcal{C}(w,k)$ (resp. $\mathcal{N}(w,k)$) of distinct substrings of $w$ that have an occurrence containing (resp. not containing) a position $k$ in $w$. The solutions given in the textbook compute $\mathcal{C}(w,k)$ and $\mathcal{N}(w,k)$ in $O(n)$ time for a single position $k$, so applying them directly to all positions $k = 1, \ldots, n$ would require $O(n^2)$ time. They are also designed for constant-size alphabets. In this paper, we present new algorithms that compute $\mathcal{C}(w,k)$ in $O(n)$ total time for general ordered alphabets, and $\mathcal{N}(w,k)$ in $O(n)$ total time for linearly sortable alphabets, for all positions $k = 1, \ldots, n$ in $w$.
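To make the two definitions concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm: it simply enumerates every occurrence of every substring and is only practical for very short strings. The function name `crossing_counts` is hypothetical, positions are 0-indexed (so index `k` corresponds to position $k+1$ in the abstract), and only non-empty substrings are counted, which is an assumption about the intended convention.

```python
from typing import List, Tuple


def crossing_counts(w: str) -> Tuple[List[int], List[int]]:
    """Brute-force reference for C(w, k) and N(w, k) at every position k.

    Assumptions (not taken from the paper): 0-indexed positions and
    non-empty substrings only. Intended for checking small inputs.
    """
    n = len(w)
    # containing[k]: distinct substrings with an occurrence covering position k
    # avoiding[k]:   distinct substrings with an occurrence not covering k
    containing = [set() for _ in range(n)]
    avoiding = [set() for _ in range(n)]

    for i in range(n):                  # start of the occurrence (inclusive)
        for j in range(i + 1, n + 1):   # end of the occurrence (exclusive)
            s = w[i:j]
            for k in range(n):
                if i <= k < j:
                    containing[k].add(s)
                else:
                    avoiding[k].add(s)

    C = [len(x) for x in containing]
    N = [len(x) for x in avoiding]
    return C, N


if __name__ == "__main__":
    print(crossing_counts("aab"))  # ([3, 4, 3], [3, 2, 2])
```

For `w = "aab"`, the substring `a` is counted in both quantities at the middle position, since one occurrence covers that position and another does not. This is why $\mathcal{C}(w,k) + \mathcal{N}(w,k)$ can exceed the total number of distinct substrings.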