🤖 AI Summary
This paper addresses the efficient enumeration of absent subsequences of a string $w$, i.e., strings that cannot be obtained from $w$ by deleting letters. It focuses on two fundamental classes: shortest absent subsequences (SAS) and minimal absent subsequences (MAS). For both classes, we give enumeration algorithms with linear-time preprocessing and output-linear delay, that is, delay proportional to the length of the string being output. We further give incremental enumeration algorithms with linear-time preprocessing and constant delay, in which each output is a short edit-script describing how the current string differs from the previous one. Finally, we present an efficient algorithm for identifying a longest minimal absent subsequence of $w$. The enumeration algorithms are optimal, and all our algorithms improve on the state-of-the-art results for these problems.
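To show what a shortest absent subsequence looks like, here is a minimal Python sketch of a classical construction. It is not the paper's enumeration algorithm, and it produces only one SAS. The construction splits $w$ greedily into "arches", where an arch is a shortest factor containing every letter of the alphabet. An SAS then has length (number of arches) + 1, and concatenating the last letter of each arch with a letter missing from the leftover suffix yields one such string. The function names are hypothetical. The sketch assumes that $w$ uses only letters of a nonempty alphabet, which defaults to the letters of $w$.

```python
def arch_factorization(w, alphabet=None):
    """Greedily split w into arches (shortest factors containing every letter
    of the alphabet). Returns (arches, rest); rest lacks at least one letter.
    Assumes w only uses letters of `alphabet` (default: the letters of w)."""
    sigma = set(alphabet) if alphabet is not None else set(w)
    arches, seen, start = [], set(), 0
    for i, c in enumerate(w):
        seen.add(c)
        if seen >= sigma:  # the current factor now contains every letter
            arches.append(w[start:i + 1])
            seen, start = set(), i + 1
    return arches, w[start:]


def shortest_absent_length(w, alphabet=None):
    """Any string of length <= #arches embeds in w (one letter per arch),
    so a shortest absent subsequence has length #arches + 1."""
    arches, _ = arch_factorization(w, alphabet)
    return len(arches) + 1


def one_shortest_absent_subsequence(w, alphabet=None):
    """Last letter of each arch, then a letter missing from the rest.
    Leftmost matching of arch i's last letter lands exactly at the end of
    arch i, so the final letter has nothing left to match against."""
    sigma = sorted(set(alphabet) if alphabet is not None else set(w))
    if not sigma:
        raise ValueError("need a nonempty alphabet")
    arches, rest = arch_factorization(w, sigma)
    prefix = "".join(arch[-1] for arch in arches)
    missing = [c for c in sigma if c not in rest]
    return prefix + missing[0]


print(one_shortest_absent_subsequence("abcab"))  # cc
print(shortest_absent_length("abab"))  # 3
```

This construction runs in time linear in $|w|$, but it returns a single witness. Enumerating *all* shortest absent subsequences with small delay is the harder problem that the paper addresses.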
📝 Abstract
Given a string $w$, another string $v$ is said to be a subsequence of $w$ if $v$ can be obtained from $w$ by removing some of its letters; on the other hand, $v$ is called an absent subsequence of $w$ if $v$ is not a subsequence of $w$. The existing literature on absent subsequences has focused on understanding, for a string $w$, the set of its shortest absent subsequences (i.e., the shortest strings which are absent subsequences of $w$) and that of its minimal absent subsequences (i.e., those strings which are absent subsequences of $w$ but whose every proper subsequence occurs in $w$). Our contributions to this area of research are the following. Firstly, we present optimal algorithms (with linear-time preprocessing and output-linear delay) for the enumeration of the shortest and, respectively, minimal absent subsequences. Secondly, we present optimal algorithms for the incremental enumeration of these strings with linear-time preprocessing and constant delay; in this setting, we only output short edit-scripts showing how the currently enumerated string differs from the previous one. Finally, we provide an efficient algorithm for identifying a longest minimal absent subsequence of a string. All our algorithms improve on the state-of-the-art results for the aforementioned problems.
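The following brute-force Python checker makes the definitions above concrete. It is exponential in $|w|$ and intended only for tiny examples; it is not one of the paper's optimal algorithms, and the function names are illustrative. It relies on two observations. First, $v$ is a minimal absent subsequence exactly when $v$ is absent and every string obtained by deleting a single letter of $v$ is a subsequence of $w$, since every proper subsequence of $v$ is a subsequence of some such one-letter deletion. Second, as a consequence, every MAS has length at most $|w|+1$. The alphabet is assumed nonempty and defaults to the letters of $w$.

```python
from itertools import product


def is_subsequence(v, w):
    """True iff v can be obtained from w by deleting letters.
    `c in it` advances the shared iterator past the match (greedy embedding)."""
    it = iter(w)
    return all(c in it for c in v)


def is_minimal_absent(v, w):
    """v is absent, but deleting any single letter yields a subsequence of w."""
    return not is_subsequence(v, w) and all(
        is_subsequence(v[:i] + v[i + 1:], w) for i in range(len(v))
    )


def brute_force_sas_and_mas(w, alphabet=None):
    """Return (shortest absent subsequences, minimal absent subsequences),
    each listed by length and then in lexicographic order of the alphabet."""
    sigma = sorted(set(alphabet) if alphabet is not None else set(w))
    # Every MAS has length <= |w| + 1, so this range covers all of them.
    absent = [
        "".join(t)
        for n in range(len(w) + 2)
        for t in product(sigma, repeat=n)
        if not is_subsequence("".join(t), w)
    ]
    shortest = min(len(v) for v in absent)
    sas = [v for v in absent if len(v) == shortest]
    mas = [v for v in absent if is_minimal_absent(v, w)]
    return sas, mas


sas, mas = brute_force_sas_and_mas("abab")
print(sas)  # ['aaa', 'baa', 'bba', 'bbb']
print(mas)  # ['aaa', 'baa', 'bba', 'bbb', 'aabb']
```

For $w = \mathtt{abab}$, the string $\mathtt{aabb}$ is a minimal absent subsequence that is not shortest. Its one-letter deletions $\mathtt{abb}$ and $\mathtt{aab}$ both occur in $w$. The example therefore shows that the two classes studied here genuinely differ, and that a longest MAS can be strictly longer than an SAS.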