Efficiently Finding All Minimal and Shortest Absent Subsequences in a String

📅 2025-04-30
🤖 AI Summary
This paper addresses the efficient enumeration of absent subsequences of a string $w$, i.e., strings that are not subsequences of $w$. It focuses on two fundamental classes: shortest absent subsequences (SAS) and minimal absent subsequences (MAS). For both classes, we give enumeration algorithms with linear-time preprocessing and output-linear delay, i.e., the time between two consecutive outputs is proportional to the length of the string being output. We further develop incremental enumeration algorithms with linear-time preprocessing and constant delay, which output short edit scripts describing how each string differs from the previous one. We also give an efficient algorithm for identifying a longest MAS. The enumeration algorithms are optimal, and all our algorithms improve on the previous state of the art.
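Shortest absent subsequences can be understood through a standard tool from the absent-subsequence literature that is not described on this page: the arch factorization. The function names below are hypothetical. This is a minimal sketch, assuming the alphabet is given as a nonempty set that contains every letter of $w$. Scan $w$ left to right and cut an "arch" each time the current factor contains every letter. If there are $k$ arches, every string of length $k$ is a subsequence, because its $i$-th letter can be matched inside the $i$-th arch. Taking the last letter of each arch, followed by a letter missing from the leftover suffix, then yields an absent subsequence of length $k+1$, which is therefore a shortest one. This builds just one such string and is not the paper's enumeration algorithm.

```python
def arch_factorization(w: str, alphabet: set[str]) -> tuple[list[str], str]:
    """Split w greedily into arches: each arch is the shortest next factor that
    contains every letter of `alphabet`. The leftover suffix (the rest) misses
    at least one letter.

    Assumption: `alphabet` is nonempty and contains every letter of w, so
    comparing set sizes is enough to detect that a factor is complete.
    """
    arches: list[str] = []
    seen: set[str] = set()
    start = 0
    for i, letter in enumerate(w):
        seen.add(letter)
        if len(seen) == len(alphabet):  # current factor contains every letter
            arches.append(w[start:i + 1])
            seen = set()
            start = i + 1
    return arches, w[start:]


def one_shortest_absent_subsequence(w: str, alphabet: set[str]) -> str:
    """Return one shortest absent subsequence of w over `alphabet`.

    Greedy matching places the last letter of the j-th arch at the end of that
    arch (it occurs nowhere earlier in the arch). After the last letters of all
    arches are matched, only the rest remains, and a letter missing from the
    rest cannot be matched. The result has length (number of arches) + 1.
    """
    arches, rest = arch_factorization(w, alphabet)
    missing = min(alphabet - set(rest))  # nonempty, since the rest is not an arch
    return "".join(arch[-1] for arch in arches) + missing
```

For example, `"abcabca"` over `{"a", "b", "c"}` splits into the arches `abc`, `abc` and the rest `a`, so the sketch returns `"ccb"`, a shortest absent subsequence of length 3.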

📝 Abstract
Given a string $w$, another string $v$ is said to be a subsequence of $w$ if $v$ can be obtained from $w$ by removing some of its letters; on the other hand, $v$ is called an absent subsequence of $w$ if $v$ is not a subsequence of $w$. The existing literature on absent subsequences has focused on understanding, for a string $w$, the set of its shortest absent subsequences (i.e., the shortest strings which are absent subsequences of $w$) and that of its minimal absent subsequences (i.e., those strings which are absent subsequences of $w$ but whose every proper subsequence is a subsequence of $w$). Our contributions to this area of research are the following. Firstly, we present optimal algorithms (with linear-time preprocessing and output-linear delay) for the enumeration of the shortest and, respectively, minimal absent subsequences. Secondly, we present optimal algorithms for the incremental enumeration of these strings with linear-time preprocessing and constant delay; in this setting, we only output short edit scripts showing how the currently enumerated string differs from the previous one. Finally, we provide an efficient algorithm for identifying a longest minimal absent subsequence of a string. All our algorithms improve the state-of-the-art results for the aforementioned problems.
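The definitions above can be turned directly into an exponential-time reference enumerator. This is a sketch for tiny inputs and for cross-checking faster code, not the paper's optimal algorithms; the function names are hypothetical. It relies on two simple facts. First, $v$ is minimal absent as soon as every one-letter deletion of $v$ is a subsequence of $w$, because every proper subsequence of $v$ is a subsequence of some one-letter deletion. Second, a minimal absent subsequence has length at most $|w|+1$.

```python
from itertools import product


def is_subsequence(v: str, w: str) -> bool:
    """Greedy left-to-right matching: each `in` test consumes the iterator
    up to (and including) the matched letter of w."""
    it = iter(w)
    return all(letter in it for letter in v)


def is_minimal_absent(v: str, w: str) -> bool:
    """v is absent, and deleting any single letter of v gives a subsequence of w."""
    if is_subsequence(v, w):
        return False
    return all(is_subsequence(v[:i] + v[i + 1:], w) for i in range(len(v)))


def brute_force_absent(w: str, alphabet: str) -> tuple[list[str], list[str]]:
    """Return (shortest absent subsequences, minimal absent subsequences) of w
    over `alphabet`, each sorted by length and then lexicographically.
    Runs in time exponential in len(w); intended only as a test oracle."""
    shortest: list[str] = []
    minimal: list[str] = []
    # Deleting one letter of a minimal absent subsequence leaves a subsequence
    # of w, so its length is at most len(w) + 1.
    for length in range(1, len(w) + 2):
        for letters in product(sorted(alphabet), repeat=length):
            v = "".join(letters)
            if is_subsequence(v, w):
                continue
            # Keep absent strings only at the first length where any appear.
            if not shortest or len(shortest[0]) == length:
                shortest.append(v)
            if is_minimal_absent(v, w):
                minimal.append(v)
    return shortest, minimal
```

For example, `brute_force_absent("aab", "ab")` returns `(["ba", "bb"], ["ba", "bb", "aaa"])`. The string `aaa` is minimal but not shortest, which shows that the two classes differ.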
Problem

- Finding all minimal absent subsequences of a string efficiently
- Enumerating shortest absent subsequences with optimal algorithms
- Identifying a longest minimal absent subsequence of a string
Innovation

- Linear-time preprocessing for enumerating absent subsequences
- Output-linear delay for enumerating shortest and minimal absent subsequences
- Efficient identification of a longest minimal absent subsequence
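With output-linear delay, each absent subsequence is written out in full. In the incremental setting described in the abstract, only a short edit script relative to the previous string is output. The sketch below makes that output format concrete using a hypothetical script of the form (number of trailing letters to delete, suffix to append), computed from the longest common prefix. It is not the paper's edit-script format or enumeration order. Computing scripts this way takes time proportional to the strings, and the scripts need not have constant size, so it illustrates the format only, not the constant-delay guarantee.

```python
def edit_script(prev: str, cur: str) -> tuple[int, str]:
    """Hypothetical script: (trailing letters of prev to delete, suffix to append)."""
    lcp = 0
    while lcp < min(len(prev), len(cur)) and prev[lcp] == cur[lcp]:
        lcp += 1
    return len(prev) - lcp, cur[lcp:]


def apply_script(prev: str, script: tuple[int, str]) -> str:
    """Rebuild the next string from the previous one and its edit script."""
    deletions, suffix = script
    return prev[:len(prev) - deletions] + suffix


def as_scripts(strings: list[str]) -> list[tuple[int, str]]:
    """Encode a sequence of strings as edit scripts, starting from the empty string."""
    scripts: list[tuple[int, str]] = []
    prev = ""
    for s in strings:
        scripts.append(edit_script(prev, s))
        prev = s
    return scripts
```

For the minimal absent subsequences of `aab` over `{a, b}`, listed as `["ba", "bb", "aaa"]`, the scripts are `[(0, "ba"), (1, "b"), (2, "aaa")]`. Replaying them with `apply_script` from the empty string recovers the original list.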
Florin Manea, University of Göttingen (Theoretical Computer Science)
Tina Ringleb, Department of Computer Science, University of Göttingen, Germany
Stefan Siemer, Department of Computer Science, University of Göttingen, Germany
Maximilian Winkler, Department of Computer Science, University of Göttingen, Germany