Efficiently Finding All Minimal and Shortest Absent Subsequences in a String

📅 2025-04-30

🤖 AI Summary

This paper addresses the efficient enumeration of absent subsequences of a string $w$, focusing on two fundamental classes: shortest absent subsequences and minimal absent subsequences. For both classes it gives optimal enumeration algorithms with linear-time preprocessing and output-linear delay, i.e., the time between two consecutive outputs is proportional to the length of the string being output. It further gives optimal incremental enumeration algorithms with linear-time preprocessing and constant delay, which output short edit scripts describing how each enumerated string differs from the previous one, as well as an efficient algorithm for finding a longest minimal absent subsequence. All of these algorithms improve on the previous state of the art.
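
In the incremental setting, the algorithms do not print each string in full but a short edit script relating it to the previous output. The exact script format is not described here, so the following Python sketch assumes a hypothetical (keep p letters, append s) format and derives scripts naively from the longest common prefix; the names `edit_script`, `apply_script`, and `as_edit_scripts` are illustrative only. Unlike the paper's constant-delay algorithms, this conversion spends time proportional to the string lengths at every step.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical script format (assumption, not from the paper): (p, s) means
# "keep the first p letters of the previous string, then append s".
EditScript = Tuple[int, str]


def edit_script(prev: str, cur: str) -> EditScript:
    """Script turning prev into cur, found via their longest common prefix."""
    p = 0
    while p < min(len(prev), len(cur)) and prev[p] == cur[p]:
        p += 1
    return p, cur[p:]


def apply_script(prev: str, script: EditScript) -> str:
    """Rebuild the current string from the previous one and its script."""
    p, suffix = script
    return prev[:p] + suffix


def as_edit_scripts(strings: Iterable[str]) -> Iterator[EditScript]:
    """Convert full strings into scripts; the first script is built from ""."""
    prev = ""
    for s in strings:
        yield edit_script(prev, s)
        prev = s


outputs = ["aaa", "aab", "baa", "bab", "bbb"]
scripts = list(as_edit_scripts(outputs))
print(scripts)  # [(0, 'aaa'), (2, 'b'), (0, 'baa'), (2, 'b'), (1, 'bb')]
```

Here `outputs` lists the five shortest absent subsequences of abba over {a, b} in lexicographic order; consecutive strings often share a prefix, so most scripts carry fewer letters than the strings they describe.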

📝 Abstract

Given a string $w$, another string $v$ is said to be a subsequence of $w$ if $v$ can be obtained from $w$ by removing some of its letters; on the other hand, $v$ is called an absent subsequence of $w$ if $v$ is not a subsequence of $w$. The existing literature on absent subsequences focused on understanding, for a string $w$, the set of its shortest absent subsequences (i.e., the shortest strings which are absent subsequences of $w$) and that of its minimal absent subsequences (i.e., those strings which are absent subsequences of $w$ but whose every proper subsequence occurs in $w$). Our contributions to this area of research are the following. Firstly, we present optimal algorithms (with linear time preprocessing and output-linear delay) for the enumeration of the shortest and, respectively, minimal absent subsequences. Secondly, we present optimal algorithms for the incremental enumeration of these strings with linear time preprocessing and constant delay; in this setting, we only output short edit-scripts showing how the currently enumerated string differs from the previous one. Finally, we provide an efficient algorithm for identifying a longest minimal absent subsequence of a string. All our algorithms improve the state-of-the-art results for the aforementioned problems.
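
The definitions above can be checked directly. The following Python sketch (all function names are hypothetical) tests the subsequence relation greedily, checks minimality by deleting one letter at a time, and computes the length of a shortest absent subsequence from the greedy arch factorization, a classical characterization that is assumed here rather than taken from this page: $w$ is cut into consecutive blocks that each contain every letter of the alphabet, and with $k$ complete blocks the shortest absent subsequences have length $k+1$.

```python
from typing import Iterable


def is_subsequence(v: str, w: str) -> bool:
    """True if v can be obtained from w by deleting letters."""
    it = iter(w)
    # `c in it` consumes the iterator up to the first match, so the letters
    # of v are matched greedily from left to right.
    return all(c in it for c in v)


def is_absent(v: str, w: str) -> bool:
    """v is an absent subsequence of w if it is not a subsequence of w."""
    return not is_subsequence(v, w)


def is_minimal_absent(v: str, w: str) -> bool:
    """v is absent, but every proper subsequence of v occurs in w.

    Deleting a single letter suffices: each proper subsequence of v is a
    subsequence of some one-letter deletion, and the relation is transitive.
    """
    if not is_absent(v, w):
        return False
    return all(is_subsequence(v[:i] + v[i + 1:], w) for i in range(len(v)))


def shortest_absent_length(w: str, alphabet: Iterable[str]) -> int:
    """Length of a shortest absent subsequence of w over a nonempty alphabet.

    Greedy arch factorization: scan w, closing an arch as soon as every
    letter has been seen since the previous cut. With k complete arches,
    every string of length k occurs in w, but some string of length k + 1
    does not.
    """
    sigma = set(alphabet)
    if not sigma:
        raise ValueError("alphabet must be nonempty")
    remaining = set(sigma)
    arches = 0
    for c in w:
        remaining.discard(c)
        if not remaining:
            arches += 1
            remaining = set(sigma)
    return arches + 1


w = "abba"
print(is_subsequence("bba", w), is_absent("aab", w))  # True True
print(is_minimal_absent("bab", w), is_minimal_absent("aaab", w))  # True False
print(shortest_absent_length(w, "ab"))  # 3
```

For abba over {a, b}, the string aaab is absent but not minimal, since deleting its last letter leaves aab, which is already absent.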

Problem

Finding all minimal absent subsequences in a string efficiently

Enumerating shortest absent subsequences with optimal algorithms

Identifying a longest minimal absent subsequence of a string
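
For small inputs, all three problems can be solved by exhaustive search, which is useful as a test oracle for faster implementations. The Python sketch below is exponential in $|w|$ and is not the paper's method; its function names are hypothetical. It relies on two simple observations: every shortest absent subsequence is also minimal, and a minimal absent subsequence $v$ has length at most $|w|+1$, because $v$ without its last letter must occur in $w$.

```python
from itertools import product
from typing import Iterator, List


def is_subsequence(v: str, w: str) -> bool:
    it = iter(w)
    return all(c in it for c in v)  # greedy left-to-right matching


def strings_of_length(letters: List[str], n: int) -> Iterator[str]:
    """All strings of length n over `letters`, in lexicographic order."""
    return ("".join(t) for t in product(letters, repeat=n))


def shortest_absent(w: str, alphabet: str) -> List[str]:
    """All shortest absent subsequences of w (brute force)."""
    letters = sorted(set(alphabet))
    if not letters:
        raise ValueError("alphabet must be nonempty")
    n = 1
    while True:  # terminates by n = len(w) + 1, where every string is absent
        found = [v for v in strings_of_length(letters, n) if not is_subsequence(v, w)]
        if found:
            return found
        n += 1


def minimal_absent(w: str, alphabet: str) -> List[str]:
    """All minimal absent subsequences of w (brute force), shortest first."""
    letters = sorted(set(alphabet))
    result = []
    for n in range(1, len(w) + 2):  # a minimal absent subsequence has length <= |w| + 1
        for v in strings_of_length(letters, n):
            if not is_subsequence(v, w) and all(
                is_subsequence(v[:i] + v[i + 1:], w) for i in range(n)
            ):
                result.append(v)
    return result


def longest_minimal_absent(w: str, alphabet: str) -> str:
    """One longest minimal absent subsequence of w (brute force)."""
    return max(minimal_absent(w, alphabet), key=len)


print(shortest_absent("aab", "ab"))         # ['ba', 'bb']
print(minimal_absent("aab", "ab"))          # ['ba', 'bb', 'aaa']
print(longest_minimal_absent("aab", "ab"))  # aaa
```

The example shows that the two classes can differ: for aab, the string aaa is minimal absent (aa occurs, aaa does not) but is longer than the shortest absent subsequences ba and bb.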

Innovation

Linear-time preprocessing for enumerating absent subsequences

Output-linear delay for enumerating shortest and minimal absent subsequences

Efficient identification of a longest minimal absent subsequence
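
A standard preprocessing step for subsequence queries is the next-occurrence table, which stores, for every position $i$ and letter $a$, the first occurrence of $a$ at or after $i$; afterwards, testing whether $v$ is a subsequence of $w$ takes $O(|v|)$ time. The Python sketch below shows this generic textbook structure for illustration only; it is not a reconstruction of the paper's linear-time preprocessing, and the names `build_next_table` and `is_subsequence_fast` are hypothetical. It uses $O(|w| \cdot \sigma)$ time and space for $\sigma$ distinct letters, which is linear only for constant-size alphabets.

```python
from typing import Dict, List


def build_next_table(w: str) -> List[Dict[str, int]]:
    """nxt[i][a] = smallest j >= i with w[j] == a, or len(w) if a is not in w[i:].

    Built right to left; copying one dict per position costs O(|w| * sigma)
    time and space, where sigma is the number of distinct letters of w.
    """
    n = len(w)
    letters = set(w)
    nxt: List[Dict[str, int]] = [{} for _ in range(n + 1)]
    nxt[n] = {a: n for a in letters}
    for i in range(n - 1, -1, -1):
        row = dict(nxt[i + 1])
        row[w[i]] = i
        nxt[i] = row
    return nxt


def is_subsequence_fast(v: str, nxt: List[Dict[str, int]]) -> bool:
    """O(|v|) subsequence test using a precomputed next-occurrence table."""
    n = len(nxt) - 1
    i = 0
    for c in v:
        j = nxt[i].get(c, n)  # letters that never occur in w map to n
        if j == n:
            return False
        i = j + 1
    return True


table = build_next_table("abba")
print(is_subsequence_fast("aba", table), is_subsequence_fast("aab", table))  # True False
```

Once the table is built, each query touches one table entry per letter of $v$, so many candidate strings can be tested against the same $w$ without rescanning it.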
