Maximizing Diversity in (near-)Median String Selection

📅 2026-02-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies how to generate multiple near-optimal median strings under the Hamming distance that are also far apart from one another, which gives downstream decision-making more flexibility and robustness. It considers two diversity measures, sum dispersion and min dispersion. For the diameter variant, which asks for a pair of near-optimal medians that is maximally diverse, it gives an exact algorithm. For selecting more than two medians, it gives a (1−ε)-approximation algorithm for sum dispersion, for any ε > 0, and a bi-criteria approximation algorithm for the harder min dispersion case. The results rely mainly on structural properties of the Hamming median space, together with techniques from error-correcting code construction. Applications of Hamming medians range from bioinformatics to pattern recognition.

📝 Abstract

Given a set of strings over a specified alphabet, identifying a median or consensus string that minimizes the total distance to all input strings is a fundamental data aggregation problem. When the Hamming distance is considered as the underlying metric, this problem has extensive applications, ranging from bioinformatics to pattern recognition. However, modern applications often require the generation of multiple (near-)optimal yet diverse median strings to enhance flexibility and robustness in decision-making. In this study, we address this need by focusing on two prominent diversity measures: sum dispersion and min dispersion. We first introduce an exact algorithm for the diameter variant of the problem, which identifies pairs of near-optimal medians that are maximally diverse. Subsequently, we propose a $(1-\epsilon)$-approximation algorithm (for any $\epsilon>0$) for sum dispersion, as well as a bi-criteria approximation algorithm for the more challenging min dispersion case, allowing the generation of multiple (more than two) diverse near-optimal Hamming medians. Our approach primarily leverages structural insights into the Hamming median space and also draws on techniques from error-correcting code construction to establish these results.
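To make the terms in the abstract concrete, here is a minimal Python sketch of the objects involved. It is an illustration only and not the paper's algorithm. It uses the standard definitions: sum dispersion is the sum of pairwise Hamming distances in a selection, min dispersion is the smallest pairwise distance, and the diameter variant is the two-string case. The function names, the additive `slack` used to define "near-optimal", and the exhaustive search in `most_diverse_pair` are all assumptions made for this example; the brute force is exponential in the string length and only suitable for toy inputs.

```python
from collections import Counter
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def median_cost(candidate: str, strings: list[str]) -> int:
    """Total Hamming distance from a candidate to all input strings."""
    return sum(hamming(candidate, s) for s in strings)


def majority_median(strings: list[str]) -> str:
    """One optimal Hamming median: a most frequent symbol in each column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*strings))


def sum_dispersion(selection: list[str]) -> int:
    """Sum of pairwise Hamming distances within the selection."""
    return sum(hamming(a, b) for a, b in combinations(selection, 2))


def min_dispersion(selection: list[str]) -> int:
    """Smallest pairwise Hamming distance within the selection."""
    return min(hamming(a, b) for a, b in combinations(selection, 2))


def most_diverse_pair(strings: list[str], alphabet: str, slack: int) -> tuple[str, str]:
    """Brute-force diameter search (hypothetical helper, toy sizes only).

    Enumerates every string over `alphabet`, keeps those whose median cost
    is within an additive `slack` of the optimum (an assumed notion of
    near-optimality), and returns a pair at maximum Hamming distance.
    """
    n = len(strings[0])
    best = median_cost(majority_median(strings), strings)
    near = [
        c
        for c in ("".join(t) for t in product(alphabet, repeat=n))
        if median_cost(c, strings) <= best + slack
    ]
    if len(near) < 2:
        raise ValueError("fewer than two near-optimal candidates")
    return max(combinations(near, 2), key=lambda pair: hamming(*pair))


if __name__ == "__main__":
    inputs = ["0011", "0101", "0110"]
    print(majority_median(inputs), median_cost(majority_median(inputs), inputs))
    print(most_diverse_pair(inputs, "01", slack=1))
```

The column-wise majority rule works because the Hamming median cost decomposes into independent per-column costs. Finding one optimal median is therefore easy; the difficulty the abstract addresses lies in choosing several near-optimal medians that are also far apart from each other.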

Problem

Research questions and friction points this paper is trying to address.

median string

diversity maximization

Hamming distance

dispersion

string selection

Innovation

Methods, ideas, or system contributions that make the work stand out.

median string

diversity maximization

Hamming distance