🤖 AI Summary
This study investigates the capacity of sequences of coloring channels, a sequence-reconstruction model relevant to molecular-level information storage and protein identification. Each coloring channel passes a fixed subset of letters and deletes all others, so a sequence of channels provides multiple views of the same transmitted sequence. The paper shows that the capacity of a coloring-channel sequence depends solely on an associated graph, the “pairs graph,” which recasts the problem in a graph-theoretic framework. Drawing on information theory, combinatorics, and graph theory, it gives exact capacities for uniform sunflowers, two arbitrary intersecting sets, and paths. The pairs-graph equivalence is then used to prove lower and upper bounds on the capacity, along with a tailored bound for coloring-channel sequences that form a cycle. For an alphabet of size 4, these results determine the exact capacity of every coloring-channel sequence except a cycle of length 4, for which only bounds are provided.
📝 Abstract
A single coloring channel is defined by a subset of letters it allows to pass through, while deleting all others. A sequence of coloring channels provides multiple views of the same transmitted letter sequence, forming a type of sequence-reconstruction problem useful for protein identification and information storage at the molecular level. We provide exact capacities of several sequences of coloring channels: uniform sunflowers, two arbitrary intersecting sets, and paths. We also show how this capacity depends solely on a related graph we define, called the pairs graph. Using this equivalence, we prove lower and upper bounds on the capacity, and a tailored bound for a coloring-channel sequence forming a cycle. In particular, for an alphabet of size $4$, these results give the exact capacity of all coloring-channel sequences except for a cycle of length $4$, for which we only provide bounds.
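To make the channel definition concrete, here is a minimal Python sketch of a single coloring channel and of the multiple views produced by a sequence of such channels. The function names (`coloring_channel`, `channel_views`), the alphabet {A, C, G, T}, and the particular subsets are illustrative assumptions, not notation or examples taken from the paper.

```python
from typing import Iterable, List, Set


def coloring_channel(word: str, allowed: Set[str]) -> str:
    """Pass the letters in `allowed` and delete all others, preserving order."""
    return "".join(ch for ch in word if ch in allowed)


def channel_views(word: str, channels: Iterable[Set[str]]) -> List[str]:
    """Apply each coloring channel to the same transmitted word, one view per channel."""
    return [coloring_channel(word, allowed) for allowed in channels]


# Hypothetical example: a size-4 alphabet and three arbitrarily chosen channels.
word = "ACGTAC"
channels = [{"A", "C"}, {"C", "G"}, {"G", "T"}]
views = channel_views(word, channels)
print(views)  # ['ACAC', 'CGC', 'GT']
```

The receiver sees only `views`. The reconstruction question is how much information about `word` these views jointly retain, which is what the capacity quantifies.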