On the Coverage Required for Diploid Genome Assembly

📅 2024-05-09

🏛️ International Symposium on Information Theory

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the fundamental information-theoretic limits of complete diploid genome assembly, specifically establishing the first lower bound on the minimum required sequencing depth. It focuses on the critical bottleneck of resolving repetitive regions, in particular the need to bridge double repeats. Method: The authors compare greedy and de Bruijn graph-based assembly in terms of the coverage depth and read length each requires to bridge double repeats. Using information-theoretic modeling, solvability analysis, and algorithmic simulations, they quantify the excess coverage (redundancy) of these standard assemblers relative to the theoretical lower bound. Results: The analysis indicates that current methods incur redundancy exceeding 100%, i.e., they require more than double the theoretically minimal coverage. Key contributions are: (1) the first information-theoretic lower bound for diploid genome assembly; (2) a rigorous demonstration that existing algorithms introduce substantial redundancy because they require every double repeat to be bridged; and (3) a principled theoretical benchmark for guiding sequencing strategies and the design of next-generation assemblers.
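The classic haploid Lander-Waterman covering condition shows what a coverage-depth requirement looks like. It is not the paper's diploid bound. The sketch below assumes that:

* reads have a fixed length L;
* read start positions are independent and uniform over a genome of length G;
* the expected number of coverage gaps is approximately N * exp(-c), where the depth is c = N*L/G.

Requiring this expectation to be at most eps gives c = ln(N/eps). Substituting N = c*G/L turns this into a fixed-point equation in c. The function name `lander_waterman_depth` and the parameter `eps` are hypothetical.

```python
import math


def lander_waterman_depth(genome_len: float, read_len: float, eps: float,
                          iters: int = 200) -> float:
    """Approximate depth c = N*L/G at which the expected number of coverage
    gaps, N*exp(-c), equals eps (haploid Lander-Waterman / Poisson model).

    Substituting N = c*G/L gives the fixed-point equation
    c = ln(c*G/(L*eps)), solved here by simple iteration.
    """
    k = genome_len / (read_len * eps)
    if k < math.e:
        # c = ln(c*k) has no solution when k < e
        raise ValueError("no fixed point: need G/(L*eps) >= e")
    c = math.log(k)  # starting guess, >= 1 because k >= e
    for _ in range(iters):
        c = math.log(c * k)  # c <- ln(N/eps) with N = c*G/L
    return c


if __name__ == "__main__":
    # Hypothetical inputs: 1 Mbp genome, 100 bp reads, eps = 0.01.
    depth = lander_waterman_depth(1e6, 100, 0.01)
    print(f"depth ~ {depth:.2f}x, reads ~ {depth * 1e6 / 100:.0f}")
```

With these hypothetical inputs the depth comes out at roughly 16.6x. This condition only rules out coverage gaps in a single sequence. It captures neither the repeat-bridging requirement nor the two haplotypes of a diploid genome.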

📝 Abstract

We investigate the information-theoretic conditions for complete reconstruction of a diploid genome. We also analyze the standard greedy and de Bruijn graph-based algorithms and compare their coverage depth and read length requirements with the information-theoretic lower bound. Our results show that the gap between the two is considerable because both algorithms require the double repeats in the genome to be bridged.
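The sketch below checks whether a set of reads bridges the repeats of a single sequence, to make "bridging" concrete. It uses a common haploid definition from the assembly literature, assumed here and not taken from this paper:

* a copy of an exact repeat is bridged when some read covers it together with at least one flanking base on each side;
* a repeat counts as bridged when at least one of its two copies is.

The paper's diploid notion of double repeats is not reproduced. The function names, the `min_len` cutoff, and the brute-force repeat search are hypothetical simplifications meant for toy examples.

```python
from typing import List, Tuple


def maximal_repeats(genome: str, min_len: int) -> List[Tuple[int, int, int]]:
    """Return maximal exact repeat pairs (i, j, length), i < j, length >= min_len.

    Brute force over all position pairs; suitable for toy sequences only.
    """
    g = len(genome)
    found = []
    for i in range(g):
        for j in range(i + 1, g):
            # Skip pairs that extend one base further left (not left-maximal).
            if i > 0 and genome[i - 1] == genome[j - 1]:
                continue
            k = 0
            # Extend right until a mismatch or the end of the sequence.
            while j + k < g and genome[i + k] == genome[j + k]:
                k += 1
            if k >= min_len:
                found.append((i, j, k))
    return found


def copy_bridged(pos: int, length: int, read_starts: List[int],
                 read_len: int, genome_len: int) -> bool:
    """A copy occupying [pos, pos + length) is bridged if a single read
    covers positions pos - 1 through pos + length inclusive."""
    if pos == 0 or pos + length >= genome_len:
        return False  # linear sequence: no flanking base on one side
    return any(s <= pos - 1 and s + read_len >= pos + length + 1
               for s in read_starts)


def all_repeats_bridged(genome: str, read_starts: List[int],
                        read_len: int, min_len: int) -> bool:
    """True if every maximal repeat of length >= min_len has a bridged copy."""
    g = len(genome)
    return all(copy_bridged(i, k, read_starts, read_len, g)
               or copy_bridged(j, k, read_starts, read_len, g)
               for i, j, k in maximal_repeats(genome, min_len))


if __name__ == "__main__":
    # Toy sequence with one repeat, GATTACA, at positions 2 and 11.
    seq = "AC" + "GATTACA" + "TT" + "GATTACA" + "CG"
    print(maximal_repeats(seq, 6))                     # [(2, 11, 7)]
    print(all_repeats_bridged(seq, [0, 10], 10, 6))    # True
    print(all_repeats_bridged(seq, [0, 6, 12], 8, 6))  # False
```

In this toy example, shortening the reads from 10 to 8 bases makes both copies of the 7-base repeat unbridgeable. A bridging read must contain the repeat plus one flanking base on each side, which is 9 bases.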

Problem

Research questions and friction points this paper is trying to address.

Genome Assembly

Genomic Repeats

Assembly Algorithms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Conditions for Complete Genome Reconstruction

Greedy Algorithm vs. De Bruijn Graph

Overlap Graph-Based Assembly

🔎 Similar Papers

SequenceLab: A Comprehensive Benchmark of Computational Methods for Comparing Genomic Sequences