On the Coverage Required for Diploid Genome Assembly

📅 2024-05-09
🏛️ International Symposium on Information Theory
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the fundamental information-theoretic limits of complete diploid genome assembly, establishing the first lower bound on the minimum required sequencing depth. It focuses on the critical bottleneck of resolving repetitive regions, particularly the challenge of “twin-repeat spanning.”
Method: We compare greedy assembly and de Bruijn graph-based approaches in terms of their coverage requirements and read-length sensitivity for traversing twin-repeat structures. Using information-theoretic modeling, solvability analysis, and algorithmic simulations, we quantify the excess coverage (redundancy) of state-of-the-art assemblers relative to the theoretical lower bound.
Results: Our analysis reveals that current methods incur redundancy exceeding 100%, i.e., they need more than double the theoretically minimal coverage.
Key contributions are: (1) the first information-theoretic lower bound for diploid genome assembly; (2) a rigorous demonstration that existing algorithms inherently introduce substantial redundancy because they must span twin repeats; and (3) a principled theoretical benchmark to guide sequencing strategies and the design of next-generation assemblers.

📝 Abstract
We investigate the information-theoretic conditions for complete reconstruction of a diploid genome. We also analyze the standard greedy and de Bruijn graph-based algorithms and compare their coverage depth and read length requirements with the information-theoretic lower bound. Our results show that the gap between the two is considerable because both algorithms require the double repeats in the genome to be bridged.
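
To make the notion of a "bridged" repeat concrete, here is a minimal Python sketch. It assumes the definition commonly used in the assembly-limits literature: a read bridges a repeat copy if it covers the whole copy plus at least one base on each side. The paper's exact definition of a double repeat may differ, and the names `bridges` and `is_bridged`, along with the half-open interval representation, are illustrative assumptions rather than anything taken from the paper.

```python
from typing import Iterable, Tuple

# Assumption: positions are half-open intervals [start, end) on one haplotype.
Interval = Tuple[int, int]


def bridges(read: Interval, repeat: Interval) -> bool:
    """Return True if the read covers the repeat copy plus at least one
    flanking base on each side (assumed definition of bridging)."""
    read_start, read_end = read
    rep_start, rep_end = repeat
    return read_start < rep_start and read_end > rep_end


def is_bridged(reads: Iterable[Interval], repeat: Interval) -> bool:
    """Return True if at least one read bridges the given repeat copy."""
    return any(bridges(read, repeat) for read in reads)


# Hypothetical example: a repeat copy occupying positions 100..149.
repeat_copy = (100, 150)
sample_reads = [(90, 140), (120, 160), (95, 155)]
print(is_bridged(sample_reads, repeat_copy))
```

In this hypothetical example only the read `(95, 155)` extends past both ends of the repeat copy, so the copy counts as bridged. Requiring such reads for every relevant repeat is what drives the coverage and read-length demands that the abstract attributes to both algorithms.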
Problem

Research questions and friction points this paper is trying to address.

Genome Assembly
Gene Duplication
Splicing Algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Genomic Fragmentation Conditions
Greedy Algorithm vs. De Bruijn Graph
Overlap Graph-Based Assembly
Daanish Mahajan
Department of Interdisciplinary Mathematical Sciences, Indian Institute of Science, Bangalore 560012 India
Chirag Jain
Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012 India
Navin Kashyap
Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore 560012 India