Looking for (Genomic) Needles in a Haystack: Sparsity-Driven Search for Identifying Correlated Genetic Mutations in Cancer

📅 2026-03-17
🤖 AI Summary
This study addresses the combinatorial explosion inherent in identifying high-order pathogenic multi-gene mutation combinations in cancer. The authors propose a sparsity-aware pruned depth-first search framework (P-DFS) integrated with a weighted set cover model, which leverages the high sparsity of tumor mutation data to prune the search space aggressively and early. Combined with bitwise operations and a distributed algorithm design, the method reduces visited candidate combinations by approximately 90–98% in 4-hit scenarios and achieves a roughly 183-fold speedup over the exhaustive set cover approach, measured on 147,456 ranks. This makes identification of pathogenic mutation combinations at fourth order and beyond computationally feasible.
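To give a sense of the combinatorial explosion mentioned above, the short sketch below counts the unpruned h-hit search space for 20,000 genes. The function name `hit_space` is hypothetical and used only for illustration; it is not taken from the paper.

```python
from math import comb

def hit_space(h, n_genes=20_000):
    """Number of distinct h-gene combinations among n_genes genes (n_genes choose h)."""
    return comb(n_genes, h)

if __name__ == "__main__":
    # Each extra hit multiplies the unpruned search space by several thousand.
    for h in (2, 3, 4):
        print(f"h={h}: {hit_space(h):,} candidate combinations")
```

For h = 4 the count is already on the order of 10^15, which is why exhaustive enumeration becomes impractical without pruning.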

📝 Abstract
Cancer typically arises not from a single genetic mutation (i.e., hit) but from multi-hit combinations that accumulate within cells. However, enumerating multi-hit combinations becomes exponentially more expensive as the number of candidate gene combinations grows, i.e., on the order of 20,000 choose h, where 20,000 is the number of genes in the human genome and h is the number of hits. To address this challenge, we present an algorithmic framework called Pruned Depth-First Search (P-DFS), which leverages the high sparsity of tumor mutation data to prune large portions of the search space. P-DFS, the main contribution of this paper, is a pruning technique grounded in depth-first search with backtracking: it exploits sparsity to drastically reduce the otherwise exponential h-hit search space of candidate combinations used by Weighted Set Cover, pruning infeasible gene subsets early, while the weighted set cover formulation systematically scores and selects the most discriminative combinations. By combining these ideas with optimized bitwise operations and a scalable distributed algorithm on high-performance computing clusters, our algorithm achieves an approximately 90–98% reduction in visited combinations for 4-hit combinations and a roughly 183x speedup over the exhaustive set cover approach (which is algorithmically NP-complete), measured on 147,456 ranks. As a result, our method can feasibly handle four-hit and even higher-order gene combinations, achieving both speed and resource efficiency.
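The idea of sparsity-driven pruning with bitwise operations can be illustrated with a minimal sketch. The abstract does not spell out the pruning criterion, so the code below assumes one plausible rule: a partial gene combination is abandoned as soon as the set of tumor samples carrying all of its genes falls below a support threshold, since adding more genes can only shrink that set. The names `pruned_dfs`, `gene_masks`, and `min_support` are hypothetical, and the weighted set cover scoring and the distributed execution are omitted.

```python
def pruned_dfs(gene_masks, h, min_support=1):
    """Enumerate h-gene combinations shared by at least min_support samples.

    gene_masks[i] is an int bitmask: bit s is set if gene i is mutated
    in tumor sample s (assumed encoding, not taken from the paper).
    Requires h >= 1. Returns (results, visited), where results is a list
    of (combination, support) and visited counts the search nodes explored.
    """
    n = len(gene_masks)
    results = []
    visited = 0

    def dfs(start, chosen, mask):
        nonlocal visited
        if len(chosen) == h:
            results.append((tuple(chosen), bin(mask).count("1")))
            return
        remaining = h - len(chosen)
        # Stop early when too few genes are left to complete a combination.
        for g in range(start, n - remaining + 1):
            visited += 1
            # Bitwise AND keeps only samples that carry every chosen gene.
            new_mask = mask & gene_masks[g]
            if bin(new_mask).count("1") < min_support:
                continue  # prune: any extension has an even smaller sample set
            chosen.append(g)
            dfs(g + 1, chosen, new_mask)
            chosen.pop()  # backtrack

    dfs(0, [], -1)  # -1 has all bits set, so the first AND yields the gene's mask
    return results, visited


if __name__ == "__main__":
    # Toy data: 5 genes, 6 tumor samples, sparse mutation pattern.
    masks = [0b000011, 0b000110, 0b001100, 0b000010, 0b110000]
    combos, visited = pruned_dfs(masks, h=3)
    print(combos, visited)
```

On sparse data most intersections become empty after two or three genes, so whole subtrees are skipped without being enumerated. How closely this matches the pruning rule used in P-DFS is an assumption.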
Problem

Research questions and friction points this paper is trying to address.

cancer
genetic mutations
multi-hit combinations
combinatorial explosion
tumor mutation data
Innovation

Methods, ideas, or system contributions that make the work stand out.

sparsity-driven search
Pruned Depth-First Search
multi-hit genetic mutations
weighted set cover
high-performance computing