A Fast and Practical Column Generation Approach for Identifying Carcinogenic Multi-Hit Gene Combinations

📅 2026-02-25
🤖 AI Summary
This study addresses the challenge of identifying combinations of gene mutations that jointly drive cancer (multi-hit combinations) by formulating it as the Multi-Hit Cancer Driver Set Cover Problem (MHCDSCP), which selects gene combinations to maximize coverage of tumor samples while minimizing coverage of normal samples. The authors present constraint programming and mixed-integer programming formulations of the problem, together with a column generation heuristic. On real cancer genomics data, the methods achieve performance comparable to state-of-the-art approaches while running on a single commodity CPU in under a minute, and the column generation heuristic solves small instances to optimality. These results suggest that the problem is less computationally demanding than previously believed and need not depend on the supercomputing infrastructure used by existing exhaustive-search approaches.

📝 Abstract
Cancer is often driven by specific combinations of an estimated two to nine gene mutations, known as multi-hit combinations. Identifying these combinations is critical for understanding carcinogenesis and designing targeted therapies. We formalise this challenge as the Multi-Hit Cancer Driver Set Cover Problem (MHCDSCP), a binary classification problem that selects gene combinations to maximise coverage of tumor samples while minimising coverage of normal samples. Existing approaches typically rely on exhaustive search and supercomputing infrastructure. In this paper, we present constraint programming and mixed integer programming formulations of the MHCDSCP. Evaluated on real-world cancer genomics data, our methods achieve performance comparable to state-of-the-art methods while running on a single commodity CPU in under a minute. Furthermore, we introduce a column generation heuristic capable of solving small instances to optimality. These results suggest that solving the MHCDSCP is less computationally intensive than previously believed, thereby opening research directions for exploring modelling assumptions.
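
To make the problem statement concrete, the following minimal Python sketch encodes each sample as the set of genes mutated in it and treats a hit combination as covering a sample when every gene in the combination is mutated there. A sample is classified as tumor if any selected combination covers it. The objective (true positives plus true negatives), the greedy selection rule, and the names `covers`, `evaluate`, and `greedy_cover` are illustrative assumptions, not the paper's formulation; the paper itself uses constraint programming, mixed integer programming, and column generation rather than this greedy search.

```python
from itertools import combinations


def covers(combo, sample):
    """Hypothetical helper: a combination covers a sample if all its genes are mutated there."""
    return set(combo) <= sample


def evaluate(selected, tumors, normals):
    """Predict 'tumor' for any sample covered by at least one selected combination.

    Returns (true positives, true negatives)."""
    tp = sum(any(covers(c, s) for c in selected) for s in tumors)
    fp = sum(any(covers(c, s) for c in selected) for s in normals)
    return tp, len(normals) - fp


def greedy_cover(genes, tumors, normals, hits, max_combos):
    """Hypothetical greedy baseline (not the paper's method): repeatedly add the
    h-gene combination that most increases TP + TN, stopping when no candidate helps."""
    selected = []
    best = sum(evaluate(selected, tumors, normals))
    for _ in range(max_combos):
        candidate, candidate_score = None, best
        # Exhaustive enumeration of all C(n, hits) combinations at every step.
        for combo in combinations(sorted(genes), hits):
            if combo in selected:
                continue
            score = sum(evaluate(selected + [combo], tumors, normals))
            if score > candidate_score:
                candidate, candidate_score = combo, score
        if candidate is None:
            break
        selected.append(candidate)
        best = candidate_score
    return selected, best


# Toy data: each sample is the set of genes mutated in it.
tumors = [{"A", "B", "C"}, {"A", "B"}, {"C", "D"}, {"C", "D", "E"}]
normals = [{"A"}, {"B", "C"}, {"D"}, {"A", "E"}]
genes = set().union(*tumors, *normals)

selected, score = greedy_cover(genes, tumors, normals, hits=2, max_combos=3)
print(selected, score)  # [('A', 'B'), ('C', 'D')] 8
```

On this toy data the sketch selects (A, B) and (C, D), covering all four tumor samples and none of the normal ones. Each greedy step enumerates every h-gene combination, a count that grows as C(n, h) in the number of genes n, which hints at why exhaustive-search approaches scale poorly on real genomic data.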

Problem

Research questions and friction points this paper is trying to address.

multi-hit gene combinations
carcinogenesis
cancer driver identification
set cover problem
tumor genomics

Innovation

Methods, ideas, or system contributions that make the work stand out.

column generation
multi-hit gene combinations
mixed integer programming
constraint programming
cancer driver identification
Rick S. H. Willemsen
Singapore University of Technology and Design, Engineering Systems and Design, Singapore
Tenindra Abeywickrama
RIKEN Center for Computational Science, Japan
Ramu Anandakrishnan
Edward Via College of Osteopathic Medicine
Cancer Genomics, Computational Biology, Biophysics