Optimizing Prognostic Biomarker Discovery in Pancreatic Cancer Through Hybrid Ensemble Feature Selection and Multi-Omics Data

📅 2025-09-02
🤖 AI Summary
Problem: Prognostic biomarker discovery in high-dimensional multi-omics pancreatic cancer data suffers from the curse of dimensionality, poor stability across datasets, and dependence on arbitrary thresholds.
Method: We propose a hybrid ensemble feature selection framework that integrates embedded (CoxLasso) and wrapper (survival SVM, random survival forest) approaches. A resampling-driven voting scheme over multiple models and multiple subsamples produces a robust feature ranking, while Pareto frontier analysis automatically determines the feature set size, so no manual threshold is needed. The method is implemented in mlr3fselect and was validated on three independent pancreatic cancer cohorts.
Results: The method reduced biomarker counts by 62% on average, improved stability (Jaccard similarity +0.31), and maintained predictive performance comparable to CoxLasso alone (ΔC-index < 0.02), balancing clinical interpretability with prediction reliability.
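
The summary reports stability as a Jaccard similarity but does not say how it was computed. A common choice is the mean pairwise Jaccard index over the feature sets selected in different runs; the sketch below assumes that definition. The function names and the toy gene labels are hypothetical and do not come from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two feature collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(feature_sets):
    """Average Jaccard index over all unordered pairs of selected feature sets."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy example: features selected in three hypothetical resampling runs.
selected = [
    {"gene_a", "gene_b", "gene_c"},
    {"gene_a", "gene_b", "gene_d"},
    {"gene_a", "gene_c", "gene_d"},
]
stability = mean_pairwise_jaccard(selected)
```

Values closer to 1 mean that the runs agree on which features to keep; values near 0 mean the selections barely overlap.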

📝 Abstract
Prediction of patient survival using high-dimensional multi-omics data requires systematic feature selection methods that ensure predictive performance, sparsity, and reliability for prognostic biomarker discovery. We developed a hybrid ensemble feature selection (hEFS) approach that combines data subsampling with multiple prognostic models, integrating both embedded and wrapper-based strategies for survival prediction. Omics features are ranked using a voting-theory-inspired aggregation mechanism across models and subsamples, while the optimal number of features is selected via a Pareto front, balancing predictive accuracy and model sparsity without any user-defined thresholds. When applied to multi-omics datasets from three pancreatic cancer cohorts, hEFS identifies significantly fewer and more stable biomarkers compared to the conventional, late-fusion CoxLasso models, while maintaining comparable discrimination performance. Implemented within the open-source mlr3fselect R package, hEFS offers a robust, interpretable, and clinically valuable tool for prognostic modelling and biomarker discovery in high-dimensional survival settings.
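
To make the aggregation step concrete, here is a minimal sketch of ranking features by votes across models and subsamples. The abstract does not name the voting rule, so plain approval voting is assumed here: each (model, subsample) run is one voter that approves the features it selected. The function name `approval_ranking` and the toy runs are hypothetical, and this is not the mlr3fselect implementation.

```python
from collections import Counter


def approval_ranking(selections):
    """Rank features by the fraction of runs (voters) that selected them.

    `selections` holds one collection of feature names per
    (model, subsample) run. Returns (feature, approval_fraction) pairs,
    highest approval first, ties broken alphabetically.
    """
    if not selections:
        raise ValueError("need at least one run")
    counts = Counter()
    for selected in selections:
        counts.update(set(selected))  # each run approves a feature at most once
    n_runs = len(selections)
    scored = [(feature, count / n_runs) for feature, count in counts.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy example: four hypothetical runs, e.g. two models on two subsamples.
runs = [
    ["gene_a", "gene_b"],
    ["gene_a", "gene_c"],
    ["gene_a", "gene_b", "gene_d"],
    ["gene_b"],
]
ranking = approval_ranking(runs)
```

Other voting rules, such as Borda count on per-run rankings, would plug into the same structure by changing how each run contributes to the score.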
Problem

Research questions and friction points this paper is trying to address.

Optimizing prognostic biomarker discovery in pancreatic cancer
Developing hybrid ensemble feature selection for multi-omics data
Balancing predictive accuracy and model sparsity without thresholds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid ensemble feature selection combining multiple strategies
Voting-theory aggregation for ranking omics features
Pareto front optimization balancing accuracy and sparsity
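
The accuracy versus sparsity trade-off listed above can be sketched as follows. Given candidate feature counts and their estimated performance (for example a C-index), the code keeps the non-dominated points and then picks a knee point. The knee rule used here (largest gap above the straight line joining the two ends of the front, after scaling both axes to [0, 1]) is an assumption for illustration; the summary does not specify how the final point on the front is chosen. All names and numbers are hypothetical.

```python
def pareto_front(points):
    """Return non-dominated (n_features, score) pairs.

    Fewer features and a higher score are both preferred. The result is
    sorted by increasing n_features, with strictly increasing score.
    """
    front = []
    best_score = float("-inf")
    # Scan from the sparsest candidate; for equal size, best score first.
    for n_features, score in sorted(points, key=lambda p: (p[0], -p[1])):
        if score > best_score:
            front.append((n_features, score))
            best_score = score
    return front


def knee_point(front):
    """Pick the front point furthest above the line joining its two ends."""
    if len(front) < 3:
        return front[0]  # too few points for a knee; fall back to the sparsest
    (n_lo, s_lo), (n_hi, s_hi) = front[0], front[-1]

    def gap(point):
        x = (point[0] - n_lo) / (n_hi - n_lo)  # scaled feature count
        y = (point[1] - s_lo) / (s_hi - s_lo)  # scaled score
        return y - x  # height above the diagonal from (0, 0) to (1, 1)

    return max(front, key=gap)


# Toy example: (number of features, cross-validated score).
candidates = [
    (1, 0.60), (2, 0.68), (5, 0.72), (5, 0.70),
    (10, 0.73), (20, 0.72), (40, 0.74),
]
front = pareto_front(candidates)
chosen = knee_point(front)
```

In this toy example the knee falls at five features: beyond that point, each extra feature buys very little additional score, which is the kind of threshold-free cut-off the Pareto approach is meant to provide.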
Authors

John Zobolas
Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway; Oslo Centre for Biostatistics and Epidemiology (OCBE), Department of Biostatistics, University of Oslo, Oslo, Norway
Anne-Marie George
University of Oslo
Computational Social Choice, Preference Learning, Voting Theory
Alberto López
Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway; Oslo Centre for Biostatistics and Epidemiology (OCBE), Department of Biostatistics, University of Oslo, Oslo, Norway
Sebastian Fischer
HSHL
Marc Becker
Department of Statistics, Ludwig Maximilian University of Munich, Munich, Germany
Tero Aittokallio
University of Helsinki, University of Turku, University of Oslo
Systems medicine