Optimizing Prognostic Biomarker Discovery in Pancreatic Cancer Through Hybrid Ensemble Feature Selection and Multi-Omics Data

📅 2025-09-02
🤖 AI Summary
Problem: Prognostic biomarker discovery in high-dimensional multi-omics pancreatic cancer data suffers from the curse of dimensionality, poor stability across datasets, and dependence on arbitrary thresholds. Method: We propose a hybrid ensemble feature selection framework that integrates embedded (CoxLasso) and wrapper (survival SVM, random survival forest) approaches. A resampling-driven voting scheme across multiple models and multiple subsamples produces a robust feature ranking, while Pareto frontier analysis automatically determines the optimal feature set size, eliminating manual thresholding. The method is implemented in the mlr3fselect R package and was validated on three independent pancreatic cancer cohorts. Results: It reduced biomarker counts by 62% on average, significantly improved stability (Jaccard similarity +0.31), and maintained predictive performance comparable to CoxLasso alone (ΔC-index < 0.02), thus balancing clinical interpretability with prediction reliability.
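The stability figure above is reported as a Jaccard similarity. As a minimal sketch of how selection stability can be quantified this way, the Python snippet below averages the Jaccard similarity over all pairs of selected feature sets. The function names and the example feature sets are hypothetical, and the paper's exact stability protocol is not described in this summary.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention (assumption): two empty selections agree perfectly
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(selections):
    """Average Jaccard similarity over all pairs of selected feature sets."""
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two selections")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature sets selected on three resampled datasets
runs = [{"f1", "f2", "f3"}, {"f1", "f2", "f4"}, {"f1", "f2", "f3"}]
stability = mean_pairwise_jaccard(runs)
```

A value near 1 means the same features are selected on nearly every resample; a value near 0 means the selections barely overlap.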

📝 Abstract
Prediction of patient survival using high-dimensional multi-omics data requires systematic feature selection methods that ensure predictive performance, sparsity, and reliability for prognostic biomarker discovery. We developed a hybrid ensemble feature selection (hEFS) approach that combines data subsampling with multiple prognostic models, integrating both embedded and wrapper-based strategies for survival prediction. Omics features are ranked using a voting-theory-inspired aggregation mechanism across models and subsamples, while the optimal number of features is selected via a Pareto front, balancing predictive accuracy and model sparsity without any user-defined thresholds. When applied to multi-omics datasets from three pancreatic cancer cohorts, hEFS identifies significantly fewer and more stable biomarkers compared to the conventional, late-fusion CoxLasso models, while maintaining comparable discrimination performance. Implemented within the open-source mlr3fselect R package, hEFS offers a robust, interpretable, and clinically valuable tool for prognostic modelling and biomarker discovery in high-dimensional survival settings.
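The voting-based aggregation described in the abstract can be illustrated with a simple approval-voting count, in which every model-and-subsample run acts as a voter that approves the features it selected. This is only a sketch of one possible voting rule; the abstract does not state which rule hEFS uses, and the function name and feature labels below are hypothetical.

```python
from collections import Counter


def approval_ranking(selections):
    """Rank features by the share of voters (model-and-subsample runs) that selected them."""
    if not selections:
        raise ValueError("need at least one selection")
    # Each voter approves every feature in its selected set exactly once.
    votes = Counter(f for sel in selections for f in set(sel))
    n_voters = len(selections)
    # Sort by descending vote share; break ties alphabetically for determinism.
    return sorted(
        ((f, count / n_voters) for f, count in votes.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical selections, one per (model, subsample) run
voters = [
    {"geneA", "geneB", "geneD"},  # model 1, subsample 1
    {"geneA", "geneB"},           # model 2, subsample 1
    {"geneA", "geneC"},           # model 1, subsample 2
    {"geneA", "geneB", "geneC"},  # model 2, subsample 2
]
ranking = approval_ranking(voters)
```

Features that many different models pick on many different subsamples rise to the top of the ranking, which is the intuition behind combining resampling with several learners.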
Problem

Research questions and friction points this paper is trying to address.

Optimizing prognostic biomarker discovery in pancreatic cancer
Developing hybrid ensemble feature selection for multi-omics data
Balancing predictive accuracy and model sparsity without thresholds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid ensemble feature selection combining multiple strategies
Voting-theory aggregation for ranking omics features
Pareto front optimization balancing accuracy and sparsity
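The Pareto-front idea in the list above can be sketched as follows: among candidate models described by (number of features, performance score), keep only the non-dominated ones, then pick a compromise point. The knee rule used here (largest distance from the line joining the two ends of the front, after normalising both axes) is an assumption made for illustration, not necessarily the rule used in the paper, and all numbers are made up.

```python
def pareto_front(points):
    """Return non-dominated (n_features, score) points.

    Fewer features and a higher score are both considered better.
    """
    front = []
    # Visit candidates from sparsest to densest; within a size, best score first.
    for n, score in sorted(points, key=lambda p: (p[0], -p[1])):
        # Keep a point only if it beats every sparser candidate on score.
        if not front or score > front[-1][1]:
            front.append((n, score))
    return front


def knee_point(front):
    """Pick the front point farthest from the line joining the front's two ends."""
    if not front:
        raise ValueError("empty front")
    if len(front) < 3:
        return front[0]
    (n0, s0), (n1, s1) = front[0], front[-1]

    def distance(p):
        # Normalise both objectives to [0, 1] so they carry equal weight.
        x = (p[0] - n0) / (n1 - n0)
        y = (p[1] - s0) / (s1 - s0)
        return abs(y - x)  # proportional to the distance from the diagonal

    return max(front, key=distance)


# Hypothetical (n_features, C-index) pairs for candidate feature-set sizes
candidates = [(2, 0.60), (5, 0.68), (8, 0.66), (10, 0.70), (20, 0.71), (40, 0.71)]
front = pareto_front(candidates)
best = knee_point(front)
```

In this toy example the 8-feature and 40-feature candidates are dominated, and the knee falls at 5 features, where adding more features starts to yield only small gains in score.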
John Zobolas
Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway; Oslo Centre for Biostatistics and Epidemiology (OCBE), Department of Biostatistics, University of Oslo, Oslo, Norway
Anne-Marie George
University of Oslo
Computational Social Choice, Preference Learning, Voting Theory
Alberto López
Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway; Oslo Centre for Biostatistics and Epidemiology (OCBE), Department of Biostatistics, University of Oslo, Oslo, Norway
Sebastian Fischer
HSHL
Marc Becker
Department of Statistics, Ludwig Maximilian University of Munich, Munich, Germany
Tero Aittokallio
University of Helsinki, University of Turku, University of Oslo
Systems medicine