🤖 AI Summary
This study addresses the challenge of predicting radiotherapy sensitivity in non-small cell lung cancer (NSCLC) by establishing, for the first time, an integrative transcriptomic (RNA-seq) and proteomic (DIA-MS) analytical framework that uses SF2 (the surviving fraction after 2 Gy irradiation) as the phenotypic endpoint. Features were selected with Lasso regression and passed to support vector regression (SVR) models, with five-fold cross-validation repeated ten times to improve robustness. The integrative model achieved balanced predictive performance on both omics layers (R² = 0.461 on the transcriptomic data and 0.604 on the proteomic data), whereas unimodal models generalized poorly across omics layers. It identified 20 consistently dysregulated cross-omics biomarker genes enriched in DNA damage repair and cellular stress response pathways. This work supports the complementary value of multi-omics integration for mechanistic insight and clinical translation, and offers a generalizable paradigm for radiobiological sensitivity prediction.
📝 Abstract
This study aimed to develop an integrated transcriptome-proteome framework for identifying concurrent biomarkers predictive of radiation response, as measured by the surviving fraction at 2 Gy (SF2), in non-small cell lung cancer (NSCLC) cell lines. RNA sequencing (RNA-seq) and data-independent acquisition mass spectrometry (DIA-MS) proteomic data were collected from 73 and 46 NSCLC cell lines, respectively. Following preprocessing, 1,605 shared genes were retained for analysis. Feature selection was performed using least absolute shrinkage and selection operator (Lasso) regression with a frequency-based ranking criterion under five-fold cross-validation repeated ten times. Support vector regression (SVR) models were constructed using transcriptome-only, proteome-only, and combined transcriptome-proteome feature sets. Model performance was assessed by the coefficient of determination (R²) and root mean square error (RMSE). Correlation analyses evaluated concordance between RNA and protein expression and the relationships of selected biomarkers with SF2. RNA and protein expression exhibited significant positive correlations (median Pearson's r = 0.363). Independent pipelines identified 20 prioritized gene signatures from transcriptomic, proteomic, and combined datasets. Models trained on single-omic features achieved limited cross-omic generalizability, while the combined model demonstrated balanced predictive accuracy in both datasets (R² = 0.461, RMSE = 0.120 for transcriptome; R² = 0.604, RMSE = 0.111 for proteome). This study presents the first proteotranscriptomic framework for SF2 prediction in NSCLC, highlighting the complementary value of integrating transcriptomic and proteomic data. The identified concurrent biomarkers capture both transcriptional regulation and functional protein activity, offering mechanistic insights and translational potential.
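For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how R², RMSE, and Pearson's r are conventionally computed. It is not the authors' code: the function names and the SF2 values are hypothetical, chosen only for illustration, and the study's own implementation may differ (for example, in how scores are averaged over cross-validation folds).

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical SF2 values for four cell lines (illustrative only, not study data).
sf2_observed = [0.30, 0.45, 0.60, 0.75]
sf2_predicted = [0.35, 0.40, 0.65, 0.70]

print("R2   =", round(r_squared(sf2_observed, sf2_predicted), 3))
print("RMSE =", round(rmse(sf2_observed, sf2_predicted), 3))
print("r    =", round(pearson_r(sf2_observed, sf2_predicted), 3))
```

Because SF2 lies between 0 and 1, RMSE is on the same scale as the endpoint itself, which is why the abstract can report it alongside R² without further units.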