🤖 AI Summary
Quantifying uncertainty and controlling false discoveries when identifying computational biomarkers from short-term longitudinal omics data remains challenging. Method: We propose two Bayesian variable selection approaches, both applied on the first-difference (Δ) scale of the longitudinal predictors and the response: (i) a univariate approach using Zellner's g-prior, with g set either to √n or to the value that minimizes Stein's unbiased risk estimate (SURE), and with Bayes factors used to quantify uncertainty and control false discovery; and (ii) a multivariate approach using a Bayesian group LASSO with a spike-and-slab prior for group variable selection. Contribution/Results: The methods are compared against commonly used linear mixed-effects models on simulated data and on metabolomics data from a tuberculosis study. With automated hyperparameter selection, the g-prior approach identifies target metabolites with high sensitivity and specificity in both simulated and real data, and the group LASSO spike-and-slab approach correctly selects target metabolites across the simulation scenarios.
📝 Abstract
Clinical investigators are increasingly interested in discovering computational biomarkers from short-term longitudinal omics datasets. This work focuses on Bayesian regression and variable selection for such datasets, which can quantify uncertainty and control false discovery. In our univariate approach, Zellner's $g$ prior is used with two options for the tuning parameter $g$: $g=\sqrt{n}$ and a $g$ that minimizes Stein's unbiased risk estimate (SURE). Bayes factors are used to quantify uncertainty and control false discovery. In the multivariate approach, we use a Bayesian group LASSO with a spike-and-slab prior for group variable selection. In both approaches, we work on the first-difference ($\Delta$) scale of the longitudinal predictors and the response. Together, these methods improve inference and prediction in biomarker identification. We compare our methods against commonly used linear mixed-effects models on simulated data and on real data from a tuberculosis (TB) study on metabolite biomarker selection. With automated selection of hyperparameters, the Zellner's $g$ prior approach correctly identifies target metabolites with high specificity and sensitivity across various simulation and real-data scenarios. The multivariate Bayesian group LASSO spike-and-slab approach also correctly selects target metabolites across various simulation scenarios.
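To make the univariate idea concrete, here is a minimal sketch of first-differencing followed by a Bayes factor under Zellner's $g$ prior with $g=\sqrt{n}$. It is an illustration under stated assumptions, not the authors' implementation: it uses the standard closed-form Bayes factor against an intercept-only model, treats all within-subject differences as independent observations, and omits the SURE-based choice of $g$. The function names and the simulated metabolites are hypothetical.

```python
import numpy as np


def first_difference(x):
    """Map (subjects, timepoints) measurements to visit-to-visit changes."""
    return np.diff(np.asarray(x, dtype=float), axis=1)


def log_bayes_factor_g_prior(x, y, g=None):
    """Log Bayes factor of 'y ~ intercept + x' against 'y ~ intercept'.

    Uses the standard closed form for Zellner's g prior with one predictor:
        BF = (1 + g)^((n - 1 - p) / 2) / (1 + g * (1 - R^2))^((n - 1) / 2)
    Assumption: g defaults to sqrt(n), one of the two options in the abstract.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    n, p = y.size, 1
    if g is None:
        g = np.sqrt(n)
    r2 = np.corrcoef(x, y)[0, 1] ** 2  # R^2 of the simple regression
    return 0.5 * (n - 1 - p) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))


# Hypothetical simulated data: metabolite A drives the response, B is noise.
rng = np.random.default_rng(0)
n_subjects, n_visits = 40, 3
metab_a = rng.normal(size=(n_subjects, n_visits))
metab_b = rng.normal(size=(n_subjects, n_visits))
response = 2.0 * metab_a + 0.3 * rng.normal(size=(n_subjects, n_visits))

d_y = first_difference(response).ravel()
lbf_a = log_bayes_factor_g_prior(first_difference(metab_a).ravel(), d_y)
lbf_b = log_bayes_factor_g_prior(first_difference(metab_b).ravel(), d_y)
print(f"log Bayes factor, metabolite A: {lbf_a:.2f}")
print(f"log Bayes factor, metabolite B: {lbf_b:.2f}")
```

A large positive log Bayes factor favors including the metabolite, while a value near or below zero favors the intercept-only model. Thresholding these values is one simple way to screen candidates, though the threshold itself would be a further assumption.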