🤖 AI Summary
For diseases such as hepatocellular carcinoma, where no single ideal biomarker exists, existing diagnostic models perform poorly when biomarker distributions are skewed, inter-group differences are small, or sample sizes are insufficient. To address this, we propose a parametric likelihood ratio–based multimarker diagnostic model: (i) its parameters translate directly into interpretable, computationally simple diagnostic accuracy measures (e.g., sensitivity, specificity); (ii) it enables reliable prediction when some biomarker measurements are missing; and (iii) it guides the selection of biomarker combinations under resource constraints. The method integrates multivariate statistical modeling, likelihood ratio–based marker combination, and diagnostic evaluation (ROC analysis, AUC). In simulation studies and an application to real clinical data, it outperforms popular classification and discriminant analysis methods, particularly in low-sample-size and incomplete-data settings. An accompanying R implementation is available for reproducing all results.
📝 Abstract
Accurate diagnostic tests are crucial to ensure effective treatment, screening, and surveillance of diseases. However, the limited accuracy of individual biomarkers often hinders comprehensive screening. The heterogeneity of many diseases, particularly cancer, calls for combining several biomarkers into a composite diagnostic test. In this paper, we present a novel multivariate model that optimally combines multiple biomarkers using the likelihood ratio function. The model's parameters translate directly into computationally simple diagnostic accuracy measures. Additionally, our method allows for reliable predictions even when specific biomarker measurements are unavailable, and it can guide the selection of biomarker combinations under resource constraints. We conduct simulation studies to compare its performance with that of popular classification and discriminant analysis methods. We apply the approach to construct an optimal diagnostic test for hepatocellular carcinoma, a cancer type known for the absence of a single ideal marker. An accompanying R implementation is made available for reproducing all results.
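To illustrate the general idea of combining markers through a likelihood ratio, the following is a minimal Python sketch, not the paper's model or its R implementation. It assumes, purely for illustration, that the markers are multivariate normal within each group; the function names (`fit_group`, `log_likelihood_ratio`, `auc`) and the simulated data are hypothetical. Under this Gaussian assumption, an unavailable marker can be handled by scoring on the marginal distribution of the observed markers only; the paper's own mechanism for missing measurements may differ.

```python
import numpy as np

def fit_group(X):
    """Estimate the mean vector and covariance matrix of one group (rows = subjects)."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def log_mvn_pdf(x, mean, cov):
    """Log density of a multivariate normal distribution, using NumPy only."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(mean) * np.log(2 * np.pi) + logdet + quad)

def log_likelihood_ratio(x, diseased, healthy):
    """Log likelihood ratio (diseased vs. healthy) over the observed markers in x.

    Markers recorded as NaN are dropped; for a Gaussian model this equals
    scoring on the marginal distribution of the remaining markers.
    """
    obs = ~np.isnan(x)
    idx = np.ix_(obs, obs)
    (m1, c1), (m0, c0) = diseased, healthy
    return (log_mvn_pdf(x[obs], m1[obs], c1[idx])
            - log_mvn_pdf(x[obs], m0[obs], c0[idx]))

def auc(scores_pos, scores_neg):
    """Empirical AUC: P(positive score > negative score), ties counted as one half."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

# Simulated data (an assumption for illustration): three markers, a modest mean shift.
rng = np.random.default_rng(0)
healthy_train = rng.normal(0.0, 1.0, size=(200, 3))
diseased_train = rng.normal(0.7, 1.0, size=(200, 3))
params_h = fit_group(healthy_train)
params_d = fit_group(diseased_train)

# Composite score for every subject, then a patient with the second marker unmeasured.
scores_d = np.array([log_likelihood_ratio(x, params_d, params_h) for x in diseased_train])
scores_h = np.array([log_likelihood_ratio(x, params_d, params_h) for x in healthy_train])
patient = np.array([1.2, np.nan, 0.4])
patient_score = log_likelihood_ratio(patient, params_d, params_h)
combined_auc = auc(scores_d, scores_h)
```

A positive `patient_score` favors the diseased group and a negative one favors the healthy group. Note that `combined_auc` is computed on the same data used for fitting, so it is optimistic; a held-out sample would be needed for an honest estimate.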