🤖 AI Summary
This study addresses key challenges in modeling complex diseases: high dimensionality, strong feature correlations, substantial noise, and scarcity of labeled samples. Under these conditions, existing methods often lack robustness, interpretability, and generalizability. To overcome these limitations, we propose a multi-stage soft computing framework that integrates single-cell transcriptomic analysis, high-dimensional weighted gene co-expression network analysis (hdWGCNA), two-dimensional disease mapping, CNN-based deep representation learning, molecular docking, and multi-model ensemble strategies to extract robust biomarkers from heterogeneous biological data and support therapeutic decision-making. Applied to liver cirrhosis, the framework identifies a cirrhosis-associated endothelial cell subpopulation and seven stable marker genes. The CNN module outperforms conventional pipelines in classification, and the framework as a whole is disease-agnostic, so it can extend to other omics-driven biomedical research contexts.
📝 Abstract
Liver cirrhosis is a major global health problem causing millions of deaths annually, and timely detection combined with aggressive treatment can significantly improve patients' quality of life. Modelling complex diseases from biomedical data is computationally challenging due to high dimensionality, strong feature correlations, noise, and limited labelled samples. Conventional machine learning (ML) pipelines often struggle with robustness, interpretability, and generalisation under such conditions. In this study, we propose an ML-driven multi-stage decision framework for complex disease modelling and therapeutic exploration. The framework integrates single-cell transcriptomic profiling, high-dimensional network-based feature stabilisation, multi-model learning, deep representation construction, and post-hoc decision support. Specifically, single-cell sequencing data were analysed to identify key cellular subpopulations, followed by high-dimensional weighted gene co-expression network analysis (hdWGCNA) to stabilise gene modules under sparsity and noise. To enhance non-linear feature interaction modelling, tabular molecular features were restructured into two-dimensional disease maps and analysed using a convolutional neural network (CNN). Finally, molecular docking was incorporated as a decision-support module to evaluate candidate therapeutic compounds. Using liver cirrhosis as a representative case, the framework identified a disease-associated endothelial subpopulation and extracted seven robust signature genes (HSPB1, GADD45A, CLDN5, ATP1B3, C1QBP, ENPP2, and PARL). The CNN-based representation learning module outperformed conventional pipelines in classification. The framework is disease-agnostic and readily extends to other omics-driven biomedical applications involving uncertainty, heterogeneity, and limited samples.
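The abstract does not say how tabular features are laid out in a two-dimensional disease map, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes the simplest possible layout: each sample's feature vector is zero-padded to the next perfect square and reshaped row by row into a square grid that a CNN could take as a single-channel image. The function name `features_to_map`, the 4 x 7 matrix size, and the random values are all hypothetical placeholders.

```python
import math

import numpy as np


def features_to_map(features: np.ndarray) -> np.ndarray:
    """Reshape (n_samples, n_features) tabular data into square 2D maps.

    Assumption: features are placed in row-major order and unused grid
    cells are filled with zeros. A real pipeline may order features so
    that correlated ones sit next to each other.
    """
    n_samples, n_features = features.shape
    # Smallest side length whose square can hold every feature.
    side = math.isqrt(n_features)
    if side * side < n_features:
        side += 1
    padded = np.zeros((n_samples, side * side), dtype=features.dtype)
    padded[:, :n_features] = features
    return padded.reshape(n_samples, side, side)


# Placeholder data: 4 samples with 7 features each (random values).
rng = np.random.default_rng(0)
expr = rng.random((4, 7))
maps = features_to_map(expr)
print(maps.shape)  # (4, 3, 3)
```

Each resulting map has shape `(side, side)`; adding a channel axis (for example `maps[:, None, :, :]`) gives the `(batch, channel, height, width)` layout that many CNN libraries expect. How features are ordered in the grid determines which ones fall inside the same convolution window, which is why the row-major order used here should be read as a stand-in for whatever arrangement the framework actually uses.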