🤖 AI Summary
Problem: Global network modeling of high-dimensional complex data often obscures scientifically meaningful relationships, particularly those involving target variables such as clinical outcomes or biomarkers.
Method: We propose a target-variable-oriented local graph estimation framework that infers only the neighborhood structure of the target variable, not the full graph, via a pathwise iterative feature selection algorithm. The method integrates statistical modeling with graph learning, accommodates mixed variable types and nonlinear dependencies, provides finite-sample false discovery control, and propagates uncertainty along selection paths.
Contribution/Results: Applied to two cancer studies (one on county-level cancer incidence and mortality across the U.S., the other integrating gene, microRNA, protein, and clinical data from The Cancer Genome Atlas), our approach identified biologically plausible local networks that reveal both known and novel associations. Compared with global network methods, it yields more interpretable results focused on the scientific question at hand, offering a principled tool for hypothesis generation in translational biomedical research.
📝 Abstract
Large, complex datasets often include a small set of variables of primary interest, such as clinical outcomes or known biomarkers, whose relation to the broader system is the main focus of analysis. In these situations, exhaustively estimating the entire network may obscure insights into the scientific question at hand. To address this common scenario, we introduce local graph estimation, a statistical framework that focuses on inferring substructures around target variables rather than recovering the full network of inter-variable relationships. We show that traditional graph estimation methods often fail to recover local structure, and present pathwise feature selection (PFS) as an alternative approach. PFS estimates local subgraphs by iteratively applying feature selection and propagating uncertainty along network paths. We prove that PFS provides path discovery with finite-sample false discovery control and yields highly interpretable results, even in settings with mixed variable types and nonlinear dependencies. Applied to two cancer studies -- one analyzing county-level cancer incidence and mortality across the U.S., and another integrating gene, microRNA, protein, and clinical data from The Cancer Genome Atlas -- PFS uncovers biologically plausible networks that reveal both known and novel associations.
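To make the idea of iterating feature selection outward from a target concrete, here is a minimal sketch of the general pattern. It is not the authors' PFS procedure and carries no false discovery guarantee. The names `toy_selector`, `local_graph`, `budget`, and `max_radius` are hypothetical, the selector is a stand-in that scores features by `1 - |correlation|`, and the rule of summing scores along a path is an assumed simplification of "propagating uncertainty along network paths".

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def toy_selector(columns, j):
    """Hypothetical stand-in for a feature selector: returns an error-like
    score in [0, 1] for every other column (lower means stronger link)."""
    return {
        k: 1.0 - abs(pearson(columns[j], columns[k]))
        for k in range(len(columns))
        if k != j
    }


def local_graph(columns, target, selector, budget=0.5, max_radius=2):
    """Grow a neighborhood around `target`, one radius at a time.

    A feature is added only if the cumulative score along its path from
    the target stays within `budget` (an assumed propagation rule).
    """
    path_score = {target: 0.0}  # cumulative score of the path to each node
    edges = []
    frontier = [target]
    for _ in range(max_radius):
        next_frontier = []
        for node in frontier:
            for k, score in selector(columns, node).items():
                total = path_score[node] + score
                if k not in path_score and total <= budget:
                    path_score[k] = total
                    edges.append((node, k))
                    next_frontier.append(k)
        frontier = next_frontier
    return edges, path_score


# Synthetic data: x1 depends on x0, x2 depends on x1, x3 is unrelated noise.
random.seed(0)
n = 500
x0 = [random.gauss(0, 1) for _ in range(n)]
x1 = [v + random.gauss(0, 0.3) for v in x0]
x2 = [v + random.gauss(0, 0.3) for v in x1]
x3 = [random.gauss(0, 1) for _ in range(n)]

edges, path_score = local_graph([x0, x1, x2, x3], target=0, selector=toy_selector)
print(edges)
print(path_score)
```

Because the toy selector uses marginal correlation, it tends to attach `x2` directly to the target rather than through `x1`. A selector that accounts for the other variables would be needed to separate direct from indirect neighbors.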