🤖 AI Summary
This study addresses a practical obstacle to causal discovery in large-scale longitudinal systems: operational data are generated under institutional workflows whose partial orderings are rarely formalized, so the causal graph search space includes many graphs that are inconsistent with how the data were recorded. The authors propose a workflow-induced constraint framework that encodes workflow-consistent partial orderings through a structural mask derived from the workflow protocol, a time-aligned indexing scheme, and a block-wise measurement structure. Without requiring a new optimization algorithm, the approach reduces structural ambiguity and improves the interpretability of causal structures in mixed discrete-continuous panel data, where within-time orientation is weakly identified. It also supports interventional queries and bootstrap-based uncertainty quantification for lagged total effects. Applied to a Japanese annual health screening cohort of 107,261 individuals (429,044 person-years), the workflow-constrained longitudinal LiNGAM model yielded temporally consistent within-time substructures and interpretable lagged total effects with explicit uncertainty estimates. Sensitivity analyses using alternative exposure and body-composition definitions preserved the main qualitative patterns.
📝 Abstract
Causal discovery has achieved substantial theoretical progress, yet its deployment in large-scale longitudinal systems remains limited. A key obstacle is that operational data are generated under institutional workflows whose induced partial orders are rarely formalized, enlarging the admissible graph space in ways inconsistent with the recording process. We characterize a workflow-induced constraint class for longitudinal causal discovery that restricts the admissible directed acyclic graph space through protocol-derived structural masks and timeline-aligned indexing. Rather than introducing a new optimization algorithm, we show that explicitly encoding workflow-consistent partial orders reduces structural ambiguity, especially in mixed discrete-continuous panels where within-time orientation is weakly identified. The framework combines workflow-derived admissible-edge constraints, measurement-aligned time indexing and block structure, bootstrap-based uncertainty quantification for lagged total effects, and a dynamic representation supporting intervention queries. In a nationwide annual health screening cohort in Japan with 107,261 individuals and 429,044 person-years, workflow-constrained longitudinal LiNGAM yields temporally consistent within-time substructures and interpretable lagged total effects with explicit uncertainty. Sensitivity analyses using alternative exposure and body-composition definitions preserve the main qualitative patterns. We argue that formalizing workflow-derived constraint classes improves structural interpretability without relying on domain-specific edge specification, providing a reproducible bridge between operational workflows and longitudinal causal discovery under standard identifiability assumptions.
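To make the idea of a protocol-derived structural mask concrete, the following is a minimal sketch of how an admissible-edge mask could be built from a workflow ordering. It is an illustration only, not the authors' implementation: the function name `workflow_mask`, the block contents (`age`, `sex`, `bmi`, `sbp`, `diagnosis`), and the rule that variables within the same block and time point may point either way are all assumptions made for this example.

```python
import numpy as np

def workflow_mask(blocks, n_times):
    """Build a boolean admissible-edge mask from a workflow ordering.

    blocks  : list of lists of variable names, ordered by workflow stage
              (hypothetical example input, not taken from the paper).
    n_times : number of measurement waves.

    Returns (names, mask), where mask[i, j] is True when the directed
    edge names[i] -> names[j] is admissible.
    """
    names, stage, time = [], [], []
    for t in range(n_times):
        for s, block in enumerate(blocks):
            for v in block:
                names.append(f"{v}_t{t}")
                stage.append(s)
                time.append(t)
    stage = np.array(stage)
    time = np.array(time)

    # Edges from an earlier wave to a later wave are always admissible.
    earlier_time = time[:, None] < time[None, :]
    # Within a wave, an edge may not point from a later stage to an earlier one.
    same_time = time[:, None] == time[None, :]
    not_later_stage = stage[:, None] <= stage[None, :]

    mask = earlier_time | (same_time & not_later_stage)
    np.fill_diagonal(mask, False)  # no self-loops
    return names, mask

# Hypothetical three-stage workflow observed over two waves.
blocks = [["age", "sex"], ["bmi", "sbp"], ["diagnosis"]]
names, mask = workflow_mask(blocks, n_times=2)
idx = {name: i for i, name in enumerate(names)}

print(mask[idx["age_t0"], idx["bmi_t0"]])   # earlier stage -> later stage
print(mask[idx["bmi_t1"], idx["bmi_t0"]])   # later wave -> earlier wave
```

Under these assumptions, edges that would run backward in time or against the workflow order are excluded before any structure learning takes place, while orientation within a block is left to the discovery algorithm. A mask of this kind would then be passed as a constraint to whichever longitudinal discovery method is used.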