🤖 AI Summary
This study aims to reconstruct the dynamic temporal sequence of post-transcriptional maturation of the *ndhB* and *ndhD* genes in *Arabidopsis thaliana* chloroplasts.
Method: We propose a three-stage causal temporal reconstruction framework (causal discovery, causal inference, and chronology construction) tailored to long-read RNA sequencing data and grounded in Pearl’s causal inference theory. It applies four causal discovery algorithms (HC, PC, LiNGAM, and NOTEARS), selects the NOTEARS $\ell_1$-regularization parameter via stability selection, and uses an EM algorithm to jointly impute missing values and estimate the Bayesian network.
Contribution/Results: For each gene, the framework yields four alternative maturation timelines, one per causal discovery algorithm, and these consistently outperform reference chronologies in both reliability and model fit. By combining causal reasoning with domain expertise, it also produces experimentally testable hypotheses and suggests targeted interventions. The work offers an interpretable, predictive approach to studying the timing of post-transcriptional RNA maturation in plants.
📝 Abstract
We propose a novel framework for reconstructing the chronology of genetic regulation using causal inference based on Pearl's theory. The approach proceeds in three main stages: causal discovery, causal inference, and chronology construction. We apply it to the ndhB and ndhD genes of the chloroplast in Arabidopsis thaliana, generating four alternative maturation timeline models per gene, each derived from a different causal discovery algorithm (HC, PC, LiNGAM, or NOTEARS). Two methodological challenges are addressed: the presence of missing data, handled via an EM algorithm that jointly imputes missing values and estimates the Bayesian network, and the selection of the $\ell_1$-regularization parameter in NOTEARS, for which we introduce a stability selection strategy. The resulting causal models consistently outperform reference chronologies in terms of both reliability and model fit. Moreover, by combining causal reasoning with domain expertise, the framework enables the formulation of testable hypotheses and the design of targeted experimental interventions grounded in theoretical predictions.
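
The abstract does not spell out the stability selection strategy, so the following is only a minimal sketch of the general idea under stated assumptions: refit a sparse structure learner on random half-samples over a grid of $\ell_1$ penalties, and keep the edges that are selected in a large fraction of refits. NOTEARS is not reimplemented here; a lasso regression of each variable on the others (solved with proximal gradient) is swapped in as a stand-in learner, so the recovered links are undirected associations rather than a DAG. All function names, the penalty grid, and the 0.8 threshold are hypothetical choices for illustration, not the authors' settings.

```python
import numpy as np


def lasso_ista(X, y, lam, n_iter=300):
    """Minimize (1/2n)||y - Xw||^2 + lam*||w||_1 by proximal gradient (ISTA)."""
    n, d = X.shape
    w = np.zeros(d)
    # Lipschitz constant of the gradient of the smooth part; step size is 1/lip.
    lip = np.linalg.norm(X, 2) ** 2 / n
    if lip == 0.0:
        return w
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        z = w - grad / lip
        # Soft-thresholding produces exact zeros for unselected predictors.
        w = np.sign(z) * np.maximum(np.abs(z) - lam / lip, 0.0)
    return w


def sparse_edges(X, lam):
    """Stand-in learner (assumption, not NOTEARS): A[i, j] is True when
    variable i receives a nonzero lasso weight as a predictor of variable j."""
    d = X.shape[1]
    A = np.zeros((d, d), dtype=bool)
    for j in range(d):
        others = [i for i in range(d) if i != j]
        w = lasso_ista(X[:, others], X[:, j], lam)
        A[others, j] = w != 0
    return A


def stability_selection(X, lambdas, n_subsamples=30, threshold=0.8, seed=0):
    """Return (max selection frequency per edge, boolean mask of stable edges)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    freq = np.zeros((len(lambdas), d, d))
    for _ in range(n_subsamples):
        # Draw a half-sample without replacement and standardize it
        # (assumes no column is constant within the subsample).
        idx = rng.choice(n, size=n // 2, replace=False)
        Xs = X[idx]
        Xs = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0)
        for k, lam in enumerate(lambdas):
            freq[k] += sparse_edges(Xs, lam)
    freq /= n_subsamples
    # An edge is stable if it is selected often enough for some penalty value.
    max_freq = freq.max(axis=0)
    return max_freq, max_freq >= threshold


if __name__ == "__main__":
    # Synthetic data: x1 depends on x0, x2 is independent of both.
    rng = np.random.default_rng(1)
    x0 = rng.normal(size=200)
    x1 = 2.0 * x0 + 0.5 * rng.normal(size=200)
    x2 = rng.normal(size=200)
    X = np.column_stack([x0, x1, x2])
    freq, stable = stability_selection(X, lambdas=(0.2, 0.4))
    print(np.round(freq, 2))
    print(stable)
```

In a pipeline like the one described, the stand-in `sparse_edges` would be replaced by a NOTEARS fit at each penalty value, with the selection frequencies used to choose the penalty or to retain only reproducible edges.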