🤖 AI Summary
This work addresses high-dimensional inference in nonlinear continuous stochastic processes on trees, where sparse, leaf-only observations and complex topology hinder accurate reconstruction. The authors propose Neural Backward Filtering Forward Guiding (NBFFG), a unified framework that uses an auxiliary linear-Gaussian process to construct a closed-form backward filter as a guiding mechanism, and models the remaining nonlinear deviations with a neural residual, implemented either as a normalizing flow or as a controlled stochastic differential equation. This enables efficient path sampling and inference and is, notably, the first integration of auxiliary Gaussian processes with neural residuals for tree-structured stochastic processes. NBFFG supports unbiased path subsampling, so training cost scales with path length rather than with the size of the entire tree. Experiments show that it outperforms existing methods on synthetic benchmarks, and it is applied to high-dimensional phylogenetic analysis, reconstructing ancestral butterfly wing morphologies.
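To make the unbiased path subsampling idea concrete, here is a minimal sketch under stated assumptions; it is not the paper's actual scheme. It assumes the training objective decomposes into a sum of per-edge terms, samples one leaf uniformly, and reweights each edge on the root-to-leaf path by the inverse of its inclusion probability. The tree `PARENT`, the losses `EDGE_LOSS`, and all function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical toy tree given as child -> parent; the root "r" has no entry.
PARENT = {"a": "r", "b": "r", "x": "a", "y": "a", "z": "b"}
# Hypothetical per-edge loss terms, keyed by the child end of each edge.
EDGE_LOSS = {"a": Fraction(3), "b": Fraction(5),
             "x": Fraction(1), "y": Fraction(2), "z": Fraction(4)}


def leaves(parent):
    """Nodes that never appear as a parent."""
    internal = set(parent.values())
    return sorted(n for n in parent if n not in internal)


def path_to_root(leaf, parent):
    """Edges (named by their child node) on the path from a leaf up to the root."""
    path, node = [], leaf
    while node in parent:
        path.append(node)
        node = parent[node]
    return path


def inclusion_prob(parent):
    """Probability that each edge lies on the path of a uniformly sampled leaf."""
    lv = leaves(parent)
    counts = {edge: 0 for edge in parent}
    for leaf in lv:
        for edge in path_to_root(leaf, parent):
            counts[edge] += 1
    return {edge: Fraction(c, len(lv)) for edge, c in counts.items()}


def path_estimate(leaf, parent, edge_loss, prob):
    """Inverse-probability-weighted loss along a single root-to-leaf path."""
    return sum(edge_loss[e] / prob[e] for e in path_to_root(leaf, parent))


PROB = inclusion_prob(PARENT)
FULL_LOSS = sum(EDGE_LOSS.values())
# Exact expectation of the estimator over the uniform choice of leaf.
MEAN_ESTIMATE = sum(
    path_estimate(leaf, PARENT, EDGE_LOSS, PROB) for leaf in leaves(PARENT)
) / len(leaves(PARENT))
```

Each call to `path_estimate` touches only the edges on one path, yet averaging over the sampled leaf recovers `FULL_LOSS` exactly. That is the sense in which the per-step cost depends on path length rather than tree size.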
📝 Abstract
Inference in non-linear continuous stochastic processes on trees is challenging, particularly when observations are sparse (leaf-only) and the topology is complex. Exact smoothing via Doob's $h$-transform is intractable for general non-linear dynamics, while particle-based methods degrade in high dimensions. We propose Neural Backward Filtering Forward Guiding (NBFFG), a unified framework for both discrete transitions and continuous diffusions. Our method constructs a variational posterior by leveraging an auxiliary linear-Gaussian process. This auxiliary process yields a closed-form backward filter that serves as a "guide", steering the generative path toward high-likelihood regions. We then learn a neural residual, parameterized as a normalizing flow or a controlled SDE, to capture the non-linear discrepancies. This formulation allows for an unbiased path-wise subsampling scheme, reducing the training complexity from tree-size dependent to path-length dependent. Empirical results show that NBFFG outperforms baselines on synthetic benchmarks, and we demonstrate the method on a high-dimensional inference task in phylogenetic analysis with reconstruction of ancestral butterfly wing shapes.
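As an illustration of why a linear-Gaussian auxiliary process gives a closed-form backward filter, the sketch below treats the simplest such case: a scalar Brownian motion on a small tree with exactly observed leaves. This is an assumed toy setting, not the paper's model. The tree `CHILDREN`, the observations `LEAF_OBS`, the variance rate `SIGMA2`, and the function names are all hypothetical. Each node's message $h(x)$ is Gaussian-shaped in $x$, so it is fully described by a mean and a variance.

```python
# Hypothetical tree: node -> list of (child, branch_length).
CHILDREN = {
    "root": [("u", 1.0), ("c", 2.0)],
    "u": [("a", 1.0), ("b", 1.0)],
}
# Hypothetical exact observations at the leaves.
LEAF_OBS = {"a": 0.0, "b": 2.0, "c": 4.0}
SIGMA2 = 1.0  # assumed variance accumulated per unit branch length


def backward_filter(node):
    """Return (mean, variance) of the Gaussian-shaped message h_node(x).

    h_node(x) is the likelihood of all leaf observations below `node`,
    viewed as a function of the state x at `node`, up to a constant factor.
    """
    if node in LEAF_OBS:
        return LEAF_OBS[node], 0.0  # exact observation: zero variance
    precision, weighted_mean = 0.0, 0.0
    for child, length in CHILDREN[node]:
        mean_c, var_c = backward_filter(child)
        # Pull the child's message up the branch: variance grows, mean is kept.
        var_up = var_c + SIGMA2 * length
        # Fuse sibling messages: precisions add, means are precision-weighted.
        precision += 1.0 / var_up
        weighted_mean += mean_c / var_up
    return weighted_mean / precision, 1.0 / precision


def guide_score(x, mean, var):
    """Gradient of log h at x for a Gaussian-shaped message with var > 0."""
    return (mean - x) / var


ROOT_MEAN, ROOT_VAR = backward_filter("root")
```

The gradient returned by `guide_score` is the kind of term that pulls a forward path toward regions consistent with the leaves. In the framework described above, a learned neural residual would then correct for the mismatch between this linear-Gaussian guide and the true non-linear dynamics.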