🤖 AI Summary
Existing methods struggle to identify the joint distribution of hierarchical latent variables from single-timestep observations, which hinders modeling of the multi-level abstract dynamics underlying time series. To address this, we propose CHiLD, a novel framework that, for the first time, achieves identifiability of the hierarchical latent structure using only three conditionally independent observations, augmented with sparsity constraints to ensure layer-wise uniqueness. CHiLD employs variational inference to construct a generative model: a contextual encoder jointly reconstructs the multi-layer latent variables, while normalizing flow-based hierarchical prior networks enforce noise independence across layers. We provide theoretical guarantees of identifiability under mild assumptions. Experiments on synthetic and real-world datasets demonstrate that CHiLD accurately recovers hierarchical causal structures, significantly improving temporal-dependency modeling and yielding higher-fidelity causal representations than state-of-the-art baselines.
📄 Abstract
Modeling the hierarchical latent dynamics behind time series data is critical for capturing temporal dependencies across multiple levels of abstraction in real-world tasks. However, existing temporal causal representation learning methods fail to capture such dynamics, as they cannot recover the joint distribution of hierarchical latent variables from *single-timestep observed variables*. Interestingly, we find that this joint distribution can be uniquely determined using three conditionally independent observations. Building on this insight, we propose a Causally Hierarchical Latent Dynamic (CHiLD) identification framework. Our approach first employs temporal contextual observed variables to identify the joint distribution of multi-layer latent variables. Subsequently, we exploit the natural sparsity of the hierarchical structure among latent variables to identify the latent variables within each layer. Guided by these theoretical results, we develop a time series generative model grounded in variational inference. The model incorporates a contextual encoder to reconstruct the multi-layer latent variables and normalizing flow-based hierarchical prior networks to impose the independent-noise condition of the hierarchical latent dynamics. Empirical evaluations on both synthetic and real-world datasets validate our theoretical claims and demonstrate the effectiveness of CHiLD in modeling hierarchical latent dynamics.
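To make the flow-based prior idea concrete, here is a minimal sketch (our own illustration, not the paper's implementation) of how a normalizing flow can score multi-layer latents: an elementwise affine flow maps the latents of each layer to noise variables, and the prior log-density combines a standard normal base density with the flow's log-determinant. The layer count `L`, dimension `d`, and the affine parameterization are hypothetical choices for illustration only.

```python
import numpy as np

# Hypothetical sizes: L layers of latent variables, d dimensions per layer.
L, d = 2, 3

def affine_flow(z, scale, shift):
    """Elementwise affine normalizing flow mapping latents z to noise eps.
    Returns eps and the log|det Jacobian| of the z -> eps transform."""
    eps = (z - shift) / scale
    log_det = -np.sum(np.log(np.abs(scale)))
    return eps, log_det

def prior_log_prob(z, scale, shift):
    """Log-density of z under the flow-based prior with a standard normal
    base: log p(z) = log N(eps; 0, I) + log|det d(eps)/d(z)|."""
    eps, log_det = affine_flow(z, scale, shift)
    base = -0.5 * np.sum(eps ** 2) - 0.5 * eps.size * np.log(2.0 * np.pi)
    return base + log_det

# Toy latents for one timestep across all layers.
rng = np.random.default_rng(0)
z = rng.normal(size=(L, d))
scale = np.full((L, d), 1.5)   # illustrative flow parameters
shift = np.zeros((L, d))
lp = prior_log_prob(z, scale, shift)
```

Because the base density factorizes over dimensions and layers, maximizing this prior log-likelihood pushes the recovered noise terms toward mutual independence, which is the role the hierarchical prior networks play in the model described above.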