🤖 AI Summary
Transfer entropy (TE) estimates suffer from a positive bias and lack a clear means of assessing statistical significance under finite-sample conditions. Both problems are most severe when bin counts are sparse, where they can produce spurious inferences of directed information flow. To address these issues, we propose a corrected nonparametric TE measure. First, by accounting more precisely for the information content of finite data streams, we obtain a measure that is asymptotically equivalent to the standard plug-in estimator but corrects its bias for time series of small size and/or high cardinality. Second, the measure permits a fully nonparametric significance test that requires no simulation-based estimation of a null distribution. Experiments on synthetic and real-world time series show that the finite-data correction has a substantial impact on results, in particular by suppressing sparsity-induced false positives when identifying directed information flow. The proposed method is thus intended as a reliable, interpretable tool for directed information flow analysis in complex systems.
📝 Abstract
Transfer entropy is a widely used measure for quantifying directed information flows in complex systems. While the challenges of estimating transfer entropy for continuous data are well known, the measure has two major shortcomings that persist even for data of finite cardinality: its estimates exhibit a substantial positive bias for sparse bin counts, and there is no clear means of assessing their statistical significance. By more precisely accounting for information content in finite data streams, we derive a transfer entropy measure which is asymptotically equivalent to the standard plug-in estimator but remedies these issues for time series of small size and/or high cardinality, permitting a fully nonparametric assessment of statistical significance without simulation. We show that this correction for finite data has a substantial impact on results in both real and synthetic time series datasets.
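
The positive bias for sparse bin counts can be illustrated with the standard plug-in estimator itself. The following is a minimal sketch, not the corrected measure derived in the paper: it assumes discrete symbols, a history length of 1, and logarithms in bits, and the function names `plugin_transfer_entropy` and `mean_te_independent` are hypothetical names chosen for illustration. For two independent series the true transfer entropy is zero, so any positive average returned by the second function is pure estimation bias.

```python
import math
import random
from collections import Counter


def plugin_transfer_entropy(source, target):
    """Standard plug-in estimate of TE(source -> target) in bits.

    Assumes discrete symbols and a history length of 1 (an assumption of
    this sketch). Probabilities are replaced by empirical frequencies.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    if len(target) < 2:
        raise ValueError("need at least two time steps")

    # Each observation is a triple (y_next, y_prev, x_prev).
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)

    c_full = Counter(triples)                              # (y_next, y_prev, x_prev)
    c_yy = Counter((yn, yp) for yn, yp, _ in triples)      # (y_next, y_prev)
    c_yx = Counter((yp, xp) for _, yp, xp in triples)      # (y_prev, x_prev)
    c_y = Counter(yp for _, yp, _ in triples)              # y_prev

    te = 0.0
    for (yn, yp, xp), c in c_full.items():
        # p(y_next | y_prev, x_prev) / p(y_next | y_prev) from raw counts.
        ratio = (c * c_y[yp]) / (c_yx[(yp, xp)] * c_yy[(yn, yp)])
        te += (c / n) * math.log2(ratio)
    return te


def mean_te_independent(n, k, trials=200, seed=0):
    """Average plug-in TE between two independent uniform series.

    n is the series length and k the alphabet size (cardinality).
    The true TE is 0, so the returned average estimates the bias.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = [rng.randrange(k) for _ in range(n)]
        y = [rng.randrange(k) for _ in range(n)]
        total += plugin_transfer_entropy(x, y)
    return total / trials


if __name__ == "__main__":
    # Short series with many symbols leave most bins sparsely populated.
    for n, k in [(50, 4), (500, 4), (5000, 4), (50, 8)]:
        print(f"n={n:5d} k={k}: mean plug-in TE = {mean_te_independent(n, k, trials=50):.4f} bits")
```

Running the script shows the average estimate shrinking toward zero as the series grows and rising as the alphabet size increases for a fixed length, which is the small-size, high-cardinality regime the abstract singles out.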