AI Summary
This work addresses the bias and limited generalizability of transfer entropy (TE) estimation for stationary processes. It proposes TREET, a Transformer-based neural estimator that combines the Donsker-Varadhan (DV) representation of TE with the self-attention mechanism, together with a differentiable TE optimization scheme motivated by the functional representation lemma. The resulting framework supports TE estimation, channel capacity estimation, and estimation of the underlying densities. The approach is intended to avoid the high sensitivity of conventional nonparametric estimators in high-dimensional, small-sample settings. TREET is compared with state-of-the-art TE estimators on a dedicated estimation benchmark, where it reports significant performance gains. It is used to estimate the capacity of communication channels with memory, and it is applied to feature analysis of real-world physiological data from sleep apnea patients.
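As a worked reference for the DV-based formulation, the standard identities below express TE with memory length $l$ as a difference of two KL divergences, each of which admits a DV form. The notation (memory length $l$, a fixed reference distribution $Q$ with density $q$ on $Y_t$, and the shorthand $D_{YX}$, $D_Y$) is our assumption and may differ from the paper's exact definitions; the DV supremum is over functions $f$ for which both expectations are finite, and the relevant densities are assumed to exist.

```latex
\begin{aligned}
\mathrm{TE}_{X\to Y}(l) &= I\big(X_{t-l}^{t-1};\, Y_t \,\big|\, Y_{t-l}^{t-1}\big)
  = h\big(Y_t \mid Y_{t-l}^{t-1}\big) - h\big(Y_t \mid Y_{t-l}^{t-1}, X_{t-l}^{t-1}\big) \\
D_{\mathrm{KL}}(P \,\|\, Q) &= \sup_{f}\; \mathbb{E}_{P}[f] - \log \mathbb{E}_{Q}\big[e^{f}\big] \\
D_{YX} &= D_{\mathrm{KL}}\big(P_{Y_t, Y_{t-l}^{t-1}, X_{t-l}^{t-1}} \,\big\|\, Q_{Y_t} \otimes P_{Y_{t-l}^{t-1}, X_{t-l}^{t-1}}\big)
  = -h\big(Y_t \mid Y_{t-l}^{t-1}, X_{t-l}^{t-1}\big) - \mathbb{E}_{P}\big[\log q(Y_t)\big] \\
D_{Y} &= D_{\mathrm{KL}}\big(P_{Y_t, Y_{t-l}^{t-1}} \,\big\|\, Q_{Y_t} \otimes P_{Y_{t-l}^{t-1}}\big)
  = -h\big(Y_t \mid Y_{t-l}^{t-1}\big) - \mathbb{E}_{P}\big[\log q(Y_t)\big] \\
\mathrm{TE}_{X\to Y}(l) &= D_{YX} - D_{Y}
\end{aligned}
```

The last line follows because the cross-entropy term $\mathbb{E}_P[\log q(Y_t)]$ depends only on the marginal of $Y_t$ and therefore cancels in the difference.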
Abstract
Transfer entropy (TE) is an information-theoretic measure that reveals the directional flow of information between processes, providing valuable insights for a wide range of real-world applications. This work proposes Transfer Entropy Estimation via Transformers (TREET), a novel attention-based approach for estimating TE for stationary processes. The proposed approach employs the Donsker-Varadhan representation of TE and leverages the attention mechanism for the task of neural estimation. We present a detailed theoretical and empirical study of TREET, comparing it to existing methods on a dedicated estimation benchmark. To increase its applicability, we design an estimated TE optimization scheme motivated by the functional representation lemma, and use it to estimate the capacity of communication channels with memory, a canonical optimization problem in information theory. We further demonstrate how an optimized TREET can be used to estimate the underlying densities, and provide experimental results. Finally, we apply TREET to feature analysis of patients with apnea, demonstrating its applicability to real-world physiological data. Combined with state-of-the-art deep learning methods, our work opens new avenues for communication problems that are yet to be solved.
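To make the DV route concrete, here is a minimal sketch that is deliberately not TREET itself: the attention-based critic is replaced by a tabular critic (one free parameter per outcome), and sample averages are replaced by exact expectations over a small toy model. It writes TE with memory length 1 as the difference of two KL divergences against a reference measure that is uniform over $Y_t$, maximizes each DV objective $\mathbb{E}_P[f] - \log \mathbb{E}_Q[e^f]$ by gradient ascent, and compares the result with the exact conditional mutual information. The toy model and every name in the code (`toy_joint`, `marginal`, `dv_kl`, `te_dv`, `te_exact`, `flip_x`, `noise`) are hypothetical: X is a stationary symmetric binary Markov chain with flip probability `flip_x`, and $Y_t = X_{t-1} \oplus N_t$ with i.i.d. $N_t \sim \mathrm{Bernoulli}(\texttt{noise})$.

```python
import math
from itertools import product


def toy_joint(flip_x=0.2, noise=0.1):
    """Exact pmf of (Y_t, Y_{t-1}, X_{t-1}) for a hypothetical binary toy model.

    X is a stationary symmetric Markov chain that flips with probability flip_x;
    Y_t = X_{t-1} XOR N_t with i.i.d. N_t ~ Bernoulli(noise).
    """
    pmf = {}
    for x2, a, n1, n0 in product((0, 1), repeat=4):
        prob = 0.5                                  # X_{t-2} is uniform (stationary)
        prob *= flip_x if a else 1.0 - flip_x       # X_{t-1} = X_{t-2} XOR a
        prob *= noise if n1 else 1.0 - noise        # noise on Y_{t-1}
        prob *= noise if n0 else 1.0 - noise        # noise on Y_t
        x1 = x2 ^ a
        key = (x1 ^ n0, x2 ^ n1, x1)                # (Y_t, Y_{t-1}, X_{t-1})
        pmf[key] = pmf.get(key, 0.0) + prob
    return pmf


def marginal(pmf, keep):
    """Marginalize a pmf over tuples onto the coordinates listed in `keep`."""
    out = {}
    for key, prob in pmf.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + prob
    return out


def dv_kl(p, q, steps=3000, lr=1.0):
    """Estimate KL(p || q) by maximizing the DV objective over a tabular critic f."""
    keys = list(q)
    f = {k: 0.0 for k in keys}
    for _ in range(steps):
        w = {k: q[k] * math.exp(f[k]) for k in keys}
        z = sum(w.values())
        for k in keys:                              # gradient of E_p[f] - log E_q[e^f]
            f[k] += lr * (p.get(k, 0.0) - w[k] / z)
    z = sum(q[k] * math.exp(f[k]) for k in keys)
    return sum(p.get(k, 0.0) * f[k] for k in keys) - math.log(z)


def te_dv(pmf):
    """TE_{X->Y} with memory 1, as a difference of two DV-estimated KLs (nats)."""
    ctx_yx = marginal(pmf, (1, 2))                  # P(Y_{t-1}, X_{t-1})
    ctx_y = marginal(pmf, (1,))                     # P(Y_{t-1})
    # Reference measures: uniform over Y_t times the context marginal.
    q_yx = {(y0,) + c: 0.5 * pr for y0 in (0, 1) for c, pr in ctx_yx.items()}
    q_y = {(y0,) + c: 0.5 * pr for y0 in (0, 1) for c, pr in ctx_y.items()}
    return dv_kl(pmf, q_yx) - dv_kl(marginal(pmf, (0, 1)), q_y)


def te_exact(pmf):
    """Exact I(X_{t-1}; Y_t | Y_{t-1}) in nats, for comparison."""
    p_yx = marginal(pmf, (1, 2))
    p_yy = marginal(pmf, (0, 1))
    p_y1 = marginal(pmf, (1,))
    return sum(
        v * math.log(v * p_y1[(k[1],)] / (p_yx[(k[1], k[2])] * p_yy[(k[0], k[1])]))
        for k, v in pmf.items() if v > 0
    )


pmf = toy_joint()
print(f"DV estimate: {te_dv(pmf):.6f} nats, exact: {te_exact(pmf):.6f} nats")
```

With a tabular critic and exact expectations, the DV objective is concave in $f$ and invariant to adding a constant to $f$, so plain gradient ascent recovers the optimal critic $\log(dP/dQ)$ up to a shift and the two numbers agree. In a neural estimator such as TREET, the critic is a trained network (attention-based, per the abstract) and the expectations become sample averages, which is where estimation error enters.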