🤖 AI Summary
Computing conditional differential entropy for high-dimensional time series with unknown distributions is intractable, which hinders reliable relative complexity ranking. To address this, the authors use a surrogate upper bound on entropy based on the determinant of the prediction error covariance matrix. Using the Hadamard inequality and the positive semi-definiteness of covariance matrices, they extend the entropy upper bound of Fang et al. with a further upper bound. Because Hadamard's inequality bounds the determinant from above by the product of the diagonal entries, this additional bound is looser than the determinant-based one, not tighter. The method is compatible with both linear regression and neural network predictors. Empirical evaluation on synthetic linear autoregressive processes and bio-inspired synthetic audio data tests whether the surrogate recovers the ground-truth complexity rankings, covering one setting where the true entropy is known and one where it is unknown. The work offers an interpretable, computationally tractable, predictor-agnostic approach to comparing the complexity of high-dimensional time series.
📝 Abstract
Conditional differential entropy provides an intuitive measure for ranking the relative complexity of time series by quantifying the uncertainty in future observations given past context. However, computing it directly for high-dimensional processes with unknown distributions is often intractable. This paper builds on the information-theoretic prediction error bounds established by Fang et al. \cite{fang2019generic}, which show that the conditional differential entropy \textbf{$h(X_k \mid X_{k-1},\ldots,X_{k-m})$} is upper bounded by a function of the determinant of the covariance matrix of next-step prediction errors, for any next-step prediction model. We extend this theoretical framework by deriving a further upper bound on that quantity using Hadamard's inequality and the positive semi-definiteness of covariance matrices.
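As a rough illustration of how such bounds can be evaluated from prediction errors, the sketch below assumes the determinant bound takes the standard Gaussian maximum-entropy form, $\tfrac{1}{2}\log\big((2\pi e)^d \det \Sigma_e\big)$, where $\Sigma_e$ is the covariance of the $d$-dimensional prediction errors. The exact expression used by Fang et al. is not reproduced in this abstract, so this form is an assumption. The function name `entropy_upper_bounds` is hypothetical. The second returned value replaces $\det \Sigma_e$ with the product of the diagonal variances, which Hadamard's inequality guarantees is at least as large for a positive semi-definite matrix.

```python
import numpy as np

def entropy_upper_bounds(errors):
    """Return (det_bound, diag_bound) in nats from prediction errors.

    errors: array of shape (n_samples, d) holding next-step prediction errors.
    det_bound  = 0.5 * log((2*pi*e)^d * det(cov))       (assumed Gaussian max-entropy form)
    diag_bound = 0.5 * log((2*pi*e)^d * prod(diag(cov))) (Hadamard relaxation)
    """
    errors = np.asarray(errors, dtype=float)
    if errors.ndim == 1:
        errors = errors[:, None]  # treat a 1-D series as d = 1
    n, d = errors.shape

    # Sample covariance of the prediction errors (d x d, positive semi-definite).
    centered = errors - errors.mean(axis=0)
    cov = centered.T @ centered / (n - 1)

    # slogdet avoids overflow/underflow when d is large.
    _, logdet = np.linalg.slogdet(cov)
    const = d * np.log(2.0 * np.pi * np.e)

    det_bound = 0.5 * (const + logdet)
    diag_bound = 0.5 * (const + np.sum(np.log(np.diag(cov))))
    return det_bound, diag_bound
```

The diagonal version needs only per-coordinate error variances, which may be convenient when the full covariance is poorly conditioned. The two values coincide when the error coordinates are uncorrelated.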
To test whether these bounds can be used to rank the complexity of time series, we conducted two synthetic experiments: (1) controlled linear autoregressive processes with additive Gaussian noise, where entropy proxies computed from ordinary least squares prediction errors are compared with the true entropies of the various additive noises, and (2) a complexity ranking task on bio-inspired synthetic audio data with unknown entropy, where neural network prediction errors are used to recover the known complexity ordering.
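A minimal sketch of the first kind of experiment might look as follows. It is not the authors' setup: the transition matrix `A`, the noise scales, the series length, and all function names are hypothetical choices made for illustration, and the bound is again assumed to take the Gaussian maximum-entropy form $\tfrac{1}{2}\log\big((2\pi e)^d \det \Sigma_e\big)$. For a linear autoregressive process driven by isotropic Gaussian noise, the conditional entropy given the past equals the entropy of the noise, so the proxy can be checked against a closed-form value.

```python
import numpy as np

def simulate_var1(A, noise_std, n, rng):
    """Simulate x[k] = A @ x[k-1] + noise with isotropic Gaussian noise."""
    d = A.shape[0]
    x = np.zeros((n, d))
    for k in range(1, n):
        x[k] = A @ x[k - 1] + noise_std * rng.standard_normal(d)
    return x

def ols_entropy_proxy(x):
    """Fit a one-step OLS predictor and return the determinant-based bound (nats)."""
    past, future = x[:-1], x[1:]
    design = np.hstack([past, np.ones((len(past), 1))])  # add intercept column
    coef, *_ = np.linalg.lstsq(design, future, rcond=None)
    err = future - design @ coef
    cov = np.cov(err, rowvar=False)
    d = x.shape[1]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)

def gaussian_noise_entropy(noise_std, d):
    """Differential entropy (nats) of N(0, noise_std^2 * I_d)."""
    return 0.5 * d * np.log(2.0 * np.pi * np.e * noise_std**2)

rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1], [0.0, 0.3]])  # hypothetical stable transition matrix
noise_levels = [0.5, 1.0, 2.0]          # hypothetical noise scales

proxies = [ols_entropy_proxy(simulate_var1(A, s, 5000, rng)) for s in noise_levels]
truths = [gaussian_noise_entropy(s, A.shape[0]) for s in noise_levels]

for s, p, t in zip(noise_levels, proxies, truths):
    print(f"noise_std={s}: proxy={p:.3f} nats, true={t:.3f} nats")
```

In this linear Gaussian setting the OLS predictor is well specified, so the proxy should track the true entropy closely and preserve its ordering. With a misspecified predictor the proxy would remain an upper bound but could be loose.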
This framework provides a computationally tractable method for ranking time-series complexity from the prediction errors of next-step prediction models while retaining a theoretical foundation in information theory.