🤖 AI Summary
This work addresses the poor out-of-distribution (OOD) generalization, lack of theoretical guarantees, and limited interpretability of recurrent neural networks (RNNs) on temporal data. By modeling the post-training RNN state dynamics as a nonlinear closed-loop system, the authors introduce Koopman operator theory (applied here, they state, for the first time to RNNs) to approximate this system with a linear representation. Combining this linearization with spectral analysis, they rigorously quantify the worst-case impact of domain shift on the generalization error. From this analysis they derive a generalization error bound for non-i.i.d. temporal data and propose an interpretable, robust domain-generalization training method. Experiments across multiple temporal tasks show that the proposed approach significantly reduces OOD generalization error and improves robustness to domain shift.
📝 Abstract
Deep learning (DL) has driven broad advances across scientific and engineering domains. Despite its success, DL models often exhibit limited interpretability and generalization, which can undermine trust, especially in safety-critical deployments. As a result, there is growing interest in (i) analyzing interpretability and generalization and (ii) developing models that perform robustly under data distributions different from those seen during training (i.e., domain generalization). However, the theoretical analysis of DL remains incomplete. For example, many generalization analyses assume independent samples, an assumption violated by sequential data with temporal correlations. Motivated by these limitations, this paper proposes a method to analyze interpretability and out-of-domain (OOD) generalization for a family of recurrent neural networks (RNNs). Specifically, the evolution of a trained RNN's states is modeled as an unknown, discrete-time, nonlinear closed-loop feedback system. Using Koopman operator theory, these nonlinear dynamics are approximated with a linear operator, enabling interpretability. Spectral analysis is then used to quantify the worst-case impact of domain shifts on the generalization error. Building on this analysis, a domain generalization method is proposed that reduces the OOD generalization error and improves robustness to distribution shifts. Finally, the proposed analysis and domain generalization approach are validated on practical temporal pattern-learning tasks.
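The core pipeline described above (fit a linear Koopman-style surrogate to a trained RNN's state trajectories, then inspect its spectrum to reason about how perturbations such as domain shifts propagate) can be sketched with a DMD-style least-squares fit. This is a minimal illustration, not the paper's actual method: a toy `tanh` recurrence stands in for the trained RNN's closed-loop state update, and the operator is fit directly on raw states rather than on a learned dictionary of observables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained RNN's autonomous state update
# h_{t+1} = tanh(W h_t); any nonlinear closed-loop map works here.
d = 8
W = 0.9 * rng.standard_normal((d, d)) / np.sqrt(d)

def step(h):
    return np.tanh(W @ h)

# Collect snapshot pairs (h_t, h_{t+1}) from several trajectories.
X, Y = [], []
for _ in range(20):
    h = rng.standard_normal(d)
    for _ in range(50):
        h_next = step(h)
        X.append(h)
        Y.append(h_next)
        h = h_next
X = np.array(X).T  # shape (d, num_snapshots)
Y = np.array(Y).T

# DMD-style least-squares fit of a linear operator K with Y ≈ K X,
# i.e. a finite-dimensional approximation of the Koopman operator.
K = Y @ np.linalg.pinv(X)

# Spectral analysis: the eigenvalues of K govern how state
# perturbations (e.g. from a domain shift) grow or decay; a spectral
# radius below 1 indicates contracting, hence more robust, dynamics.
eigvals = np.linalg.eigvals(K)
spectral_radius = float(np.max(np.abs(eigvals)))

# Sanity check: relative one-step prediction error of the surrogate
# on a fresh state not used in the fit.
h = rng.standard_normal(d)
err = float(np.linalg.norm(step(h) - K @ h) / np.linalg.norm(step(h)))
print(f"spectral radius: {spectral_radius:.3f}, rel. 1-step error: {err:.3f}")
```

In this sketch, interpretability comes from the eigendecomposition of `K` (each eigenvalue/eigenvector pair is a mode with an explicit decay rate), and the spectral radius plays the role of the quantity bounded in the paper's worst-case analysis.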