🤖 AI Summary
Efficient approximation of matrix functions, such as the matrix exponential and the matrix sign function, remains challenging in scientific computing. This work proposes a general approximation framework that combines ReLU deep neural networks with a Transformer encoder-decoder architecture. It establishes, for the first time, theoretical bounds on the network width and depth required to approximate the matrix exponential, quantifying the relationship between approximation error and network complexity. It also demonstrates that the choice of numerical encoding strongly influences Transformer performance. Empirical results show that the proposed method achieves relative errors of about 5% with high probability across a range of matrix functions, offering both rigorous theoretical guarantees and strong practical performance.
📝 Abstract
Transformers have revolutionized natural language processing, but their use for numerical computation has received less attention. We study the approximation of matrix functions, which extend scalar functions to matrix arguments, using neural networks including transformers. We focus on functions mapping square matrices to square matrices of the same dimension. Such matrix functions appear throughout scientific computing, e.g., the matrix exponential in continuous-time Markov chains and the matrix sign function in stability analysis of dynamical systems. In this paper, we make two contributions. First, we prove bounds on the width and depth of ReLU networks needed to approximate the matrix exponential to arbitrary precision. Second, we show experimentally that a transformer encoder-decoder with suitable numerical encodings can approximate certain matrix functions at a relative error of 5% with high probability. Our study reveals that the encoding scheme strongly affects performance, with different schemes working better for different functions.
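As a minimal sketch of the two matrix functions named above (computed here with standard SciPy routines, not with the neural approximators studied in the paper), the snippet below evaluates the matrix exponential and the matrix sign function on small square matrices and defines the kind of relative-error metric used to judge an approximation. The specific matrices are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.linalg import expm, signm

# A matrix function maps a square matrix to a square matrix of the
# same dimension, extending a scalar function to matrix arguments.

# Matrix exponential: for this skew-symmetric A, expm(A) is a
# rotation matrix (the flow map of the linear ODE x' = A x).
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
E = expm(A)

# Matrix sign function: defined when no eigenvalue lies on the
# imaginary axis; it satisfies sign(B) @ sign(B) = I and encodes
# the stable/unstable eigenspace split used in stability analysis.
B = np.array([[2.0, 1.0],
              [0.0, -3.0]])  # eigenvalues 2 and -3
S = signm(B)

def rel_err(approx, exact):
    """Relative error in the Frobenius norm, the kind of metric an
    approximation scheme would be measured against."""
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)
```

For instance, an approximation `E_hat` of `expm(A)` would count as acceptable under the paper's criterion if `rel_err(E_hat, E) <= 0.05`.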