Approximating Matrix Functions with Deep Neural Networks and Transformers

📅 2026-02-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Efficient approximation of matrix functions—such as the matrix exponential and matrix sign function—remains challenging in scientific computing. This work proposes a general approximation framework leveraging ReLU deep neural networks combined with a Transformer encoder-decoder architecture. It establishes, for the first time, theoretical bounds on the required network width and depth to approximate the matrix exponential, and demonstrates that numerical encoding strategies critically influence Transformer performance. Theoretical analysis quantifies the relationship between approximation error and network complexity. Empirical results show that the proposed method achieves relative errors below 5% with high probability across a range of matrix functions, offering both rigorous theoretical guarantees and strong practical performance.

📝 Abstract
Transformers have revolutionized natural language processing, but their use for numerical computation has received less attention. We study the approximation of matrix functions, which extend scalar functions to square matrices, using neural networks including transformers. We focus on functions mapping square matrices to square matrices of the same dimension. These matrix functions appear throughout scientific computing, e.g., the matrix exponential in continuous-time Markov chains and the matrix sign function in stability analysis of dynamical systems. In this paper, we make two contributions. First, we prove bounds on the width and depth of ReLU networks needed to approximate the matrix exponential to arbitrary precision. Second, we show experimentally that a transformer encoder-decoder with suitable numerical encodings can approximate certain matrix functions at a relative error of 5% with high probability. Our study reveals that the encoding scheme strongly affects performance, with different schemes working better for different functions.
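For readers unfamiliar with the two matrix functions named in the abstract, here is a minimal sketch computing them with SciPy's reference routines. This illustrates only the target functions a learned model would be compared against, not the neural approximators proposed in the paper; the helper `rel_err` is a hypothetical stand-in for the paper's relative-error criterion.

```python
import numpy as np
from scipy.linalg import expm, signm

# Matrix exponential: e^A = sum_{k>=0} A^k / k!.
# Example use: the transition matrix of a continuous-time Markov chain
# with generator Q is P(t) = expm(Q * t).
Q = np.array([[-1.0, 1.0],
              [0.5, -0.5]])
P = expm(Q)  # rows of P sum to 1 for a valid generator Q

# Matrix sign function: maps each eigenvalue to the sign of its real part.
# Example use: counting stable/unstable modes of a dynamical system.
B = np.diag([2.0, -3.0])
S = signm(B)  # diag(1, -1)

# A learned approximator would be judged by relative error in some norm
# (illustrative helper, not the paper's exact metric):
def rel_err(approx, exact):
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)
```

The 5% figure reported in the abstract corresponds to `rel_err` of this kind staying below 0.05 with high probability over random test matrices.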
Problem

Research questions and friction points this paper is trying to address.

matrix functions
neural networks
transformers
scientific computing
numerical approximation

Innovation

Methods, ideas, or system contributions that make the work stand out.

matrix functions
neural network approximation
transformer architecture
ReLU network bounds
numerical encoding
Rahul Padmanabhan
Department of Mathematics and Statistics, Concordia University, 1455 Blvd. De Maisonneuve Ouest, Montreal, Quebec H3G 1M8, Canada
Simone Brugiapaglia
Associate Professor, Concordia University, Department of Mathematics and Statistics
Numerical Analysis · Mathematics of Data Science · Machine Learning · Computational Mathematics