🤖 AI Summary
To address the excessive computational and memory overhead of Physics-Informed Neural Networks (PINNs) on edge devices, which stems from high-order automatic differentiation, dense tensor operations, and full-precision arithmetic, this work proposes a framework of three coordinated optimizations. First, it introduces mixed-precision training using a square-block MX (SMX) numeric format. Second, it employs a difference-based quantization scheme for the Stein's estimator (SE) residual loss to mitigate gradient underflow. Third, it designs a partial-reconstruction scheme (PRS) for tensor-train (TT) layers to reduce quantization-error accumulation. The framework integrates fully quantized training, SE-based residual loss computation, and TT-based weight compression, and is implemented on a precision-scalable hardware accelerator, PINTA. Evaluated on the 2-D Poisson, 20-D Hamilton–Jacobi–Bellman (HJB), and 100-D heat equations, the method matches or surpasses full-precision baselines in accuracy while achieving 5.5×–83.5× speedup and 159.6×–2324.1× energy reduction.
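The Stein's-estimator trick mentioned above replaces higher-order automatic differentiation in the PDE residual with a Monte Carlo expectation. As a rough illustration of the underlying identity only (not the paper's quantized, difference-based implementation), the sketch below estimates a Laplacian via Stein's identity: for Gaussian perturbations δ ~ N(0, σ²I), Δu(x) ≈ E[u(x+δ)(‖δ‖² − dσ²)] / σ⁴. The function name, variance-reduction choices, and the toy test function are illustrative assumptions, not from the paper.

```python
import numpy as np

def stein_laplacian(u, x, sigma=0.1, n_samples=100_000, rng=None):
    """Monte Carlo Laplacian estimate via Stein's identity.

    For delta ~ N(0, sigma^2 I), the estimator
        E[ u(x + delta) * (||delta||^2 - d * sigma^2) ] / sigma^4
    equals the Laplacian of the Gaussian-smoothed u, which approximates
    Lap u(x) for small sigma -- no second-order autodiff required.
    Antithetic sampling (+/- delta) and subtracting u(x) reduce variance
    without changing the expectation.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    d = x.shape[0]
    delta = rng.normal(scale=sigma, size=(n_samples, d))
    # Symmetrized residual cancels the odd (gradient) contribution.
    central = 0.5 * (u(x + delta) + u(x - delta)) - u(x)
    weight = (np.sum(delta**2, axis=1) - d * sigma**2) / sigma**4
    return np.mean(central * weight)

# Toy check: u(x) = ||x||^2 has Laplacian 2*d everywhere (here 2*2 = 4).
u = lambda pts: np.sum(np.atleast_2d(pts) ** 2, axis=1)
x = np.array([0.3, 0.5])
est = stein_laplacian(u, x)
```

In a PINN loss, such an estimate would stand in for the autodiff-computed Δu term of the residual; the paper's contribution is making this expectation robust under low-precision arithmetic, which the plain sketch above does not address.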
📝 Abstract
Physics-Informed Neural Networks (PINNs) have emerged as a promising paradigm for solving partial differential equations (PDEs) by embedding physical laws into neural network training objectives. However, their deployment on resource-constrained platforms is hindered by substantial computational and memory overhead, primarily stemming from higher-order automatic differentiation, intensive tensor operations, and reliance on full-precision arithmetic. To address these challenges, we present a framework that enables scalable and energy-efficient PINN training on edge devices. This framework integrates fully quantized training, Stein's estimator (SE)-based residual loss computation, and tensor-train (TT) decomposition for weight compression. It contributes three key innovations: (1) a mixed-precision training method that uses a square-block MX (SMX) format to eliminate data duplication during backpropagation; (2) a difference-based quantization scheme for the Stein's estimator that mitigates underflow; and (3) a partial-reconstruction scheme (PRS) for TT-layers that reduces quantization-error accumulation. We further design PINTA, a precision-scalable hardware accelerator, to fully exploit the performance of the framework. Experiments on the 2-D Poisson, 20-D Hamilton–Jacobi–Bellman (HJB), and 100-D heat equations demonstrate that the proposed framework achieves accuracy comparable to or better than full-precision, uncompressed baselines while delivering 5.5× to 83.5× speedups and 159.6× to 2324.1× energy savings. This work enables real-time PDE solving on edge devices and paves the way for energy-efficient scientific computing at scale.
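The TT weight compression the abstract refers to replaces a dense weight tensor with a chain of small 3-way cores. As a minimal sketch of the general technique only (the classical TT-SVD factorization, not the paper's PRS or SMX-quantized pipeline), the code below compresses a tensor by sequential truncated SVDs; `max_rank` caps the TT-ranks and sets the compression/accuracy trade-off. All names here are illustrative.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Factor a d-way tensor into tensor-train cores via sequential
    truncated SVDs (classical TT-SVD). Core k has shape
    (r_{k-1}, n_k, r_k) with r_0 = r_d = 1."""
    cores, r_prev, mat = [], 1, tensor
    for n_k in tensor.shape[:-1]:
        mat = mat.reshape(r_prev * n_k, -1)
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, n_k, r))
        mat = S[:r, None] * Vt[:r]  # fold kept singular values rightward
        r_prev = r
    cores.append(mat.reshape(r_prev, tensor.shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a dense tensor (for error checks)."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([out.ndim - 1], [0]))
    return out.reshape(out.shape[1:-1])

# Toy check: an outer-product (TT-rank-1) tensor is recovered essentially
# exactly, while the cores hold far fewer parameters than the dense tensor.
rng = np.random.default_rng(0)
vecs = [rng.standard_normal(n) for n in (4, 6, 6, 4)]
T = np.einsum('i,j,k,l->ijkl', *vecs)
cores = tt_svd(T, max_rank=2)
n_params = sum(c.size for c in cores)  # far fewer than T.size (576)
```

In a TT-layer, the cores are trained directly and the dense tensor is never fully materialized; the paper's PRS additionally controls how much of this reconstruction chain is evaluated under quantization to limit error accumulation.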