🤖 AI Summary
To address the excessive computational and memory overhead of Physics-Informed Neural Networks (PINNs) on edge devices, which stems from high-order automatic differentiation, dense tensor operations, and full-precision arithmetic, this work proposes a framework of three coordinated optimizations. First, it introduces mixed-precision training using a square-block MX (SMX) numeric format. Second, it employs a difference-based quantization scheme for the Stein's estimator (SE) residual loss to mitigate gradient underflow. Third, it designs a partial-reconstruction scheme (PRS) for tensor-train (TT) layers to reduce quantization-error accumulation. The framework integrates fully quantized training, SE-based residual loss computation, and TT-based weight compression, and is implemented on a precision-scalable hardware accelerator, PINTA. Evaluated on the 2-D Poisson, 20-D Hamilton–Jacobi–Bellman (HJB), and 100-D heat equations, the method matches or surpasses full-precision baselines in accuracy while achieving 5.5×–83.5× speedup and 159.6×–2324.1× energy reduction.
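The Stein's-estimator trick mentioned above replaces higher-order automatic differentiation in the PDE residual with a Monte Carlo expectation. As a rough illustration of the underlying identity only (not the paper's quantized, difference-based implementation), the sketch below estimates a Laplacian via Stein's identity: for Gaussian perturbations δ ~ N(0, σ²I), Δu(x) ≈ E[u(x+δ)(‖δ‖² − dσ²)] / σ⁴. The function name, variance-reduction choices, and the toy test function are illustrative assumptions, not from the paper.

```python
import numpy as np

def stein_laplacian(u, x, sigma=0.1, n_samples=100_000, rng=None):
    """Monte Carlo Laplacian estimate via Stein's identity.

    For delta ~ N(0, sigma^2 I), the estimator
        E[ u(x + delta) * (||delta||^2 - d * sigma^2) ] / sigma^4
    equals the Laplacian of the Gaussian-smoothed u, which approximates
    Lap u(x) for small sigma -- no second-order autodiff required.
    Antithetic sampling (+/- delta) and subtracting u(x) reduce variance
    without changing the expectation.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    d = x.shape[0]
    delta = rng.normal(scale=sigma, size=(n_samples, d))
    # Symmetrized residual cancels the odd (gradient) contribution.
    central = 0.5 * (u(x + delta) + u(x - delta)) - u(x)
    weight = (np.sum(delta**2, axis=1) - d * sigma**2) / sigma**4
    return np.mean(central * weight)

# Toy check: u(x) = ||x||^2 has Laplacian 2*d everywhere (here 2*2 = 4).
u = lambda pts: np.sum(np.atleast_2d(pts) ** 2, axis=1)
x = np.array([0.3, 0.5])
est = stein_laplacian(u, x)
```

In a PINN loss, such an estimate would stand in for the autodiff-computed Δu term of the residual; the paper's contribution is making this expectation robust under low-precision arithmetic, which the plain sketch above does not address.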
📝 Abstract
Physics-Informed Neural Networks (PINNs) have emerged as a promising paradigm for solving partial differential equations (PDEs) by embedding physical laws into neural network training objectives. However, their deployment on resource-constrained platforms is hindered by substantial computational and memory overhead, primarily stemming from higher-order automatic differentiation, intensive tensor operations, and reliance on full-precision arithmetic. To address these challenges, we present a framework that enables scalable and energy-efficient PINN training on edge devices. This framework integrates fully quantized training, Stein's estimator (SE)-based residual loss computation, and tensor-train (TT) decomposition for weight compression. It contributes three key innovations: (1) a mixed-precision training method that uses a square-block MX (SMX) format to eliminate data duplication during backpropagation; (2) a difference-based quantization scheme for the Stein's estimator that mitigates underflow; and (3) a partial-reconstruction scheme (PRS) for TT-layers that reduces quantization-error accumulation. We further design PINTA, a precision-scalable hardware accelerator, to fully exploit the performance of the framework. Experiments on the 2-D Poisson, 20-D Hamilton–Jacobi–Bellman (HJB), and 100-D heat equations demonstrate that the proposed framework achieves accuracy comparable to or better than full-precision, uncompressed baselines while delivering 5.5× to 83.5× speedups and 159.6× to 2324.1× energy savings. This work enables real-time PDE solving on edge devices and paves the way for energy-efficient scientific computing at scale.
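The TT weight compression the abstract refers to replaces a dense weight tensor with a chain of small 3-way cores. As a minimal sketch of the general technique only (the classical TT-SVD factorization, not the paper's PRS or SMX-quantized pipeline), the code below compresses a tensor by sequential truncated SVDs; `max_rank` caps the TT-ranks and sets the compression/accuracy trade-off. All names here are illustrative.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Factor a d-way tensor into tensor-train cores via sequential
    truncated SVDs (classical TT-SVD). Core k has shape
    (r_{k-1}, n_k, r_k) with r_0 = r_d = 1."""
    cores, r_prev, mat = [], 1, tensor
    for n_k in tensor.shape[:-1]:
        mat = mat.reshape(r_prev * n_k, -1)
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, n_k, r))
        mat = S[:r, None] * Vt[:r]  # fold kept singular values rightward
        r_prev = r
    cores.append(mat.reshape(r_prev, tensor.shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a dense tensor (for error checks)."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([out.ndim - 1], [0]))
    return out.reshape(out.shape[1:-1])

# Toy check: an outer-product (TT-rank-1) tensor is recovered essentially
# exactly, while the cores hold far fewer parameters than the dense tensor.
rng = np.random.default_rng(0)
vecs = [rng.standard_normal(n) for n in (4, 6, 6, 4)]
T = np.einsum('i,j,k,l->ijkl', *vecs)
cores = tt_svd(T, max_rank=2)
n_params = sum(c.size for c in cores)  # far fewer than T.size (576)
```

In a TT-layer, the cores are trained directly and the dense tensor is never fully materialized; the paper's PRS additionally controls how much of this reconstruction chain is evaluated under quantization to limit error accumulation.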