Hierarchical Transformer Preconditioning for Interactive Physics Simulation

📅 2026-05-13

🤖 AI Summary
Existing neural preconditioners rely on local message passing or sparse access patterns, which makes it hard to model long-range couplings efficiently in physics simulation. This work proposes a hierarchical Transformer preconditioner built on a weakly admissible H-matrix partition, whose multiscale structural prior enables a full-graph approximate inverse with O(N) complexity at fixed block sizes. The inverse is modeled with low-rank far-field factors, and highway connections (axial buffers plus a global summary token) propagate context across layers. A cosine-Hutchinson probe objective optimizes the angular alignment between MAz and z instead of forcing eigenvalues to cluster at a prescribed location, which removes unnecessary spectral-placement constraints and improves conditioning on irregular spectra. Because preconditioner application reduces to batched dense GEMMs, the full solve loop can be captured as a single CUDA Graph. On a high-contrast multiphase Poisson system (N = 8,192), the method reaches 17.9 ms/frame, a speedup of 2.2× over GPU Jacobi, about 28× over GPU IC/DILU (AMGX multicolor_dilu), and 2.7× over neural SPAI.
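To make the linear-scaling claim concrete, the following minimal sketch counts the tiles of a weakly admissible partition. It assumes a binary split of a 1D index range with a fixed leaf size (a HODLR-style layout); the function name `weak_admissible_partition` and the `leaf` parameter are hypothetical, and the paper's actual partition may differ.

```python
def weak_admissible_partition(n, leaf):
    """Recursively halve the index range [0, n).

    Returns (diagonal_leaves, offdiag_tiles). Each entry is a pair of
    half-open index ranges (row_range, col_range).
    Assumption: binary splitting with a fixed leaf size.
    """
    leaves, tiles = [], []

    def split(lo, hi):
        if hi - lo <= leaf:
            # Small enough: keep as a dense diagonal leaf.
            leaves.append(((lo, hi), (lo, hi)))
            return
        mid = (lo + hi) // 2
        # Weak admissibility: both sibling off-diagonal blocks are treated
        # as far field, even though the two index clusters are adjacent.
        tiles.append(((lo, mid), (mid, hi)))
        tiles.append(((mid, hi), (lo, mid)))
        split(lo, mid)
        split(mid, hi)

    split(0, n)
    return leaves, tiles


# Tile counts for a few problem sizes at a fixed leaf size of 64.
counts = {}
for n in (1024, 8192, 16384):
    leaves, tiles = weak_admissible_partition(n, leaf=64)
    counts[n] = (len(leaves), len(tiles))
    print(n, len(leaves), len(tiles))
```

With N a power-of-two multiple of the leaf size, the sketch produces N/leaf diagonal leaves and 2(N/leaf - 1) off-diagonal tiles, so the tile count grows linearly in N. Whether the total work is also linear depends on every tile being stored at a fixed size, which is what the coarsened off-diagonal tiles in the abstract suggest.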
📝 Abstract
Neural preconditioners for real-time physics simulation offer promising data-driven priors, but they often fail to capture long-range couplings efficiently because they inherit local message passing or sparse-operator access patterns. We introduce the Hierarchical Transformer Preconditioner, a neural preconditioner anchored to a weak-admissibility H-matrix partition. The partition provides a multiscale structural prior (dense diagonal leaves plus coarsening off-diagonal tiles) that enables full-graph approximate-inverse computation with O(N) scaling at fixed block sizes. The network models the inverse through low-rank far-field factors and uses highway connections (axial buffers plus a global summary token) to propagate context across transformer depth. At each PCG iteration, preconditioner application reduces to batched dense GEMMs with regular memory access. The key training contribution is a cosine-Hutchinson probe objective that learns the action of MA on convergence-critical spectral subspaces, optimizing angular alignment of MAz with z rather than forcing eigenvalue clusters to a prescribed location. This removes unnecessary spectral-placement constraints from SAI-style objectives and improves conditioning on irregular spectra. Because both inference and apply are dense, dependency-free tensor programs, the full solve loop is captured as a single CUDA Graph. On stiff multiphase Poisson systems (up to 100:1 density contrast, N = 1,024-16,384), the solver runs from ~143 to ~21 fps. At N = 8,192, it reaches 17.9 ms/frame, with 2.2x speedup over GPU Jacobi, ~28x over GPU IC/DILU (AMGX multicolor_dilu), and 2.7x over neural SPAI retrained per scale on the same benchmark.
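The angular-alignment idea behind the probe objective can be illustrated with a small sketch. It is one plausible reading of the abstract, not the authors' implementation: the `1 - cosine` form, the Rademacher probes, the plain average over probes, and all names (`cosine_probe_loss`, `apply_A`, `apply_M`) are assumptions made for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine of the angle between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-30)


def cosine_probe_loss(apply_A, apply_M, n, num_probes=16, seed=0):
    """Average of 1 - cos(M A z, z) over random probe vectors z.

    Assumption: probes are Rademacher (+1/-1) vectors, as in Hutchinson-style
    estimators. The loss is zero when M A z is parallel to z for every probe.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        maz = apply_M(apply_A(z))
        total += 1.0 - cosine(maz, z)
    return total / num_probes


# Toy system: a diagonal matrix with 100:1 contrast between entries.
diag = [1.0, 100.0] * 4
n = len(diag)


def apply_A(x):
    return [d * xi for d, xi in zip(diag, x)]


def identity(x):
    return list(x)                                    # no preconditioning


def jacobi(x):
    return [xi / d for d, xi in zip(diag, x)]         # exact inverse here


def scaled(x):
    return [5.0 * xi / d for d, xi in zip(diag, x)]   # 5 * inverse


loss_identity = cosine_probe_loss(apply_A, identity, n)
loss_jacobi = cosine_probe_loss(apply_A, jacobi, n)
loss_scaled = cosine_probe_loss(apply_A, scaled, n)
print(loss_identity, loss_jacobi, loss_scaled)
```

In this toy case the exact inverse and the inverse scaled by 5 both give a loss of essentially zero, while the unpreconditioned operator does not. That scale invariance is the point of the sketch: an angular objective rewards MA acting like a multiple of the identity without fixing where the eigenvalues sit.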
Problem

Research questions and friction points this paper is trying to address.

neural preconditioning
long-range couplings
real-time physics simulation
irregular spectra
multiphase Poisson systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Transformer
Neural Preconditioning
H-matrix
Cosine-Hutchinson Probe
CUDA Graph