FlashMP: Fast Discrete Transform-Based Solver for Preconditioning Maxwell's Equations on GPUs

📅 2025-08-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
In large-scale electromagnetic simulations using the Crank-Nicolson finite-difference time-domain (CN-FDTD) method, iterative solvers suffer from slow convergence due to the ill-conditioning induced by the double-curl operator, while existing preconditioners are insufficient and direct solvers incur prohibitive memory overhead. To address this, we propose an efficient domain-decomposition preconditioner based on discrete transforms. Our approach constructs exact subdomain solvers and integrates them within a multi-GPU scalable architecture, enabling highly parallel preconditioning with a low memory footprint. Evaluated on an AMD MI60 cluster, the method reduces iteration counts by up to 16x relative to the baseline, achieves 2.5-4.9x speedup over state-of-the-art libraries (e.g., Hypre), and attains 84.1% weak-scaling efficiency. These results demonstrate substantial improvements in both computational efficiency and scalability for large-scale electromagnetic simulations.

📝 Abstract
Efficiently solving large-scale linear systems is a critical challenge in electromagnetic simulations, particularly when using the Crank-Nicolson Finite-Difference Time-Domain (CN-FDTD) method. Iterative solvers are commonly employed to handle the resulting sparse systems but suffer from slow convergence due to the ill-conditioned nature of the double-curl operator. Approximate preconditioners, such as Successive Over-Relaxation (SOR) and Incomplete LU decomposition (ILU), provide insufficient convergence, while direct solvers are impractical due to excessive memory requirements. To address this, we propose FlashMP, a novel preconditioning system built around an exact subdomain solver based on discrete transforms. FlashMP provides an efficient GPU implementation that achieves multi-GPU scalability through domain decomposition. Evaluations on AMD MI60 GPU clusters (up to 1000 GPUs) show that FlashMP reduces iteration counts by up to 16x and achieves speedups of 2.5x to 4.9x over baseline implementations in state-of-the-art libraries such as Hypre. Weak-scalability tests show parallel efficiencies up to 84.1%.
Problem

Research questions and friction points this paper is trying to address.

Solving large-scale linear systems in electromagnetic simulations efficiently
Addressing slow convergence of iterative solvers for ill-conditioned double-curl operator
Overcoming memory limitations of direct solvers with GPU-optimized preconditioning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Discrete transform-based subdomain exact solver
Efficient GPU implementation for scalability
Domain decomposition enables multi-GPU performance
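The core idea behind a "discrete transform-based exact solver" can be illustrated with a toy 1D example (a hypothetical sketch, not FlashMP's actual kernels or the curl-curl operator): the 1D Dirichlet Laplacian is diagonalized by the type-I discrete sine transform, so an exact subdomain solve reduces to a forward transform, a pointwise division by eigenvalues, and an inverse transform. The names `S`, `lam`, and `dst_solve` below are illustrative.

```python
import numpy as np

# 1D Dirichlet Laplacian: tridiag(-1, 2, -1). Its eigenvectors are the
# DST-I basis, so it can be solved exactly in transform space.
n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

j = np.arange(1, n + 1)
S = np.sin(np.outer(j, j) * np.pi / (n + 1))   # DST-I matrix; S @ S = (n+1)/2 * I
lam = 2 - 2 * np.cos(j * np.pi / (n + 1))      # eigenvalues of A

def dst_solve(b):
    """Solve A x = b exactly: transform, divide by eigenvalues, transform back."""
    return (2.0 / (n + 1)) * (S @ ((S @ b) / lam))

b = np.random.default_rng(0).standard_normal(n)
x = dst_solve(b)
print(np.allclose(A @ x, b))  # True: the transform-based solve is exact
```

In a production setting the dense matrix product would be replaced by a fast transform (O(n log n)), applied per subdomain inside the domain-decomposition preconditioner; only the pointwise division depends on the subdomain operator's eigenvalues.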
Haoyuan Zhang
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Yaqian Gao
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Xinxin Zhang
Department of Electrical Engineering, Technical University of Denmark
Jialin Li
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Runfeng Jin
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Yidong Chen
Tsinghua University, Beijing, China
Feng Zhang
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Wu Yuan
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Wenpeng Ma
Xinyang Normal University, Xinyang, China
Shan Liang
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Jian Zhang
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Zhonghua Lu
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China