🤖 AI Summary
Preconditioner design for scientific computing faces several challenges: complex hyperparameter tuning, the poor mathematical interpretability of deep learning approaches, and high training costs. To address these, this paper proposes the first end-to-end preconditioner learning framework based on actor-critic reinforcement learning. We formulate preconditioning as a contextual bandit decision process and introduce a dual-objective reward mechanism that combines critic-based evaluation with explicit condition-number constraints. To enhance training stability and generalization, we design a dynamic sparsity mask and cosine learning-rate scheduling. The actor network parameterizes preconditioners through their incomplete Cholesky factors. Evaluated on linear systems arising from diverse PDE discretizations, our method significantly accelerates iterative solvers. Compared to conventional and state-of-the-art neural preconditioners, it achieves lower training overhead, superior robustness, and stronger cross-problem generalization.
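To make the condition-number side of the dual-objective reward concrete, the sketch below scores a candidate Cholesky-style factor `L` (preconditioner `M = L @ L.T`) by the condition number of the preconditioned system. The function name `cond_reward`, the dense inverse, the logarithmic transform, and the `penalty` weight are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def cond_reward(A, L, penalty=1.0):
    """Condition-number term of a dual-objective reward (illustrative sketch).

    A perfectly preconditioned system has kappa(M^{-1} A) = 1, giving
    reward 0; worse conditioning yields increasingly negative rewards.
    """
    M_inv = np.linalg.inv(L @ L.T)          # M^{-1} with M = L L^T (dense, for clarity)
    kappa = np.linalg.cond(M_inv @ A)       # condition number of the preconditioned matrix
    return -penalty * np.log(kappa)         # 0 when kappa == 1, negative otherwise
```

With the exact Cholesky factor of an SPD matrix, the preconditioned matrix is (numerically) the identity and the reward is near zero; a weaker factor such as the identity yields a strictly lower reward.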
📝 Abstract
We present PEARL (Preconditioner Enhancement through Actor-critic Reinforcement Learning), a novel approach to learning matrix preconditioners. Existing preconditioners such as Jacobi, incomplete LU, and algebraic multigrid methods offer problem-specific advantages but rely heavily on hyperparameter tuning. Recent advances have explored learning preconditioners with deep neural networks, though challenges such as ill-behaved objective functions and costly training procedures remain. PEARL introduces a reinforcement learning approach to learning preconditioners, specifically a contextual bandit formulation. The framework uses an actor-critic model: the actor generates the incomplete Cholesky factors of candidate preconditioners, and the critic evaluates them to provide reward feedback. To further guide training, we design a dual-objective function that combines the critic's evaluation with an explicit condition-number term. PEARL contributes a generalizable preconditioner learning method, dynamic sparsity exploration, and cosine learning-rate schedules for improved stability and exploratory power. We compare our approach to traditional and neural preconditioners, demonstrating improved flexibility and faster iterative solves.
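To illustrate how an actor-produced incomplete Cholesky factor would be consumed downstream, here is a minimal preconditioned conjugate gradient loop with M = L Lᵀ applied through two triangular solves. This is a standard PCG sketch, not PEARL's code; the function name `pcg` and the dense NumPy solves are assumptions for illustration (a real solver would exploit sparsity).

```python
import numpy as np

def pcg(A, b, L, tol=1e-8, max_iter=200):
    """Preconditioned conjugate gradient with M = L L^T
    (e.g. an incomplete Cholesky factor produced by the actor).

    Returns the approximate solution and the iteration count.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    # Apply M^{-1} r via two triangular solves: L y = r, then L^T z = y.
    z = np.linalg.solve(L.T, np.linalg.solve(L, r))
    p = z.copy()
    rz = r @ z
    for k in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            return x, k + 1
        z = np.linalg.solve(L.T, np.linalg.solve(L, r))
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter
```

A better factor directly translates into fewer iterations: with the exact Cholesky factor the preconditioned matrix is the identity and PCG converges in a single step, while a trivial factor (the identity) reduces to plain CG.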