A Non-Monotone Preconditioned Trust-Region Method for Neural Network Training

📅 2026-05-14

🤖 AI Summary
This work addresses low computational efficiency and poor global coordination in large-scale neural network training. Building on the Additively Preconditioned Trust-Region Strategy (APTS), the authors propose a non-monotone variant, NAPTS, which uses domain decomposition to optimize subdomains of the network in parallel. NAPTS combines a nonlinear additive Schwarz preconditioner with coarse-space correction and a windowed non-monotone acceptance criterion. The criterion permits controlled increases in the objective function, so that otherwise beneficial coarse steps are not rejected. In the reported experiments, the method achieves comparable model accuracy while reducing CPU training time by 30% and cutting the number of rejected steps to one third of those in APTS, improving both parallel efficiency and robustness.
📝 Abstract
Training deep neural networks at scale can benefit from domain decomposition, where the network is split into subdomains trained in parallel and coupled by a global trust-region mechanism. Building on the Additively Preconditioned Trust-Region Strategy (APTS), we propose a non-monotone variant with a nonlinear additive Schwarz preconditioner that combines parallel subdomain corrections with global coarse-space directions. A windowed acceptance criterion allows controlled objective increases, avoiding needless rejection of effective coarse steps. The resulting non-monotone APTS (NAPTS) preserves accuracy while reducing CPU time by 30% and cutting rejected steps to one third of those in APTS.
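The windowed acceptance idea can be illustrated with a minimal sketch. It assumes a common max-over-window rule, in which the trial objective is compared against the largest of the most recent accepted values rather than only the last one. The abstract does not state the exact criterion used in NAPTS, so the function name `accept_step`, the threshold `eta`, and the sample numbers below are hypothetical.

```python
def accept_step(f_trial, recent_values, predicted_reduction, eta=1e-4):
    """Trust-region acceptance test against a window of recent objective values.

    Assumption (not taken from the paper): the reference value is the maximum
    over the window. A window holding only the latest value gives the usual
    monotone test.
    """
    if predicted_reduction <= 0:
        # The model predicts no decrease, so the step is rejected outright.
        return False, 0.0
    f_ref = max(recent_values)  # worst objective value inside the window
    rho = (f_ref - f_trial) / predicted_reduction  # achieved vs. predicted decrease
    return rho >= eta, rho


# Hypothetical objective values of the last three accepted iterates.
window = [1.00, 0.80, 0.90]

# The trial step raises the objective from 0.90 to 0.95.
monotone_ok, _ = accept_step(0.95, window[-1:], predicted_reduction=0.1)
nonmonotone_ok, _ = accept_step(0.95, window, predicted_reduction=0.1)
print(monotone_ok, nonmonotone_ok)
```

In this sketch the monotone test rejects the step because the objective rose relative to the last iterate, while the windowed test accepts it because the trial value is still below the worst value in the window. The window length bounds how large a temporary increase can be tolerated.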
Problem

Research questions and friction points this paper is trying to address.

neural network training
domain decomposition
trust-region method
non-monotone optimization
parallel computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

non-monotone trust-region
additive Schwarz preconditioner
domain decomposition
coarse-space correction
parallel neural network training