🤖 AI Summary
This work addresses low computational efficiency and poor global coordination in large-scale neural network training. Building on the Additively Preconditioned Trust-Region Strategy (APTS), the authors propose a non-monotone variant, NAPTS, which uses domain decomposition to optimize subdomains in parallel. NAPTS combines a nonlinear additive Schwarz preconditioner with coarse-space correction and adds a windowed non-monotone acceptance criterion. The criterion permits controlled increases in the objective function, so that otherwise beneficial coarse steps are not rejected. Experiments show that the method achieves comparable model accuracy while reducing CPU training time by 30% and cutting the number of rejected iterations to one third of those in the original APTS approach.
📝 Abstract
Training deep neural networks at scale can benefit from domain decomposition, where the network is split into subdomains trained in parallel and coupled by a global trust-region mechanism. Building on the Additively Preconditioned Trust-Region Strategy (APTS), we propose a non-monotone variant with a nonlinear additive Schwarz preconditioner that combines parallel subdomain corrections with global coarse-space directions. A windowed acceptance criterion allows controlled objective increases, avoiding needless rejection of effective coarse steps. The resulting non-monotone APTS (NAPTS) preserves accuracy while reducing CPU time by 30% and cutting rejected steps to one third of those in APTS.
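To make the windowed acceptance idea concrete, the sketch below shows one common form of a non-monotone trust-region test: the actual reduction is measured against the largest objective value in a sliding window of recently accepted iterates, rather than against the most recent value. The abstract does not state the exact rule used in NAPTS, so the max-over-window reference, the class name `WindowedAcceptance`, and the parameters `window` and `eta` are all illustrative assumptions, not the authors' implementation.

```python
from collections import deque


class WindowedAcceptance:
    """Illustrative non-monotone acceptance test (assumed form, not the paper's exact rule)."""

    def __init__(self, f0, window=5, eta=0.1):
        # Sliding window of the most recent accepted objective values.
        self.history = deque([f0], maxlen=window)
        # Minimum ratio of actual to predicted reduction required for acceptance.
        self.eta = eta

    def accept(self, f_trial, predicted_reduction):
        """Return True and record f_trial if the trial step passes the test."""
        if predicted_reduction <= 0:
            # The model predicts no decrease, so the step is rejected outright.
            return False
        # Non-monotone reference: worst (largest) value in the window.
        f_ref = max(self.history)
        rho = (f_ref - f_trial) / predicted_reduction
        if rho >= self.eta:
            self.history.append(f_trial)
            return True
        return False


# Hypothetical objective values, chosen only to show the behaviour.
nonmonotone = WindowedAcceptance(f0=1.0, window=3)
monotone = WindowedAcceptance(f0=1.0, window=1)  # window of 1 recovers the monotone test

for rule in (nonmonotone, monotone):
    rule.accept(0.8, predicted_reduction=0.3)  # plain decrease, accepted by both

# A step that raises the objective from 0.8 to 0.9:
step_nonmonotone = nonmonotone.accept(0.9, predicted_reduction=0.2)  # True
step_monotone = monotone.accept(0.9, predicted_reduction=0.2)        # False
```

With a window of 3, the increase from 0.8 to 0.9 is still below the window maximum of 1.0 and is accepted. With a window of 1, the same step is rejected. This is the mechanism by which a windowed rule can avoid discarding coarse steps that temporarily raise the objective.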