AI Summary
This study addresses the convergence of training two-layer ReLU neural networks under Gaussian random masking of inputs, a setting relevant to sensor noise, missing data, and privacy-preserving mechanisms. By leveraging neural tangent kernel (NTK) theory and providing a refined characterization of the intrinsic randomness induced by the ReLU activation under masked inputs, the work establishes the first linear convergence guarantee in this context: gradient descent converges linearly to a neighborhood of the global optimum, with the radius of this neighborhood proportional to the variance of the masking noise. This result overcomes a key technical challenge in jointly handling nonlinear activations and input randomness, offering a rigorous theoretical foundation for training neural networks with noisy or partially observed inputs.
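A guarantee of the kind described above is often written schematically as follows. This is only an illustrative template: the symbols (loss $\mathcal{L}$, parameters $\theta_t$ after $t$ steps, contraction factor $\rho$, constant $C$, mask variance $\sigma^2$) are assumptions introduced here, and the summary does not state the paper's exact constants, the quantity being bounded, or the conditions on width and step size.

```latex
% Schematic form of "linear convergence up to a noise-dependent neighborhood".
% All symbols are illustrative assumptions, not taken from the paper.
\[
  \mathbb{E}\bigl[\mathcal{L}(\theta_t)\bigr]
  \;\le\;
  (1-\rho)^{t}\,\mathcal{L}(\theta_0) \;+\; C\,\sigma^{2},
  \qquad 0 < \rho < 1 .
\]
```

The first term decays geometrically in $t$ (the linear rate), while the second term is a floor that does not shrink with more iterations and vanishes only when $\sigma^2 = 0$.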
Abstract
We investigate the convergence guarantee of two-layer neural network training with Gaussian randomly masked inputs. This scenario corresponds to Gaussian dropout at the input level, or noisy input training common in sensor networks, privacy-preserving training, and federated learning, where each user may have access to partial or corrupted features. Using a Neural Tangent Kernel (NTK) analysis, we demonstrate that training a two-layer ReLU network with Gaussian randomly masked inputs achieves linear convergence up to an error region proportional to the mask's variance. A key technical contribution is resolving the randomness within the non-linear activation, a problem of independent interest.
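The setting in the abstract can be explored numerically with a small NumPy sketch. It is not the authors' code, and several choices are assumptions made here for illustration: the mask is multiplicative with entries drawn from N(1, sigma^2) and redrawn at every step, only the first-layer weights are trained while the output signs stay fixed (a common NTK-style setup), the loss is the squared error, and the tracked loss is evaluated on the unmasked inputs. The function name `train_masked` and all hyperparameters are hypothetical.

```python
import numpy as np


def train_masked(sigma, steps=300, n=20, d=10, m=2000, lr=0.3, seed=0):
    """Gradient descent on a two-layer ReLU net with Gaussian-masked inputs.

    Assumed model: f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x),
    with a_r in {-1, +1} fixed and only W trained.
    Returns the squared loss on the clean (unmasked) inputs at each step.
    """
    rng = np.random.default_rng(seed)

    # Synthetic data: unit-norm inputs and random real-valued targets.
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    y = rng.normal(size=n)

    # Random initialization of the hidden layer and fixed output signs.
    W = rng.normal(size=(m, d))
    a = rng.choice([-1.0, 1.0], size=m)

    losses = np.empty(steps)
    for t in range(steps):
        # Track the loss on clean inputs (an evaluation choice made here).
        clean_out = np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)
        losses[t] = 0.5 * np.sum((clean_out - y) ** 2)

        # Fresh multiplicative Gaussian mask with mean 1 and variance sigma^2.
        mask = 1.0 + sigma * rng.normal(size=(n, d))
        Xm = X * mask

        # Forward pass on the masked inputs.
        pre = Xm @ W.T                                  # shape (n, m)
        out = np.maximum(pre, 0.0) @ a / np.sqrt(m)     # shape (n,)
        resid = out - y

        # Gradient of 0.5 * ||out - y||^2 with respect to W, shape (m, d).
        grad = ((resid[:, None] * (pre > 0)) * a[None, :]).T @ Xm / np.sqrt(m)
        W -= lr * grad

    return losses


if __name__ == "__main__":
    for sigma in (0.0, 0.2, 0.5):
        losses = train_masked(sigma)
        print(f"sigma={sigma:.1f}  initial={losses[0]:.4f}  final={losses[-1]:.4f}")
```

Comparing the final losses across several values of `sigma` gives an informal way to look for the behavior the abstract describes, namely a fast initial decrease followed by a plateau whose level depends on the mask variance. A toy run like this is only a sanity check and does not verify the theorem.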