🤖 AI Summary
This work addresses the well-known difficulty neural networks have in approximating high-frequency functions, where conventional residual connections often fail to capture high-frequency patterns effectively. To overcome this limitation, the authors propose a gradient-enhanced residual connection mechanism that, for the first time, explicitly incorporates input gradients into the skip path. By forming a learnable convex combination of the standard residual and a gradient-based residual, the method adaptively modulates the network's reliance on high-frequency information. Theoretically, this design enhances sensitivity to input variations. Empirically, the approach significantly outperforms standard residual networks on high-frequency sinusoidal regression tasks and demonstrates consistent gains in single-image super-resolution, while maintaining competitive performance on standard vision benchmarks such as image classification and segmentation.
📄 Abstract
Existing work has linked properties of a function's gradient to the difficulty of function approximation. Motivated by these insights, we study how gradient information can be leveraged to improve a neural network's ability to approximate high-frequency functions, and we propose a gradient-based residual connection as a complement to the standard identity skip connection used in residual networks. We provide simple theoretical intuition for why gradient information can help distinguish inputs and improve the approximation of functions with rapidly varying behaviour. On a synthetic regression task with a high-frequency sinusoidal ground truth, we show that conventional residual connections struggle to capture high-frequency patterns, whereas our gradient residual substantially improves approximation quality. We then introduce a convex combination of the standard and gradient residuals, allowing the network to flexibly control how strongly it relies on gradient information. After validating the design choices of our method through an ablation study, we further demonstrate its utility on single-image super-resolution, a task whose underlying function may be high-frequency. Finally, on standard tasks such as image classification and segmentation, our method achieves performance comparable to standard residual networks, suggesting its broad applicability.
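To make the convex-combination idea concrete, the following is a minimal 1-D sketch. The branch function `tanh(w*x)`, the mixing weight `alpha`, and the use of an analytic input gradient are all illustrative assumptions standing in for the paper's actual layer (which would use autodiff inside a deep network); only the structure — a learnable convex mix of the identity skip and an input-gradient skip, added to the branch output — reflects the description above.

```python
import math

def grad_residual_block(x, w, alpha):
    """Toy gradient-enhanced residual connection (illustrative sketch).

    Branch:   f(x)   = tanh(w * x)                 # hypothetical branch
    Gradient: f'(x)  = w * (1 - tanh(w * x)**2)    # analytic input gradient,
                                                   # stands in for autodiff
    Skip:     alpha * x + (1 - alpha) * f'(x)      # learnable convex mix of
                                                   # identity and gradient skip
    Output:   skip + f(x)
    """
    fx = math.tanh(w * x)
    dfx = w * (1.0 - fx * fx)            # derivative of tanh(w*x) w.r.t. x
    skip = alpha * x + (1.0 - alpha) * dfx
    return skip + fx
```

With `alpha = 1.0` the block reduces to the standard residual `x + f(x)`; with `alpha = 0.0` the skip path carries only the input gradient, so a trained `alpha` lets the network interpolate between the two regimes.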