Towards Better Generalization via Distributional Input Projection Network

📅 2025-06-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the decoupling between training loss and generalization in over-parameterized neural networks, this paper proposes the Distributional Input Projection Network (DIPNet): a plug-and-play, architecture-agnostic module that maps inputs to learnable probability distributions at each layer, thereby smoothing the loss landscape with respect to the input. Theoretically, DIPNet reduces local sharpness measures and bounds the network's Lipschitz constant. It integrates with mainstream architectures, including ViTs, LLMs, ResNets, and MLPs, without structural modification. Extensive experiments demonstrate consistent and significant improvements in standard accuracy, adversarial robustness, out-of-distribution generalization, and reasoning benchmarks, validating its robustness and broad applicability. The core innovation is the systematic use of per-layer distributional input modeling for generalization, combining rigorous theoretical guarantees with practical deployability.
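The paper does not include code here, so the following is a minimal sketch of what a per-layer distributional input projection could look like, assuming a diagonal-Gaussian parameterization with a learnable spread and the reparameterization trick. The class name, the choice of Gaussian family, and the train/eval behavior are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of a per-layer distributional input projection
# (hypothetical Gaussian parameterization; the paper's exact
# distribution family and training details may differ).
import torch
import torch.nn as nn


class DistributionalProjection(nn.Module):
    """Maps an activation x to a learnable Gaussian centered at x and
    returns a reparameterized sample, so downstream layers see a
    smoothed version of their input."""

    def __init__(self, dim: int, init_log_std: float = -3.0):
        super().__init__()
        # Learnable per-dimension spread of the projected distribution.
        self.log_std = nn.Parameter(torch.full((dim,), init_log_std))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training:
            return x  # deterministic pass at inference (an assumption)
        # Reparameterization trick: sample ~ N(x, diag(std^2)), keeping
        # the projection differentiable with respect to x and log_std.
        eps = torch.randn_like(x)
        return x + eps * self.log_std.exp()
```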

📝 Abstract
As overparameterized models become increasingly prevalent, training loss alone offers limited insight into generalization performance. While smoothness has been linked to improved generalization across various settings, directly enforcing smoothness in neural networks remains challenging. To address this, we introduce Distributional Input Projection Networks (DIPNet), a novel framework that projects inputs into learnable distributions at each layer. This distributional representation induces a smoother loss landscape with respect to the input, promoting better generalization. We provide theoretical analysis showing that DIPNet reduces both local smoothness measures and the Lipschitz constant of the network, contributing to improved generalization performance. Empirically, we validate DIPNet across a wide range of architectures and tasks, including Vision Transformers (ViTs), Large Language Models (LLMs), ResNet and MLPs. Our method consistently enhances test performance under standard settings, adversarial attacks, out-of-distribution inputs, and reasoning benchmarks. We demonstrate that the proposed input projection strategy can be seamlessly integrated into existing models, providing a general and effective approach for boosting generalization performance in modern deep learning.
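The abstract's smoothness claim can be made concrete with a standard Gaussian-smoothing bound from the randomized-smoothing literature. This is an illustrative result conveying the intuition, assuming a bounded loss and a Gaussian projection; it is not the paper's actual theorem.

```latex
% Illustrative bound (assumes a bounded loss and a Gaussian projection;
% not the paper's theorem). Averaging the loss over an input distribution
\[
  \ell_\sigma(x) \;=\; \mathbb{E}_{\varepsilon \sim \mathcal{N}(0,\sigma^2 I)}
  \bigl[\ell(x+\varepsilon)\bigr],
  \qquad |\ell| \le M,
\]
% caps its gradient norm, and hence its Lipschitz constant:
\[
  \|\nabla_x \ell_\sigma(x)\| \;\le\; \frac{M}{\sigma}\sqrt{\frac{2}{\pi}} .
\]
% A wider projection (larger sigma) thus yields a smoother loss landscape.
```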
Problem

Research questions and friction points this paper is trying to address.

Training loss alone offers limited insight into generalization in overparameterized models
Smoothness is linked to better generalization, but directly enforcing it in neural networks is challenging
How to induce a smoother loss landscape without modifying existing architectures, which DIPNet addresses by projecting inputs into learnable distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Projects inputs into learnable distributions at each layer
Provably reduces local smoothness measures and the network's Lipschitz constant
Integrates seamlessly into existing models (see the usage sketch after this list)
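A hypothetical integration might look like the following, reusing the DistributionalProjection sketch from the AI summary above. The layer sizes and the placement of the projections are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical plug-and-play integration: insert a projection before each
# layer of a standard MLP, leaving the layers themselves untouched
# (reuses the DistributionalProjection sketch above).
import torch
import torch.nn as nn

model = nn.Sequential(
    DistributionalProjection(784),
    nn.Linear(784, 256),
    nn.ReLU(),
    DistributionalProjection(256),
    nn.Linear(256, 10),
)
logits = model(torch.randn(32, 784))  # standard forward pass, shape (32, 10)
```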
👥 Authors
Yifan Hao (University of Illinois Urbana-Champaign)
Yanxin Lu (MS at Caltech)
Xinwei Shen (University of Washington; Statistics, Machine Learning)
Tong Zhang (University of Illinois Urbana-Champaign)