🤖 AI Summary
Traditional MLPs with a “narrow-wide-narrow” architecture limit the expressive capacity of skip connections. To address this, we propose an Hourglass MLP module featuring a “wide-narrow-wide” structure: skip connections operate in a high-dimensional expanded space, while the residual path traverses a low-dimensional bottleneck for computational efficiency. Our key contribution is the first introduction, with theoretical and empirical validation, of skip connections in high-dimensional space, supported by randomly initialized, fixed projection layers that ensure stable training and inference. Combined with systematic architecture search, this design substantially improves the performance-parameter Pareto frontier. On mainstream image generation benchmarks, Hourglass MLP consistently outperforms conventional architectures; notably, its advantage grows with parameter count, demonstrating the scalability of deep networks with wide skip connections.
📝 Abstract
Multi-layer perceptrons (MLPs) conventionally follow a narrow-wide-narrow design where skip connections operate at the input/output dimensions while processing occurs in expanded hidden spaces. We challenge this convention by proposing wide-narrow-wide (Hourglass) MLP blocks where skip connections operate at expanded dimensions while residual computation flows through narrow bottlenecks. This inversion leverages higher-dimensional spaces for incremental refinement while maintaining computational efficiency through parameter-matched designs. Implementing Hourglass MLPs requires an initial projection to lift input signals to expanded dimensions. We propose that this projection can remain fixed at random initialization throughout training, enabling efficient training and inference implementations. We evaluate both architectures on generative tasks over popular image datasets, characterizing performance-parameter Pareto frontiers through systematic architectural search. Results show that Hourglass architectures consistently achieve superior Pareto frontiers compared to conventional designs. As parameter budgets increase, optimal Hourglass configurations favor deeper networks with wider skip connections and narrower bottlenecks, a scaling pattern distinct from conventional MLPs. Our findings suggest reconsidering skip connection placement in modern architectures, with potential applications extending to Transformers and other residual networks.
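To make the wide-narrow-wide idea concrete, here is a minimal NumPy sketch of an Hourglass forward pass as the abstract describes it: a fixed, randomly initialized projection lifts the input to the wide dimension, and each block adds a skip connection in that wide space while its residual branch passes through a narrow bottleneck. All dimensions, the ReLU nonlinearity, and the initialization scales are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_hourglass_block(d_wide, d_narrow, rng):
    """One wide-narrow-wide block: the residual branch compresses to a
    d_narrow bottleneck and expands back; the skip stays in d_wide."""
    W_down = rng.normal(0.0, (2.0 / d_wide) ** 0.5, (d_wide, d_narrow))
    W_up = rng.normal(0.0, (2.0 / d_narrow) ** 0.5, (d_narrow, d_wide))

    def forward(h):
        z = np.maximum(h @ W_down, 0.0)  # compress + nonlinearity (ReLU assumed)
        return h + z @ W_up              # skip connection in the expanded space

    return forward


# Illustrative sizes: input dim 8, wide (skip) dim 64, bottleneck dim 16.
d_in, d_wide, d_narrow = 8, 64, 16

# Initial lift to the wide space; per the abstract this projection can stay
# fixed at its random initialization (it is never trained here).
P = rng.normal(0.0, (1.0 / d_in) ** 0.5, (d_in, d_wide))

x = rng.normal(size=(4, d_in))  # a batch of 4 toy inputs
h = x @ P
for _ in range(3):  # stack a few hourglass blocks
    h = make_hourglass_block(d_wide, d_narrow, rng)(h)

print(h.shape)  # (4, 64): refinement happens at the wide dimension throughout
```

Note the parameter trade-off the abstract alludes to: each block costs only `2 * d_wide * d_narrow` weights, so a deeper stack with a narrow bottleneck can match the parameter count of a conventional narrow-wide-narrow MLP while keeping its skip pathway wide.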