MogaNet: Multi-order Gated Aggregation Network

📅 2022-11-07
🏛️ International Conference on Learning Representations
📈 Citations: 63
Influential: 10
📄 PDF
🤖 AI Summary
Modern ConvNets suffer from representational bottlenecks due to fixed, small convolutional kernels that limit higher-order feature interactions. To address this, we propose MogaNet, a Multi-order Gated Aggregation Network. Its core innovation is a learnable gated aggregation module grounded in multi-order game-theoretic interaction, enabling dynamic, adaptive contextual feature fusion within a pure convolutional architecture. Additionally, MogaNet introduces lightweight multi-order convolutions and an efficient global context modeling mechanism. On ImageNet-1K, MogaNet achieves 80.0% top-1 accuracy with only 5.2M parameters and 87.8% with 181M parameters, outperforming ParC-Net and ConvNeXt-L while saving 59% FLOPs and 17M parameters, respectively. Furthermore, MogaNet sets state-of-the-art results across diverse downstream tasks, including object detection, instance segmentation, pose estimation, and video prediction.
📝 Abstract
By contextualizing the kernel as global as possible, modern ConvNets have shown great potential in computer vision tasks. However, recent progress on *multi-order game-theoretic interaction* within deep neural networks (DNNs) reveals the representation bottleneck of modern ConvNets, where the expressive interactions have not been effectively encoded with the increased kernel size. To tackle this challenge, we propose a new family of modern ConvNets, dubbed MogaNet, for discriminative visual representation learning in pure ConvNet-based models with favorable complexity-performance trade-offs. MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module, where discriminative features are efficiently gathered and contextualized adaptively. MogaNet exhibits great scalability, impressive efficiency of parameters, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet and various downstream vision benchmarks, including COCO object detection, ADE20K semantic segmentation, 2D&3D human pose estimation, and video prediction. Notably, MogaNet hits 80.0% and 87.8% accuracy with 5.2M and 181M parameters on ImageNet-1K, outperforming ParC-Net and ConvNeXt-L, while saving 59% FLOPs and 17M parameters, respectively. The source code is available at https://github.com/Westlake-AI/MogaNet.
Problem

Research questions and friction points this paper is trying to address.

Addresses representation bottleneck in modern ConvNets
Enhances multi-order interactions in deep neural networks
Improves complexity-performance trade-offs in visual representation learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-order gated aggregation for feature learning
Compact module with adaptive contextualization
Lightweight multi-order convolutions with favorable parameter efficiency and scalability
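The gated aggregation idea summarized in the bullets above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it assumes a simple per-channel scale in place of the 1×1 gating convolution, and sums three dilated depthwise 3×3 convolutions as the "multi-order" context branch; the function names (`gated_aggregation`, `depthwise_conv3x3`) and the choice of dilations are hypothetical.

```python
import numpy as np

def silu(x):
    # SiLU / Swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def depthwise_conv3x3(x, w, dilation=1):
    # Depthwise 3x3 conv with dilation; x: (C, H, W), w: (C, 3, 3).
    # Zero padding of size `dilation` keeps the spatial size unchanged.
    C, H, W = x.shape
    p = dilation
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            di, dj = i * dilation, j * dilation
            out += w[:, i, j][:, None, None] * xp[:, di:di + H, dj:dj + W]
    return out

def gated_aggregation(x, w_gate, w_ctx_list, dilations=(1, 2, 3)):
    # Gate branch: per-channel scale + SiLU (stand-in for a 1x1 conv gate).
    gate = silu(w_gate[:, None, None] * x)
    # Context branch: aggregate depthwise convs at several dilations,
    # i.e. receptive fields of different orders.
    ctx = sum(depthwise_conv3x3(x, w, d) for w, d in zip(w_ctx_list, dilations))
    # Elementwise gating fuses context features adaptively per position.
    return gate * ctx
```

The gate rescales the aggregated context at every spatial position, so the module can emphasize or suppress multi-scale features input-dependently while staying purely convolutional.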
Siyuan Li
AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China; Zhejiang University, College of Computer Science and Technology, Hangzhou, China
Zedong Wang
The Hong Kong University of Science and Technology (HKUST)
Deep Learning, Computer Vision, Multi-task Learning
Zicheng Liu
AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China; Zhejiang University, College of Computer Science and Technology, Hangzhou, China
Cheng Tan
AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China; Zhejiang University, College of Computer Science and Technology, Hangzhou, China
Haitao Lin
AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China; Zhejiang University, College of Computer Science and Technology, Hangzhou, China
Di Wu
AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China; Zhejiang University, College of Computer Science and Technology, Hangzhou, China
Zhiyuan Chen
AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China; Zhejiang University, College of Computer Science and Technology, Hangzhou, China
Jiangbin Zheng
Zhejiang University & Westlake University
AI for Life Science, Natural Language Processing, Computer Vision, AI for Sign Language
Stan Z. Li
AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China