Efficiency Follows Global-Local Decoupling

📅 2026-03-19
🤖 AI Summary
This work addresses the challenge of modeling global context while preserving local detail in image understanding. It proposes ConvNeur, a two-branch architecture that explicitly decouples global reasoning from local representation: a lightweight neural memory branch aggregates global context over a compact set of tokens, while a locality-preserving branch uses convolutions to retain fine-grained structure. A learned gate lets global cues modulate local features. The design scales subquadratically with image size and retains the inductive priors of local processing. On standard classification, detection, and segmentation benchmarks, ConvNeur matches or surpasses comparable alternatives at similar or lower compute, with favorable accuracy-latency trade-offs.

📝 Abstract
Modern vision models must capture image-level context without sacrificing local detail while remaining computationally affordable. We revisit this tradeoff and advance a simple principle: decouple the roles of global reasoning and local representation. To operationalize this principle, we introduce ConvNeur, a two-branch architecture in which a lightweight neural memory branch aggregates global context on a compact set of tokens, and a locality-preserving branch extracts fine structure. A learned gate lets global cues modulate local features without entangling their objectives. This separation yields subquadratic scaling with image size, retains inductive priors associated with local processing, and reduces overhead relative to fully global attention. On standard classification, detection, and segmentation benchmarks, ConvNeur matches or surpasses comparable alternatives at similar or lower compute and offers favorable accuracy versus latency trade-offs at similar budgets. These results support the view that efficiency follows global-local decoupling.
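
The two-branch idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the memory mechanism, the aggregation scheme, or the gate's form, so everything below is an assumption made for illustration. It works on a 1D list of scalar features, uses average pooling to form the compact tokens, a softmax read for the global branch, a fixed 3-tap convolution for the local branch, and a sigmoid gate. All function names and parameters are hypothetical.

```python
import math


def local_branch(x, kernel=(0.25, 0.5, 0.25)):
    """Zero-padded 1D convolution; stands in for the locality-preserving branch."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def compact_tokens(x, num_tokens):
    """Average-pool N features into num_tokens tokens (assumed aggregation)."""
    n = len(x)
    if not 0 < num_tokens <= n:
        raise ValueError("num_tokens must be between 1 and len(x)")
    tokens = []
    for t in range(num_tokens):
        chunk = x[t * n // num_tokens:(t + 1) * n // num_tokens]
        tokens.append(sum(chunk) / len(chunk))
    return tokens


def global_branch(x, tokens):
    """Each of the N positions reads from M tokens via softmax: N * M work."""
    out = []
    for xi in x:
        scores = [xi * t for t in tokens]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(sum(w * t for w, t in zip(weights, tokens)) / total)
    return out


def gated_fusion(x, num_tokens=4, gate_weight=1.0, gate_bias=0.0):
    """Global context produces a sigmoid gate that scales the local features."""
    local = local_branch(x)
    context = global_branch(x, compact_tokens(x, num_tokens))
    gates = [1.0 / (1.0 + math.exp(-(gate_weight * c + gate_bias))) for c in context]
    return [g * l for g, l in zip(gates, local)]


if __name__ == "__main__":
    features = [0.1, 0.4, 0.9, 0.3, 0.2, 0.8, 0.5, 0.7]
    print(gated_fusion(features, num_tokens=2))
```

In this sketch the global read costs on the order of N × M operations for N positions and M tokens, compared with N × N for all-pairs attention, which is one way a compact token set could give subquadratic scaling when M grows more slowly than N. The gate only rescales the local features, so the two branches keep separate roles.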

Problem

Research questions and friction points this paper is trying to address.

global-local decoupling
computational efficiency
vision models
global context
local detail

Innovation

Methods, ideas, or system contributions that make the work stand out.

global-local decoupling
ConvNeur
neural memory
subquadratic scaling
learned gating

Authors

Zhenyu Yang
Nanjing University of Science and Technology; State Key Laboratory of Intelligent Manufacturing of Advanced Construction Machinery
Gensheng Pei
Department of Electrical and Computer Engineering, Sungkyunkwan University
Tao Chen
Nanjing University of Science and Technology
computer vision
Yichao Zhou
UC Berkeley
3D Vision, Computer Graphics, Machine Learning
Tianfei Zhou
Beijing Institute of Technology | ETH Zurich
Artificial Intelligence, Medical AI, Computer Vision
Yazhou Yao
Nanjing University of Science and Technology; State Key Laboratory of Intelligent Manufacturing of Advanced Construction Machinery
Fumin Shen
University of Electronic Science and Technology of China