🤖 AI Summary
This work addresses the challenge of simultaneously modeling global context and preserving local details in image understanding by proposing ConvNeur, a novel architecture that explicitly decouples global reasoning from local representation for the first time. ConvNeur employs a dual-branch design: a lightweight neural memory branch efficiently captures global context, while a local preservation branch leverages convolutions to retain fine-grained structural details. A learnable gating mechanism adaptively modulates local features using global information. Combined with a compact token aggregation strategy, the model achieves sub-quadratic computational complexity while maintaining local inductive biases. Extensive experiments demonstrate that ConvNeur outperforms existing methods across image classification, object detection, and semantic segmentation tasks, achieving superior accuracy-latency trade-offs at comparable or lower computational costs.
📝 Abstract
Modern vision models must capture image-level context without sacrificing local detail while remaining computationally affordable. We revisit this trade-off and advance a simple principle: decouple the roles of global reasoning and local representation. To operationalize this principle, we introduce ConvNeur, a two-branch architecture in which a lightweight neural memory branch aggregates global context over a compact set of tokens, and a locality-preserving branch extracts fine structure. A learned gate lets global cues modulate local features without entangling their objectives. This separation yields sub-quadratic scaling with image size, retains the inductive priors associated with local processing, and reduces overhead relative to fully global attention. On standard classification, detection, and segmentation benchmarks, ConvNeur matches or surpasses comparable alternatives at similar or lower compute and offers favorable accuracy-versus-latency trade-offs at similar budgets. These results support the view that efficiency follows from decoupling global and local processing.
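The abstract describes a learned gate through which global cues modulate local features. A minimal NumPy sketch of one plausible form of such gating is shown below; the function name, the mean-pooling of tokens into a single context vector, and the per-channel sigmoid gate are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(local_feats, global_tokens, W_g, b_g):
    """Hypothetical global-to-local gating (not the paper's exact design).

    local_feats:   (H, W, C) fine-grained features from the local branch
    global_tokens: (T, C)    compact token set from the neural memory branch
    W_g, b_g:      learned gate projection, (C, C) and (C,)
    """
    # Summarize the compact token set into one global context vector.
    context = global_tokens.mean(axis=0)            # (C,)
    # Project the context into a per-channel gate in (0, 1).
    gate = sigmoid(context @ W_g + b_g)             # (C,)
    # Modulate local features channel-wise; broadcasting covers H and W.
    return local_feats * gate                       # (H, W, C)
```

Because the gate is a multiplicative factor in (0, 1) rather than a replacement of local features, global information can rescale local detail without the two branches sharing an objective, which is one way to read the "without entangling" claim.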