CPUBone: Efficient Vision Backbone Design for Devices with Low Parallelization Capabilities

📅 2026-03-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the inefficiency of existing vision backbones, which are typically optimized for highly parallel accelerators, when deployed on low-parallelism hardware such as CPUs. The authors propose design principles tailored for CPU deployment, emphasizing a balance between a low total operation count (MACs) and hardware-efficient execution, i.e., a high rate of multiply-accumulate operations per second (MACpS), and introduce CPUBone, the first family of vision backbones explicitly optimized for CPUs. By incorporating grouped convolutions and small kernel sizes, CPUBone reduces computational load while retaining efficient execution on CPU hardware. Experiments demonstrate that CPUBone achieves state-of-the-art accuracy-speed trade-offs across diverse CPU platforms and exhibits strong transfer performance on downstream tasks including object detection and semantic segmentation.
📝 Abstract
Recent research on vision backbone architectures has predominantly focused on optimizing efficiency for hardware platforms with high parallel processing capabilities, a category that increasingly includes embedded systems such as mobile phones and embedded AI accelerator modules. CPUs, in contrast, cannot parallelize operations to the same degree, so models benefit from a specific design philosophy that balances the total number of operations (MACs) against hardware-efficient execution, i.e., a high rate of MACs per second (MACpS). In pursuit of this, we investigate two modifications to standard convolutions aimed at reducing computational cost: grouping convolutions and reducing kernel sizes. While both adaptations substantially decrease the total number of MACs required for inference, sustaining low latency requires that hardware efficiency be preserved. Our experiments across diverse CPU devices confirm that these adaptations retain high hardware efficiency on CPUs. Based on these insights, we introduce CPUBone, a new family of vision backbone models optimized for CPU-based inference. CPUBone achieves state-of-the-art Speed-Accuracy Trade-offs (SATs) across a wide range of CPU devices and effectively transfers its efficiency to downstream tasks such as object detection and semantic segmentation. Models and code are available at https://github.com/altair199797/CPUBone.
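The two convolution adaptations the abstract names both shrink the MAC count by a fixed factor: grouping divides the per-output-channel input fan-in by the group count, and a smaller kernel shrinks it quadratically in the kernel size. A minimal sketch of this arithmetic (the layer dimensions below are illustrative, not taken from the paper):

```python
def conv_macs(h: int, w: int, c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Multiply-accumulate count for a 2D convolution producing an h x w map.

    Assumes stride 1 and 'same' padding, so output spatial size equals the
    input's. Each output element sums over (c_in / groups) * k * k inputs.
    """
    assert c_in % groups == 0 and c_out % groups == 0, "channels must divide evenly"
    return h * w * c_out * (c_in // groups) * k * k

# Hypothetical baseline: dense 3x3 conv on a 56x56, 128-channel feature map.
base    = conv_macs(56, 56, 128, 128, k=3)            # dense 3x3
grouped = conv_macs(56, 56, 128, 128, k=3, groups=8)  # 8 groups -> 8x fewer MACs
small_k = conv_macs(56, 56, 128, 128, k=1)            # 1x1 kernel -> 9x fewer MACs

print(base // grouped, base // small_k)  # → 8 9
```

As the paper stresses, these MAC reductions only translate into lower latency if the resulting layers still execute at a high MACpS on the target CPU; the count above is the numerator of that trade-off, not a latency prediction.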
Problem

Research questions and friction points this paper is trying to address.

vision backbone
CPU inference
hardware efficiency
low parallelization
model optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

CPU-efficient design
grouped convolutions
reduced kernel size
hardware-aware architecture
vision backbone