Zeros can be Informative: Masked Binary U-Net for Image Segmentation on Tensor Cores

📅 2026-01-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenging trade-off among accuracy, latency, and energy consumption in high-resolution image segmentation on edge devices, where existing binary neural networks often suffer from significant accuracy degradation and lack efficient GPU support. The authors propose Masked Binary U-Net (MBU-Net), which for the first time identifies the critical role of zero states in binary U-Net performance. By introducing a unified inter-layer quantization sensitivity assumption and integrating a cost-aware masking strategy with subtractive bit encoding, MBU-Net achieves near full-precision accuracy while approaching the computational efficiency of binary networks. Furthermore, a dedicated GPU execution framework is designed to efficiently map the model onto Tensor Cores. Evaluated on three segmentation benchmarks, MBU-Net incurs only a 3% average mIoU drop compared to FP16 U-Net, while delivering a 2.04× speedup and 3.54× energy reduction.
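The "zero state" is the difference between a plain binary weight in {-1, +1} and a masked binary weight in {-1, 0, +1}. The sketch below shows that difference and how sparsity falls out of it. It assumes a simple magnitude threshold (a hypothetical `delta`) decides which weights are masked. The paper's cost-aware strategy instead chooses where masking yields the most accuracy per unit cost, and neither that selection nor the training procedure is reproduced here.

```python
def masked_binarize(weights, delta):
    """Map real-valued weights to {-1, 0, +1}.

    Weights with |w| <= delta go to the explicit zero state; the rest keep
    only their sign. `delta` is a hypothetical threshold used for
    illustration, not the paper's cost-aware masking rule.
    """
    return [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]


def sparsity(qweights):
    """Fraction of quantized weights that landed in the zero state."""
    return sum(1 for q in qweights if q == 0) / len(qweights)


w = [0.8, -0.05, 0.02, -0.6, 0.3, -0.01]
q = masked_binarize(w, delta=0.1)
print(q)            # [1, 0, 0, -1, 1, 0]
print(sparsity(q))  # 0.5
```

A plain binary network would have to round the three small weights to +1 or -1. The masked version can represent them as exact zeros instead, which is where the observed sparsity comes from.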

📝 Abstract
Real-time image segmentation is a key enabler for AR/VR, robotics, drones, and autonomous systems, where tight accuracy, latency, and energy budgets must be met on resource-constrained edge devices. While U-Net offers a favorable balance of accuracy and efficiency compared to large transformer-based models, achieving real-time performance on high-resolution input remains challenging due to compute, memory, and power limits. Extreme quantization, particularly binary networks, is appealing for its hardware-friendly operations. However, two obstacles limit practicality: (1) severe accuracy degradation, and (2) a lack of end-to-end implementations that deliver efficiency on general-purpose GPUs. We make two empirical observations that guide our design. (1) An explicit zero state is essential: applying zero masking to binary U-Net weights during training yields noticeable sparsity. (2) Quantization sensitivity is uniform across layers. Motivated by these findings, we introduce Masked Binary U-Net (MBU-Net), obtained through a cost-aware masking strategy that prioritizes masking where it yields the highest accuracy-per-cost, reconciling accuracy with near-binary efficiency. To realize these gains in practice, we develop a GPU execution framework that maps MBU-Net to Tensor Cores via a subtractive bit-encoding scheme, efficiently implementing masked binary weights with binary activations. This design leverages native binary Tensor Core BMMA instructions, enabling high throughput and energy savings on widely available GPUs. Across three segmentation benchmarks, MBU-Net attains near full-precision accuracy (a 3% average drop) while delivering a 2.04x speedup and a 3.54x energy reduction over a 16-bit floating point U-Net.
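Binary Tensor Core instructions only operate on 1-bit operands, so a three-valued weight has to be split into bit-planes before it can use them. The sketch below shows one plausible reading of a "subtractive" encoding. This is our assumption, not the authors' published kernel. Each weight is written as w = p - n, with bit-planes p = [w == +1] and n = [w == -1]. Each activation is written as a = 2x - 1, with x = [a == +1]. Expanding the product gives

sum(w * a) = 2 * (popc(p & x) - popc(n & x)) - popc(p) + popc(n)

This needs only AND and popcount, the kind of operation 1-bit matrix-multiply hardware accumulates. Python integers stand in for packed bit vectors here, and the helper names are hypothetical.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def pack_bits(values, predicate):
    """Pack a vector into an integer bitmask: bit i is set when predicate(values[i])."""
    mask = 0
    for i, v in enumerate(values):
        if predicate(v):
            mask |= 1 << i
    return mask


def masked_binary_dot(weights, activations):
    """Dot product of weights in {-1, 0, +1} with activations in {-1, +1}.

    Uses a hypothetical subtractive two-plane encoding of the weights
    (w = p - n) and a single bit-plane for the activations (a = 2x - 1),
    so the result needs only AND + popcount plus a final subtraction.
    Masked (zero) weights set neither p nor n and contribute nothing.
    """
    p = pack_bits(weights, lambda w: w == 1)      # +1 weights
    n = pack_bits(weights, lambda w: w == -1)     # -1 weights
    x = pack_bits(activations, lambda a: a == 1)  # +1 activations
    return 2 * (popcount(p & x) - popcount(n & x)) - popcount(p) + popcount(n)


w = [1, 0, -1, 1, 0, -1]
a = [1, -1, -1, 1, 1, -1]
print(masked_binary_dot(w, a))                # 4
print(sum(wi * ai for wi, ai in zip(w, a)))  # 4 (reference result)
```

The popc(p) and popc(n) terms depend only on the weights. In a real kernel they could be precomputed once per output channel, which leaves two AND-popcount products per dot product.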
Problem

image segmentation
binary neural networks
real-time performance
edge computing
quantization
Innovation

Masked Binary U-Net
Tensor Cores
binary neural networks
sparsity
energy-efficient inference
Chunshu Wu
PNNL
Ruibing Song
Rice University
S. Kondguli
META
Tong Geng
Rice University
Ang Li
Pacific Northwest National Laboratory and University of Washington
GPU, High Performance Computing, Quantum Computing, Computer Architecture