Trainable Bitwise Soft Quantization for Input Feature Compression

📅 2026-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently transmitting high-dimensional features from edge devices under stringent constraints on bandwidth, latency, and energy consumption. To this end, the authors propose a trainable, bit-wise soft quantization layer that approximates discrete step functions using multiple sigmoid functions, enabling end-to-end differentiability and task-oriented lossy compression. The method allows users to specify the desired bit-width and can be seamlessly integrated as a lightweight module at the data acquisition stage of neural networks, where it is jointly optimized with downstream tasks. Experimental results across multiple datasets demonstrate that the approach achieves 5–16× compression ratios (relative to 32-bit floating-point representations) using only 2–6 bits per feature, while maintaining accuracy nearly on par with full-precision models—significantly outperforming conventional quantization baselines.

📝 Abstract
The growing demand for machine learning applications in the context of the Internet of Things calls for new approaches to optimize the use of limited compute and memory resources. Despite significant progress that has been made w.r.t. reducing model sizes and improving efficiency, many applications still require remote servers to provide the required resources. However, such approaches rely on transmitting data from edge devices to remote servers, which may not always be feasible due to bandwidth, latency, or energy constraints. We propose a task-specific, trainable feature quantization layer that compresses the input features of a neural network. This can significantly reduce the amount of data that needs to be transferred from the device to a remote server. In particular, the layer allows each input feature to be quantized to a user-defined number of bits, enabling a simple on-device compression at the time of data collection. The layer is designed to approximate step functions with sigmoids, enabling trainable quantization thresholds. By concatenating outputs from multiple sigmoids, introduced as bitwise soft quantization, it achieves trainable quantized values when integrated with a neural network. We compare our method to full-precision inference as well as to several quantization baselines. Experiments show that our approach outperforms standard quantization methods, while maintaining accuracy levels close to those of full-precision models. In particular, depending on the dataset, compression factors of $5\times$ to $16\times$ can be achieved compared to $32$-bit input without significant performance loss.
Problem

Research questions and friction points this paper is trying to address.

feature compression
input quantization
edge computing
bandwidth constraint
neural network
Innovation

Methods, ideas, or system contributions that make the work stand out.

trainable quantization
bitwise soft quantization
input feature compression
edge AI
sigmoid-based quantization
Karsten Schrödter
University of Münster (Germany)
Jan Stenkamp
University of Münster (Germany)
Nina Herrmann
University of Münster (Germany)
Fabian Gieseke
Department of Information Systems, University of Münster
Data Engineering, Machine Learning