Quantamination: Dynamic Quantization Leaks Your Data Across the Batch

2026-04-29
Citations: 0
Influential: 0
AI Summary
This work identifies and formally names a previously unrecognized privacy vulnerability in mainstream machine learning frameworks, termed "Quantamination," arising from the implementation of dynamic quantization. While dynamic quantization enhances inference efficiency, we demonstrate that it inadvertently introduces a novel side-channel attack surface, enabling cross-batch leakage of user inputs. Through a systematic combination of side-channel analysis, reverse engineering of quantization mechanisms, and auditing of framework configurations, we empirically evaluate multiple widely used ML inference engines. Our findings confirm that at least four frameworks, under default or common deployment settings, are susceptible to this vulnerability, allowing an adversary to partially or fully reconstruct sensitive data from other users within the same inference batch.
Abstract
Dynamic quantization has emerged as a practical approach to increase the utilization and efficiency of the machine learning serving flow. Unlike static quantization, which applies quantization offline, dynamic quantization operates on tensors at run-time, adapting its parameters to the actual input data. Today's mainstream machine learning frameworks, including ML compilers and inference engines, frequently recommend dynamic quantization as an initial step for optimizing model serving. This is because dynamic quantization can significantly reduce memory usage and computational load, leading to faster token generation and improved model serving efficiency without substantial loss in model accuracy. In this paper, we reveal a critical vulnerability in dynamic quantization: an adversary can exploit this quantization strategy to steal sensitive user data placed in the same batch as the adversary's input. Our analysis demonstrates that dynamic quantization, when improperly implemented or configured, can create side channels that expose information about other inputs within the same batch. We call this phenomenon Quantamination, describing contamination from quantization. Specifically, we show that at least four of the most popular ML frameworks in use today either default to or can use configurations that leak data across the batch boundary. This data leakage, in theory, allows attackers to partially or even fully recover other users' batched input data, representing a serious privacy risk for existing ML serving frameworks.
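To make the mechanism in the abstract concrete, here is a minimal toy sketch in plain Python. It is not the paper's attack and does not reproduce any framework's implementation. It assumes symmetric int8 quantization whose scale is derived at run-time from the maximum absolute value of the whole batch (per-tensor scaling); the function name `quantize_dequantize` and the sample values are hypothetical. Under that assumption, the values the attacker gets back for their own row depend on the other row in the batch, which is the kind of cross-batch dependence the abstract describes.

```python
def quantize_dequantize(rows, per_row=False):
    """Symmetric int8 dynamic quantization followed by dequantization.

    per_row=False: a single scale is computed over the whole batch
                   (assumed per-tensor scheme, for illustration only).
    per_row=True:  each batch row gets its own scale.
    """
    def scale_of(values):
        max_abs = max(abs(v) for v in values)
        return max_abs / 127.0 if max_abs > 0 else 1.0

    # Scale shared by every row when quantizing per tensor.
    shared = scale_of([v for row in rows for v in row])

    out = []
    for row in rows:
        s = scale_of(row) if per_row else shared
        q = [max(-127, min(127, round(v / s))) for v in row]  # int8 codes
        out.append([qi * s for qi in q])                      # dequantized
    return out


# Hypothetical inputs: the attacker's row is fixed, the victim's row varies.
attacker = [0.11, -0.42, 0.73, 1.0]
victim_small = [0.5, -0.2, 0.1, 0.3]
victim_large = [8.0, -0.2, 0.1, 0.3]

# Per-tensor scale: the attacker's own row is rounded differently
# depending on the magnitude of the victim's data.
a_small = quantize_dequantize([attacker, victim_small])[0]
a_large = quantize_dequantize([attacker, victim_large])[0]
print(a_small != a_large)

# Per-row scale: the attacker's row no longer depends on the victim.
b_small = quantize_dequantize([attacker, victim_small], per_row=True)[0]
b_large = quantize_dequantize([attacker, victim_large], per_row=True)[0]
print(b_small == b_large)
```

In this toy setting, computing the scale per batch row removes the dependence on the victim's data. Whether a real serving stack shares quantization parameters across requests depends on its configuration, which is the property the paper audits.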
Problem

Research questions and friction points this paper is trying to address.

dynamic quantization
data leakage
batch privacy
side channel
model serving
Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic quantization
side-channel attack
batch inference
model serving security
Quantamination