Almost-Free Queue Jumping for Prior Inputs in Private Neural Inference

📅 2026-03-13
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
This work addresses the performance degradation caused by high-priority request preemption in privacy-preserving batched neural network inference, where cryptographic overheads make timely execution difficult. To overcome this challenge, the authors propose PrivQJ, a framework that combines homomorphic encryption and secure multi-party computation to let high-priority inputs “piggyback” on ongoing batch computations by reusing active computational slots. This mechanism achieves nearly zero additional cost for privacy-preserving queue preemption, without compromising security or disrupting other requests, and thereby significantly shortens response times for urgent queries. Theoretical analysis and empirical evaluation show that PrivQJ reduces preemption overhead by more than an order of magnitude compared to existing privacy-preserving machine learning as a service (PP-MLaaS) systems while maintaining high throughput.

📝 Abstract
Privacy-Preserving Machine Learning as a Service (PP-MLaaS) enables secure neural network inference by integrating cryptographic primitives such as homomorphic encryption (HE) and multi-party computation (MPC), protecting both client data and server models. Recent mixed-primitive frameworks have significantly improved inference efficiency, yet they process batched inputs sequentially, offering little flexibility for prioritizing urgent requests. Naïve queue jumping introduces considerable computational and communication overhead, adding non-negligible latency for in-queue inputs. We initiate the study of privacy-preserving queue jumping in batched inference and propose PrivQJ, a novel framework that enables efficient priority handling without degrading overall system performance. PrivQJ exploits shared computation across inputs via in-processing slot recycling, allowing prior inputs to be piggybacked onto ongoing batch computation with almost no additional cryptographic cost. Both theoretical analysis and experimental results demonstrate over an order-of-magnitude reduction in overhead compared to state-of-the-art PP-MLaaS systems.
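The abstract fixes neither the packing layout nor the protocol steps, so the following is a minimal plain-Python sketch of the slot-recycling intuition, not PrivQJ itself. The names PackedBatch, run_layer, and piggyback are hypothetical, the “ciphertext” is modeled in plaintext, and the only point illustrated is why writing an urgent input into an idle SIMD slot adds no homomorphic operations to the layers that remain.

```python
# Conceptual sketch only (plain Python, no real HE library). In SIMD-style
# homomorphic encryption, one ciphertext packs many inputs into parallel
# slots, and every homomorphic operation acts on all slots at once, so the
# cost of a layer does not depend on how many slots are actually in use.

from dataclasses import dataclass


@dataclass
class PackedBatch:
    """Stand-in for a slot-packed ciphertext: values plus slot occupancy."""
    slots: list       # per-slot values (encrypted in a real system)
    occupied: list    # True if the slot carries a live input
    layer: int = 0    # layers this batch has already executed


def free_slot(batch):
    """Return the index of a recyclable (idle) slot, or None if full."""
    for i, used in enumerate(batch.occupied):
        if not used:
            return i
    return None


def run_layer(batch, weight):
    # One SIMD operation touches every slot; the cost is independent of
    # occupancy, which is why a piggybacked input rides along for free.
    batch.slots = [v * weight for v in batch.slots]
    batch.layer += 1


def piggyback(batch, urgent_value):
    """Inject an urgent input into an idle slot of an in-flight batch.

    The injected input only sees layers from batch.layer onward; a real
    system must first bring it up to date with the earlier layers (or
    recycle slots only at points where that is cheap), a detail the
    paper's protocol handles and this sketch deliberately omits.
    """
    i = free_slot(batch)
    if i is None:
        raise RuntimeError("no idle slot; fall back to normal queueing")
    batch.slots[i] = urgent_value
    batch.occupied[i] = True
    return i


# Usage: a 4-slot batch with one padding slot. An urgent query arrives
# after layer 1, is packed into the idle slot, and finishes together with
# the batch without any additional homomorphic operations.
batch = PackedBatch(slots=[1.0, 2.0, 3.0, 0.0],
                    occupied=[True, True, True, False])
run_layer(batch, weight=0.5)
slot = piggyback(batch, urgent_value=7.0)
run_layer(batch, weight=0.5)
print(f"urgent result in slot {slot}: {batch.slots[slot]}")
```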
Problem

Research questions and friction points this paper is trying to address.

Privacy-Preserving Machine Learning
Queue Jumping
Batched Inference
Prioritized Requests
Private Neural Inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Privacy-Preserving Machine Learning
Queue Jumping
Slot Recycling
Homomorphic Encryption
Batched Inference
Qiao Zhang
Shandong University · Privacy in Machine Learning

Minghui Xu
School of Computer Science and Technology, Shandong University, Qingdao 266237, China

Tingchuang Zhang
School of Computer Science and Technology, Shandong University, Qingdao 266237, China

Xiuzhen Cheng
School of Computer Science and Technology, Shandong University · Blockchain, IoT Security, Edge Computing, Distributed Computing