Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To mitigate timing side-channel attacks arising from global KV cache sharing in large language model (LLM) inference, this work proposes a selective cache-sharing mechanism that enhances inference efficiency while preserving input privacy. Methodologically, it integrates multi-level privacy detection—comprising rule-based matching, a general-purpose detector, and context-aware validation—with a unified radix-tree–based indexing structure and entropy-driven dynamic access monitoring, enabling fine-grained separation and real-time protection of sensitive versus non-sensitive cache entries. Experiments demonstrate that the approach mitigates 94%–97% of timing side-channel attacks; compared to full cache isolation, it reduces first-token latency by 40.58% and improves throughput by 2.66×. The core contribution lies in the first co-design of privacy-aware cache management and efficient KV cache sharing—achieving a principled balance between security guarantees and system performance.

📝 Abstract
Global KV-cache sharing has emerged as a key optimization for accelerating large language model (LLM) inference. However, it exposes a new class of timing side-channel attacks, enabling adversaries to infer sensitive user inputs via shared cache entries. Existing defenses, such as per-user isolation, eliminate leakage but degrade performance by up to 38.9% in time-to-first-token (TTFT), making them impractical for high-throughput deployment. To address this gap, we introduce SafeKV (Secure and Flexible KV Cache Sharing), a privacy-aware KV-cache management framework that selectively shares non-sensitive entries while confining sensitive content to private caches. SafeKV comprises three components: (i) a hybrid, multi-tier detection pipeline that integrates rule-based pattern matching, a general-purpose privacy detector, and context-aware validation; (ii) a unified radix-tree index that manages public and private entries across heterogeneous memory tiers (HBM, DRAM, SSD); and (iii) entropy-based access monitoring to detect and mitigate residual information leakage. Our evaluation shows that SafeKV mitigates 94%–97% of timing-based side-channel attacks. Compared to the per-user isolation baseline, SafeKV improves TTFT by up to 40.58% and throughput by up to 2.66× across diverse LLMs and workloads. SafeKV reduces cache-induced TTFT overhead from 50.41% to 11.74% on Qwen3-235B. By combining fine-grained privacy control with high cache reuse efficiency, SafeKV reclaims the performance advantages of global sharing while providing robust runtime privacy guarantees for LLM inference.
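To make the entropy-based access monitoring idea concrete, here is a minimal sketch of how such a monitor might work. This is an illustration of the general technique, not the paper's actual algorithm: the class name `EntropyMonitor`, the threshold values, and the "quarantine" response are all assumptions for the example. The intuition is that a shared entry repeatedly probed by a single foreign user produces a low-entropy access distribution, which is a plausible signature of a timing-probe attack.

```python
import math
from collections import Counter

def access_entropy(hits):
    """Shannon entropy (bits) of the per-user access distribution for one entry."""
    total = sum(hits.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in hits.values())

class EntropyMonitor:
    """Hypothetical runtime monitor: flags shared cache entries whose access
    pattern is dominated by one user (low entropy), suggesting probing."""

    def __init__(self, threshold=0.5, min_hits=8):
        self.hits = {}              # entry_id -> Counter(user -> hit count)
        self.threshold = threshold  # entropy (bits) below which we react
        self.min_hits = min_hits    # don't judge entries with too few accesses

    def record(self, entry_id, user):
        counts = self.hits.setdefault(entry_id, Counter())
        counts[user] += 1
        total = sum(counts.values())
        if total >= self.min_hits and access_entropy(counts) < self.threshold:
            return "quarantine"     # e.g. demote the entry to a private cache
        return "ok"
```

With these assumed parameters, ten hits on one entry by a single user would trip the monitor, while an entry shared evenly between two users would not.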
Problem

Research questions and friction points this paper is trying to address.

Mitigate timing side-channel attacks in LLM inference
Balance KV-cache sharing between privacy and performance
Detect and prevent sensitive data leakage in cache
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective KV-cache sharing for privacy control
Hybrid multi-tier detection pipeline for sensitive content
Unified radix-tree index managing public and private entries
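The unified radix-tree index can be pictured with a small sketch. This is a guess at the general shape based on the abstract, not SafeKV's implementation: the names `SafeKVIndex` and `RadixNode` and the owner-tagging scheme are assumptions. The key idea illustrated is that public and private entries live in one tree, but a private entry is only reusable by the requester that owns it, while public entries are reusable by anyone.

```python
from dataclasses import dataclass, field

@dataclass
class RadixNode:
    children: dict = field(default_factory=dict)  # token -> RadixNode
    kv_block: object = None       # handle to cached KV state, if any
    private_owner: object = None  # None => public/shared entry

class SafeKVIndex:
    """Illustrative unified prefix index over token sequences (one node per
    token here for simplicity; a real radix tree compresses edge runs)."""

    def __init__(self):
        self.root = RadixNode()

    def insert(self, tokens, kv_block, owner=None, sensitive=False):
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, RadixNode())
        node.kv_block = kv_block
        node.private_owner = owner if sensitive else None

    def lookup(self, tokens, requester):
        """Longest reusable cached prefix; private entries match only
        their owner, so other users cannot observe them via cache hits."""
        node, best, matched = self.root, None, 0
        for i, t in enumerate(tokens):
            node = node.children.get(t)
            if node is None:
                break
            if node.kv_block is not None and node.private_owner in (None, requester):
                best, matched = node.kv_block, i + 1
        return best, matched
```

For example, if a public prefix and a longer sensitive extension are both cached, the owner reuses the full private prefix while any other user falls back to the public portion, which is exactly the selective-sharing behavior the paper describes.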