Leakage-abuse Attack Against Substring-SSE with Partially Known Dataset

📅 2025-11-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing substring searchable symmetric encryption (substring-SSE) schemes leak substantial information when adversaries possess partial plaintext knowledge, a threat scenario not yet systematically studied for leakage-abuse attacks. This paper introduces the first leakage-abuse attack framework for substring-SSE under the partial-knowledge assumption. We extend LEAP into an iterative, matrix-based correlation analysis method that integrates ciphertext suffix-tree inversion, pattern-driven ciphertext-substring alignment, and statistical significance validation to reconstruct plaintext substrings from ciphertext tokens with high confidence. Experiments show substring recovery rates of 74.42% with only 10% auxiliary knowledge and 98.32% with 50%, demonstrating strong cross-dataset generalizability. Our work exposes a fundamental vulnerability of substring-SSE under realistic partial-knowledge threat models, providing both a security warning and an evaluation benchmark for future secure designs.

📝 Abstract
Substring-searchable symmetric encryption (substring-SSE) has become increasingly critical for privacy-preserving applications in cloud systems. However, existing schemes remain vulnerable to information leakage during search operations, particularly when adversaries possess partial knowledge of the target dataset. Although leakage-abuse attacks have been widely studied for traditional SSE, their applicability to substring-SSE under partially known data assumptions remains unexplored. In this paper, we present the first leakage-abuse attack on substring-SSE under partially-known dataset conditions. We develop a novel matrix-based correlation technique that extends and optimizes the LEAP framework for substring-SSE, enabling efficient recovery of plaintext data from encrypted suffix tree structures. Unlike existing approaches that rely on independent auxiliary datasets, our method directly exploits known data fragments to establish high-confidence mappings between ciphertext tokens and plaintext substrings through iterative matrix transformations. Comprehensive experiments on real-world datasets demonstrate the effectiveness of the attack, with recovery rates reaching 98.32% for substrings given 50% auxiliary knowledge. Even with only 10% prior knowledge, the attack achieves 74.42% substring recovery while maintaining strong scalability across datasets of varying sizes. The result reveals significant privacy risks in current substring-SSE designs and highlights the urgent need for leakage-resilient constructions.
Problem

Research questions and friction points this paper is trying to address.

Attack substring-SSE with partial dataset knowledge
Develop matrix correlation for encrypted suffix trees
Achieve high substring recovery with limited auxiliary data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Matrix-based correlation technique optimizes LEAP framework
Uses known data fragments for ciphertext-plaintext mapping
Iterative matrix transformations enable efficient plaintext recovery
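
To make the idea of mapping ciphertext tokens to plaintext substrings through known data fragments more concrete, here is a minimal sketch. It is not the paper's algorithm: it omits the suffix-tree handling, the iterative matrix transformations, and the statistical validation, and it assumes (a simplification) that the attacker already knows which encrypted document ids correspond to the known plaintext documents. All names (`match_tokens`, `leaked_access`, `max_len`) are hypothetical. Each row of the substring-by-document occurrence matrix is stored as a set of document ids, and a token is recovered only when exactly one candidate substring has the same row as the token's leaked access pattern.

```python
from collections import defaultdict


def substrings(text, max_len):
    """Return all distinct substrings of `text` with length 1..max_len."""
    return {text[i:i + n]
            for n in range(1, max_len + 1)
            for i in range(len(text) - n + 1)}


def occurrence_rows(known_docs, max_len):
    """Map each candidate substring to the set of known doc ids containing it.

    This is one row of a substring-by-document occurrence matrix,
    stored sparsely as a frozenset of column indices.
    """
    rows = defaultdict(set)
    for doc_id, text in known_docs.items():
        for s in substrings(text, max_len):
            rows[s].add(doc_id)
    return {s: frozenset(ids) for s, ids in rows.items()}


def match_tokens(known_docs, leaked_access, max_len=3):
    """Recover tokens whose access pattern matches exactly one candidate row.

    known_docs:    {doc_id: plaintext} for the fragment the attacker knows
                   (assumed to be aligned with encrypted doc ids).
    leaked_access: {token: set of doc ids returned for that token}.
    """
    by_pattern = defaultdict(list)
    for s, ids in occurrence_rows(known_docs, max_len).items():
        by_pattern[ids].append(s)

    known_ids = set(known_docs)
    recovered = {}
    for token, doc_ids in leaked_access.items():
        # Only the columns for known documents can be compared.
        pattern = frozenset(doc_ids & known_ids)
        candidates = by_pattern.get(pattern, [])
        if pattern and len(candidates) == 1:
            recovered[token] = candidates[0]
    return recovered


if __name__ == "__main__":
    known = {0: "abc", 1: "abd", 2: "xbc"}      # document 3 is unknown
    leaked = {"t1": {0, 1}, "t2": {0, 2}, "t3": {0, 1, 2, 3}, "t4": {3}}
    print(match_tokens(known, leaked, max_len=1))
    # {'t1': 'a', 't2': 'c', 't3': 'b'}
```

With `max_len=2` the same input recovers only `t3`, because `a`/`ab` and `c`/`bc` share identical occurrence rows. Ambiguity of this kind is what additional leakage and iterative refinement would have to resolve in a real attack.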
Xijie Ba
School of Cyber Science and Engineering, Wuhan University, Wuhan, China
Qin Liu
School of Cyber Science and Engineering, Wuhan University, Wuhan, China
Xiaohong Li
School of Computer Science, Wuhan University, Wuhan, China
Jianting Ning
Wuhan University