🤖 AI Summary
Existing substring searchable symmetric encryption (substring-SSE) schemes suffer from severe information leakage when adversaries possess partial plaintext knowledge, a threat scenario that has not yet been systematically studied for leakage-abuse attacks. This paper introduces the first leakage-abuse attack framework for substring-SSE under the partial-knowledge assumption. We extend LEAP into an iterative, matrix-based correlation analysis method that integrates ciphertext suffix-tree inversion, pattern-driven alignment of ciphertext tokens with plaintext substrings, and statistical significance validation to reconstruct plaintext substrings from ciphertext tokens with high confidence. Experiments show substring recovery rates of 74.42% with only 10% auxiliary knowledge and 98.32% with 50%, demonstrating strong cross-dataset generalizability. Our work exposes a fundamental vulnerability of substring-SSE under realistic partial-knowledge threat models, providing both a critical security warning and a rigorous evaluation benchmark for future secure designs.
📝 Abstract
Substring-searchable symmetric encryption (substring-SSE) has become increasingly critical for privacy-preserving applications in cloud systems. However, existing schemes remain vulnerable to information leakage during search operations, particularly when adversaries possess partial knowledge of the target dataset. Although leakage-abuse attacks have been widely studied for traditional SSE, their applicability to substring-SSE under partially known data assumptions remains unexplored. In this paper, we present the first leakage-abuse attack on substring-SSE under partially known dataset conditions. We develop a novel matrix-based correlation technique that extends and optimizes the LEAP framework for substring-SSE, enabling efficient recovery of plaintext data from encrypted suffix-tree structures. Unlike existing approaches that rely on independent auxiliary datasets, our method directly exploits known data fragments to establish high-confidence mappings between ciphertext tokens and plaintext substrings through iterative matrix transformations. Comprehensive experiments on real-world datasets demonstrate the effectiveness of the attack, with substring recovery rates reaching 98.32% given 50% auxiliary knowledge. Even with only 10% prior knowledge, the attack achieves 74.42% substring recovery while maintaining strong scalability across datasets of varying sizes. These results reveal significant privacy risks in current substring-SSE designs and highlight the urgent need for leakage-resilient constructions.
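The core intuition behind matching ciphertext tokens to plaintext substrings through matrices can be illustrated with a minimal sketch. This is not the paper's algorithm: it omits suffix-tree inversion, iteration, and statistical validation. The function `match_tokens`, its input layout, and the assumption that the attacker has already aligned each known plaintext with its encrypted document identifier are all hypothetical. The sketch builds two binary incidence matrices restricted to the known documents (tokens by documents from the observed search leakage, candidate substrings by documents from the known plaintext) and accepts a mapping only when a row pattern is shared by exactly one token and exactly one substring.

```python
from collections import defaultdict


def match_tokens(token_results, known_docs, candidates):
    """Hypothetical sketch: map ciphertext tokens to plaintext substrings by
    comparing rows of two binary incidence matrices over the known documents.

    token_results: token -> set of encrypted document ids it matched (leakage).
    known_docs: encrypted document id -> plaintext (assumes known plaintexts
        are already aligned with their encrypted ids).
    candidates: plaintext substrings the attacker wants to identify.
    """
    known_ids = sorted(known_docs)

    def substring_row(s):
        # Plaintext matrix row: which known documents contain substring s.
        return tuple(s in known_docs[d] for d in known_ids)

    def token_row(t):
        # Ciphertext matrix row: the token's result set projected onto the
        # known documents (matches in unknown documents are ignored).
        return tuple(d in token_results[t] for d in known_ids)

    subs_by_row = defaultdict(list)
    for s in candidates:
        subs_by_row[substring_row(s)].append(s)

    toks_by_row = defaultdict(list)
    for t in token_results:
        toks_by_row[token_row(t)].append(t)

    mapping = {}
    for row, toks in toks_by_row.items():
        subs = subs_by_row.get(row, [])
        # Accept only unambiguous matches; an all-zero row carries no signal.
        if len(toks) == 1 and len(subs) == 1 and any(row):
            mapping[toks[0]] = subs[0]
    return mapping


if __name__ == "__main__":
    known = {1: "banana", 2: "bandana", 3: "cabana"}
    leakage = {"t1": {1, 2, 3, 4}, "t2": {1, 6}, "t3": {2}, "t4": {3, 5}}
    print(match_tokens(leakage, known, ["ana", "nan", "ndan", "cab", "ban"]))
```

In this toy run, `t1` stays unresolved because "ana" and "ban" occur in exactly the same known documents, which illustrates why additional signals such as the iteration and statistical validation described in the abstract are needed to break ties. A real attack would also have to guard against a token whose true substring is missing from the candidate list but happens to share a row with one that is present.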