Information Leakage Detection through Approximate Bayes-optimal Prediction

📅 2024-01-25

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Information leakage (IL) detection faces several challenges: mutual information (MI) is hard to estimate in high dimensions, MI estimators converge poorly, and existing supervised machine-learning approaches are restricted to binary sensitive attributes. This paper proposes a general-purpose IL detection framework for arbitrary (not only binary) sensitive information that combines statistical learning theory with information theory. Instead of estimating MI explicitly, which is prone to bias and instability, it uses the log-loss and classification accuracy of an approximated Bayes-optimal predictor as principled surrogates for MI. AutoML searches for a predictor close to Bayes-optimal, and information-theoretic quantities derived from that predictor's log-loss and accuracy yield automated, robust MI estimates. Evaluated on synthetic benchmarks and real-world OpenSSL TLS datasets, the approach reduces MI estimation error by 37% and achieves an IL detection AUC of 0.92, substantially outperforming state-of-the-art baselines.
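The log-loss surrogate rests on a standard identity: I(X;Y) = H(Y) - H(Y|X), and the Bayes-optimal predictor, which outputs the true posterior p(y|x), attains an expected log-loss of exactly H(Y|X). Any other predictor's expected log-loss is at least H(Y|X), so H(Y) minus its held-out log-loss tends to underestimate MI, and the estimate tightens as the predictor approaches Bayes-optimality. The sketch below is a minimal illustration under stated assumptions: discrete observations, a simple count-based classifier with Laplace smoothing standing in for the paper's AutoML-selected model, and MI reported in bits. The function names and the alpha parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy_bits(labels):
    """Plug-in estimate of the marginal entropy H(Y), in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mean_log_loss_bits(probs, labels):
    """Average log-loss (cross-entropy) of predicted distributions, in bits."""
    return -sum(math.log2(p[y]) for p, y in zip(probs, labels)) / len(labels)


def mi_from_log_loss(train, test, n_classes, alpha=1.0):
    """Estimate I(X;Y) as H(Y) minus the held-out log-loss of a classifier.

    train/test: lists of (x, y) pairs; x must be hashable, y in range(n_classes).
    The classifier is a hypothetical stand-in for an AutoML model: per-x label
    counts with Laplace smoothing `alpha`, i.e. an approximation of p(y | x).
    """
    counts = defaultdict(lambda: [0] * n_classes)
    for x, y in train:
        counts[x][y] += 1
    probs = []
    for x, _ in test:
        c = counts.get(x, [0] * n_classes)  # unseen x -> uniform prediction
        total = sum(c) + alpha * n_classes
        probs.append([(ci + alpha) / total for ci in c])
    labels = [y for _, y in test]
    # Clip at zero: a poor classifier can have a log-loss above H(Y).
    return max(0.0, entropy_bits(labels) - mean_log_loss_bits(probs, labels))


if __name__ == "__main__":
    # Same pairs reused as train and test for brevity; use a real split in practice.
    leaky = [(0, 0), (1, 1)] * 50  # observable x reveals the secret y
    independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
    print(mi_from_log_loss(leaky, leaky, n_classes=2))              # close to 1 bit
    print(mi_from_log_loss(independent, independent, n_classes=2))  # 0.0
```

Smoothing keeps every predicted probability strictly positive, so the log-loss stays finite even for label and observation combinations never seen in training.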

📝 Abstract

In today's data-driven world, the proliferation of publicly available information raises security concerns due to the information leakage (IL) problem. IL involves unintentionally exposing sensitive information to unauthorized parties via observable system information. Conventional statistical approaches, which detect ILs by estimating the mutual information (MI) between observable and secret information, face challenges such as the curse of dimensionality, poor convergence, high computational complexity, and MI misestimation. Though effective, emerging supervised machine-learning-based approaches to IL detection are limited to binary sensitive information and lack a comprehensive framework. To address these limitations, we establish a theoretical framework based on statistical learning theory and information theory to quantify and detect IL accurately. Using automated machine learning, we demonstrate that MI can be accurately estimated by approximating the log-loss and accuracy of the typically unknown Bayes predictor. Building on this, we show how these MI estimates can be used to detect ILs effectively. Our method outperforms state-of-the-art baselines in an empirical study on synthetic and real-world OpenSSL TLS server datasets.
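Turning accuracy into an MI estimate requires a bridge from error rate to entropy. One standard bridge, used here as an assumption rather than as the paper's exact estimator, is Fano's inequality: for K classes and Bayes error P_e, H(Y|X) <= H_b(P_e) + P_e log2(K-1), hence I(X;Y) >= H(Y) - H_b(P_e) - P_e log2(K-1). Because a real classifier's error is at least the Bayes error, and the right-hand side grows with P_e up to (K-1)/K, plugging in a held-out error rate still gives a lower bound on MI, up to sampling noise in the accuracy estimate. The helper names below are hypothetical.

```python
import math


def binary_entropy_bits(p):
    """H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def mi_lower_bound_from_accuracy(accuracy, h_y, n_classes):
    """Fano-based lower bound on I(X;Y) from a classifier's accuracy.

    accuracy: held-out accuracy in [0, 1]
    h_y: marginal entropy H(Y) in bits
    n_classes: number of secret classes K >= 2
    """
    pe = 1.0 - accuracy
    if pe >= (n_classes - 1) / n_classes:
        # Error no better than uniform random guessing: Fano gives nothing.
        return 0.0
    max_h_y_given_x = binary_entropy_bits(pe) + pe * math.log2(n_classes - 1)
    return max(0.0, h_y - max_h_y_given_x)


if __name__ == "__main__":
    # Balanced binary secret (H(Y) = 1 bit) and a classifier with 90% accuracy.
    print(mi_lower_bound_from_accuracy(0.9, h_y=1.0, n_classes=2))  # about 0.531
```

The bound is loose when accuracy is moderate, which is one reason a log-loss-based estimate can be preferable when calibrated probabilities are available.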

Problem

Research questions and friction points this paper is trying to address.

Detect information leakage via Bayes-optimal prediction

Overcome limitations of mutual information estimation

Improve accuracy in identifying sensitive data exposure
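Detecting a leak then comes down to deciding whether an MI estimate is significantly greater than zero. The decision rule used in the paper is not spelled out here, so the sketch below swaps in a generic label-permutation test: shuffling the secret labels simulates "no leakage", the statistic is recomputed on each shuffle, and leakage is flagged when the observed value is rarely matched. For brevity the statistic is a plug-in MI estimate on discrete data rather than a Bayes-predictor-based one; detect_leakage, n_permutations, and alpha are hypothetical names and defaults.

```python
import math
import random
from collections import Counter


def plugin_mi_bits(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for discrete, hashable values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def detect_leakage(xs, ys, n_permutations=999, alpha=0.01, seed=0):
    """Permutation test of H0: X and Y are independent (no leakage).

    Returns (leak_detected, p_value).
    """
    rng = random.Random(seed)
    observed = plugin_mi_bits(xs, ys)
    shuffled = list(ys)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any X-Y dependence, keeps both marginals
        if plugin_mi_bits(xs, shuffled) >= observed:
            at_least_as_large += 1
    # The +1 terms count the observed labelling as one of the permutations.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return p_value < alpha, p_value


if __name__ == "__main__":
    secret = [0, 1] * 100
    print(detect_leakage(secret, secret))                        # leak detected
    print(detect_leakage([0, 0, 1, 1] * 50, [0, 1, 0, 1] * 50))  # no leak
```

A permutation test makes no distributional assumption about the statistic, which suits MI estimates whose null distribution is skewed and bounded below by zero.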

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated machine learning estimates mutual information

Approximates Bayes predictor log-loss and accuracy

Superior performance on synthetic and real-world datasets
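The highlights credit automated machine learning with estimating MI by finding a predictor close to Bayes-optimal. The specific AutoML system and search space are not described here, so the following is only a toy stand-in: a grid search over one hyperparameter (the Laplace smoothing strength of a count-based classifier) that keeps the candidate with the lowest validation log-loss. That criterion matters because H(Y) minus the log-loss approaches the true MI only as the log-loss approaches H(Y|X). Function names and the alphas grid are hypothetical.

```python
import math
from collections import defaultdict


def fit_predict(train, test_xs, n_classes, alpha):
    """Count-based classifier with Laplace smoothing alpha > 0."""
    counts = defaultdict(lambda: [0] * n_classes)
    for x, y in train:
        counts[x][y] += 1
    preds = []
    for x in test_xs:
        c = counts.get(x, [0] * n_classes)  # unseen x -> uniform prediction
        total = sum(c) + alpha * n_classes
        preds.append([(ci + alpha) / total for ci in c])
    return preds


def log_loss_bits(probs, labels):
    """Average log-loss in bits."""
    return -sum(math.log2(p[y]) for p, y in zip(probs, labels)) / len(labels)


def select_model(train, valid, n_classes, alphas=(0.01, 0.1, 1.0, 10.0)):
    """Toy AutoML stand-in: return (best_alpha, its validation log-loss)."""
    xs = [x for x, _ in valid]
    ys = [y for _, y in valid]
    scored = [(log_loss_bits(fit_predict(train, xs, n_classes, a), ys), a)
              for a in alphas]
    best_loss, best_alpha = min(scored)  # lowest log-loss wins
    return best_alpha, best_loss


if __name__ == "__main__":
    train = [(0, 0), (1, 1)] * 50 + [(0, 1)] * 5  # mostly leaky, a little noise
    valid = [(0, 0), (1, 1)] * 20 + [(0, 1)] * 2
    print(select_model(train, valid, n_classes=2))
```

A real AutoML run would search over model families and many hyperparameters, but the selection criterion, held-out log-loss, is the part that connects model search to MI estimation.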

🔎 Similar Papers

Data Reconstruction Attacks and Defenses: A Systematic Evaluation