AI Summary
System and network event logs contain abundant personally identifiable information (PII), posing significant privacy risks during sharing for collaborative analysis; conventional anonymization techniques often degrade structural or temporal fidelity, undermining analytical utility. This paper proposes a privacy-preserving log anonymization framework that jointly optimizes privacy and utility: IP addresses are anonymized via segmented salted hashing to preserve subnet and host-level hierarchical structure; timestamps undergo adaptive noise injection to obscure absolute time while maintaining relative event ordering. The framework integrates entropy-based privacy quantification, collision-rate analysis, and residual leakage detection to enable rigorous privacy–utility trade-off assessment. Experiments demonstrate that our approach reduces re-identification risk substantially (collision rate < 0.1%), fully preserves contextual consistency and analytical validity, and supports practical deployment. An open-source implementation is provided to ensure reproducibility and real-world adoption.
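The two field-level ideas summarized above can be illustrated with a minimal Python sketch. This is not the paper's released tool. The function names (`anonymize_ipv4`, `anonymize_timestamps`), the use of HMAC-SHA256 as the salted hash, the 8-hex-character token length, and the gap-scaling noise model are all assumptions made for illustration. Each octet is hashed separately, so addresses that share a prefix also share the corresponding anonymized tokens. Timestamps receive one global shift plus noise proportional to each inter-event gap, which keeps the event order intact.

```python
import hashlib
import hmac
import random


def anonymize_ipv4(ip: str, salt: bytes) -> str:
    """Replace each octet with a salted-hash token (illustrative scheme)."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip!r}")
    tokens = []
    for position, octet in enumerate(octets):
        value = int(octet)
        if not 0 <= value <= 255:
            raise ValueError(f"octet out of range in {ip!r}")
        # Assumption: the octet position is mixed in, so the same value in
        # different positions yields different tokens.
        message = f"{position}:{value}".encode()
        digest = hmac.new(salt, message, hashlib.sha256).hexdigest()
        tokens.append(digest[:8])  # truncated token; length is an assumption
    return ".".join(tokens)


def anonymize_timestamps(timestamps, rng, max_offset=86_400.0, jitter=0.3):
    """Shift and jitter timestamps (in seconds) without changing their order.

    Every gap between consecutive events is scaled by a random factor in
    [1 - jitter, 1 + jitter], so the noise adapts to local event spacing and
    gaps never become negative.
    """
    if not 0 <= jitter < 1:
        raise ValueError("jitter must be in [0, 1)")
    if not timestamps:
        return []
    order = sorted(range(len(timestamps)), key=timestamps.__getitem__)
    result = [0.0] * len(timestamps)
    previous_original = timestamps[order[0]]
    previous_new = previous_original + rng.uniform(-max_offset, max_offset)
    result[order[0]] = previous_new
    for index in order[1:]:
        gap = timestamps[index] - previous_original
        previous_new += gap * rng.uniform(1 - jitter, 1 + jitter)
        previous_original = timestamps[index]
        result[index] = previous_new
    return result


if __name__ == "__main__":
    salt = b"example-salt"  # hypothetical secret; keep private in practice
    print(anonymize_ipv4("192.168.1.10", salt))
    print(anonymize_ipv4("192.168.1.77", salt))  # first three tokens match
    print(anonymize_timestamps([1700000000.0, 1700000060.0], random.Random(1)))
```

In this sketch, protection rests entirely on the secrecy of the salt, because a single octet has only 256 possible values and could otherwise be enumerated.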
Abstract
System and network event logs are essential for security analytics, threat detection, and operational monitoring. However, these logs often contain Personally Identifiable Information (PII), raising significant privacy concerns when shared or analyzed. A key challenge in log anonymization is balancing privacy protection with the retention of sufficient structure for meaningful analysis. Overly aggressive anonymization can destroy contextual integrity, while weak techniques risk re-identification through linkage or inference attacks. This paper introduces novel field-specific anonymization methods that address this trade-off. For IP addresses, we propose a salt-based hashing technique applied at the per-octet level, preserving both subnet and host structure to enable correlation across various log entries while ensuring non-reversibility. For port numbers, full-value hashing with range mapping maintains interpretability. We also present an order-preserving timestamp anonymization scheme using adaptive noise injection, which obfuscates exact times without disrupting event sequences. An open-source tool implementing these techniques has been released to support practical deployment and reproducible research. Evaluations using entropy metrics, collision rates, and residual leakage analysis demonstrate that the proposed approach effectively protects privacy while preserving analytical utility.
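As a rough illustration of the port handling and two of the evaluation metrics named in the abstract, the following Python sketch may help. It is not taken from the paper or its tool. The function names, the choice of the three IANA port ranges as the "range mapping", the use of HMAC-SHA256, and the specific definition of collision rate (the share of distinct inputs lost to shared outputs) are assumptions made for illustration.

```python
import hashlib
import hmac
import math
from collections import Counter

# Assumption: "range mapping" keeps a port inside its IANA class
# (well-known, registered, dynamic/private).
PORT_RANGES = ((0, 1023), (1024, 49151), (49152, 65535))


def anonymize_port(port: int, salt: bytes) -> int:
    """Hash the full port value, then map it back into its original range."""
    for low, high in PORT_RANGES:
        if low <= port <= high:
            digest = hmac.new(salt, str(port).encode(), hashlib.sha256).digest()
            return low + int.from_bytes(digest[:8], "big") % (high - low + 1)
    raise ValueError(f"port out of range: {port}")


def shannon_entropy(values) -> float:
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def collision_rate(originals, anonymize) -> float:
    """Fraction of distinct inputs lost because outputs coincide.

    One possible definition: 1 - (distinct outputs / distinct inputs).
    """
    distinct = set(originals)
    if not distinct:
        return 0.0
    images = {anonymize(value) for value in distinct}
    return 1.0 - len(images) / len(distinct)


if __name__ == "__main__":
    salt = b"example-salt"  # hypothetical secret
    ports = [22, 80, 443, 8080, 51000]
    mapped = [anonymize_port(p, salt) for p in ports]
    print(mapped)
    print(shannon_entropy(ports), shannon_entropy(mapped))
    print(collision_rate(ports, lambda p: anonymize_port(p, salt)))
```

Because the mapping in this sketch is a hash reduced into a fixed range, it is not guaranteed to be collision-free. The well-known range in particular has only 1024 slots, which is one reason a collision-rate measurement of this kind is useful.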