🤖 AI Summary
To address the privacy and compliance barriers that arise when ransomware training data is siloed across organizations, this paper proposes a federated learning (FL)-based framework for distributed, collaborative detection. Using the Sherpa.ai FL platform, multiple institutions co-train a model without transferring raw data across organizational boundaries, and the detector applies AI-based recognition of anomalous access patterns to identify ransomware behavior. The key contribution is showing that strict data localization need not come at the cost of detection performance: in the experiments, FL improves detection accuracy by a relative 9% over isolated local models and approaches the performance of centralized training. The framework also supports continuous model updates and scalable cross-organizational deployment. For interconnected environments such as cloud storage and enterprise file-sharing systems, it offers a defense approach that combines privacy preservation, regulatory compliance (e.g., GDPR, CCPA), and effective detection.
📝 Abstract
Detecting malware, especially ransomware, is essential to securing today's interconnected ecosystems, including cloud storage, enterprise file-sharing, and database services. Training high-performing artificial intelligence (AI) detectors requires diverse datasets, which are often distributed across multiple organizations, so conventional training would require centralizing them. However, centralized learning is often impractical due to security concerns, privacy regulations, data ownership issues, and legal barriers to cross-organizational sharing. Compounding this challenge, ransomware evolves rapidly, demanding models that are both robust and adaptable.
In this paper, we evaluate Federated Learning (FL) using the Sherpa.ai FL platform, which enables multiple organizations to collaboratively train a ransomware detection model while keeping raw data local and secure. This paradigm is particularly relevant for cybersecurity companies (including both software and hardware vendors) that deploy ransomware detection or firewall systems across millions of endpoints. In such environments, data cannot be transferred outside the customer's device due to strict security, privacy, or regulatory constraints. Although FL applies broadly to malware threats, we validate the approach using the Ransomware Storage Access Patterns (RanSAP) dataset.
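The collaborative training described above can be illustrated with a minimal federated averaging (FedAvg) sketch. This is not the Sherpa.ai FL platform's API and not the paper's model or data: it assumes a plain logistic-regression detector, synthetic one-feature data standing in for each organization's private access-pattern features, and hypothetical helper names (`local_update`, `fed_avg`, `make_client_data`). It only shows the core idea that clients share model weights, never raw samples.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def local_update(weights, data, lr=0.1, epochs=5):
    """Train on one client's private data, starting from the global weights."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y  # gradient of the log loss w.r.t. the logit
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
    return w


def fed_avg(client_weights, client_sizes):
    """Aggregate client weights, weighted by each client's sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def make_client_data(rng, n, shift):
    """Synthetic stand-in for one organization's local dataset (assumption)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)  # 1 = ransomware-like, 0 = benign
        feature = rng.gauss(2.0 * y - 1.0 + shift, 1.0)
        data.append(([1.0, feature], y))  # leading 1.0 is the bias input
    return data


def accuracy(weights, data):
    hits = sum((predict(weights, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


rng = random.Random(0)
# Three organizations with different dataset sizes and slightly shifted data.
clients = [make_client_data(rng, n, s) for n, s in [(200, -0.3), (300, 0.0), (100, 0.3)]]

global_w = [0.0, 0.0]
for _ in range(10):  # communication rounds
    # Each client trains locally; only the resulting weights leave the client.
    updates = [local_update(global_w, data) for data in clients]
    global_w = fed_avg(updates, [len(d) for d in clients])

pooled = [sample for data in clients for sample in data]
print("global weights:", global_w)
print("accuracy on pooled data:", accuracy(global_w, pooled))
```

Weighting the average by sample count is the standard FedAvg choice; a production deployment would typically add secure aggregation and other protections that this sketch omits.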
Our experiments demonstrate that FL improves ransomware detection accuracy by a relative 9% over server-local models and achieves performance comparable to centralized training. These results indicate that FL offers a scalable, high-performing, and privacy-preserving framework for proactive ransomware detection across organizational and regulatory boundaries.
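Because the reported gain is relative rather than absolute, a short calculation may help when reading it. The accuracy values below are hypothetical, chosen only to make the arithmetic clean; they are not figures from the paper, and `relative_improvement` is an illustrative helper name.

```python
def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction of the baseline."""
    return (improved - baseline) / baseline


# Hypothetical accuracies for illustration only (not the paper's results).
local_acc = 0.80
fl_acc = 0.872

rel_gain = relative_improvement(local_acc, fl_acc)  # fraction of the baseline
abs_gain = fl_acc - local_acc                       # percentage-point difference

print(f"relative gain: {rel_gain:.1%}, absolute gain: {abs_gain * 100:.1f} points")
```

With these assumed numbers, a relative 9% gain corresponds to 7.2 percentage points; the absolute gap behind any relative figure depends on the baseline accuracy.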