AI Summary
This work proposes a unified privacy-preserving framework based on the CKKS homomorphic encryption scheme, addressing the privacy leakage that arises when machine learning processes sensitive data in plaintext. Within a single system, it supports encrypted training of k-nearest neighbors (KNN) and linear regression models, as well as encrypted inference for a multilayer perceptron (MLP). By approximating non-polynomial operations and managing ciphertext noise to improve computational efficiency, the framework keeps data encrypted end to end while achieving model accuracy comparable to that of plaintext training. The results indicate that privacy-preserving machine learning is practically feasible, while computational overhead and limited functional expressiveness remain key challenges.
Abstract
The use of Machine Learning (ML) for data-driven decision-making often relies on access to sensitive datasets, which introduces privacy challenges. Traditional encryption methods protect data at rest or in transit but fail to secure it during processing, when it must be decrypted and is therefore exposed to unauthorized access. Homomorphic encryption offers a solution: it enables computations on encrypted data without decryption, preserving confidentiality throughout the ML pipeline. This paper addresses the challenge of training ML models on encrypted data while maintaining accuracy and efficiency by proposing a proof-of-concept privacy-preserving framework that leverages the Cheon-Kim-Kim-Song (CKKS) scheme for approximate real-number arithmetic. It demonstrates the feasibility of training K-Nearest Neighbors (KNN) and linear regression models on encrypted data, and evaluates encrypted inference for a basic Multilayer Perceptron (MLP) architecture. Experimental results show that models trained under homomorphic encryption achieve performance metrics comparable to those of plaintext-trained models, validating the approach. However, challenges such as computational overhead, noise management, and limited support for non-polynomial operations persist. This work lays the groundwork for broader adoption of privacy-preserving ML in real-world applications, balancing security with computational feasibility.
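To make the limitation on non-polynomial operations concrete: CKKS evaluates additions and multiplications on ciphertexts, so a function such as the sigmoid is typically replaced by a low-degree polynomial. The sketch below runs entirely in plaintext and only illustrates that idea. The choice of sigmoid, the interval [-5, 5], the cubic degree, and the helper names `fit_odd_cubic` and `poly_sigmoid` are assumptions made for illustration; the abstract does not state which approximation the authors use.

```python
import math


def sigmoid(x):
    """Reference sigmoid, which cannot be evaluated directly under CKKS."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_odd_cubic(xs):
    """Least-squares fit of sigmoid(x) ~ 0.5 + a*x + b*x**3 on the samples xs.

    sigmoid(x) - 0.5 is an odd function, so only odd powers are fitted.
    The two coefficients come from the 2x2 normal equations:
        a*S2 + b*S4 = T1
        a*S4 + b*S6 = T3
    """
    s2 = sum(x ** 2 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    s6 = sum(x ** 6 for x in xs)
    t1 = sum(x * (sigmoid(x) - 0.5) for x in xs)
    t3 = sum(x ** 3 * (sigmoid(x) - 0.5) for x in xs)
    det = s2 * s6 - s4 * s4
    a = (t1 * s6 - t3 * s4) / det
    b = (s2 * t3 - s4 * t1) / det
    return a, b


def poly_sigmoid(x, a, b):
    """Evaluate the approximation using only additions and multiplications,
    the operations a CKKS ciphertext supports."""
    x2 = x * x
    return 0.5 + x * (a + b * x2)


# Assumed input range: 201 evenly spaced points on [-5, 5].
xs = [-5.0 + 10.0 * i / 200 for i in range(201)]
a, b = fit_odd_cubic(xs)
max_err = max(abs(poly_sigmoid(x, a, b) - sigmoid(x)) for x in xs)
print(f"a = {a:.4f}, b = {b:.6f}, max abs error on [-5, 5] = {max_err:.4f}")
```

In an encrypted setting, `x` would be a ciphertext and `a`, `b`, and `0.5` plaintext constants. A higher degree or a narrower input range reduces the approximation error at the cost of more multiplications, which is one source of the computational overhead and noise-management trade-offs the abstract mentions.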