Training Machine Learning Models on Encrypted Data: A Privacy-Preserving Framework using Homomorphic Encryption

📅 2026-04-25

🤖 AI Summary
This work proposes a unified privacy-preserving framework based on the CKKS homomorphic encryption scheme to mitigate the privacy risk of processing sensitive data in plaintext during machine learning. Within a single system, it supports encrypted training of both k-nearest neighbors (KNN) and linear regression models, as well as encrypted inference for multilayer perceptrons (MLPs). By using approximation techniques to handle non-polynomial operations and managing ciphertext noise to improve computational efficiency, the framework keeps data encrypted end to end while achieving model accuracy comparable to plaintext training. The results demonstrate the practical feasibility of privacy-preserving machine learning and highlight the challenges that remain in computational overhead and functional expressiveness.
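The summary mentions approximation techniques for non-polynomial operations. CKKS natively evaluates only additions and multiplications, so a common workaround (assumed here; the summary does not state the paper's exact method) is to replace a function such as the sigmoid with a low-degree polynomial fitted over an expected input range. The plain-Python sketch below fits a degree-3 least-squares polynomial to the sigmoid on [-8, 8] and evaluates it with Horner's rule. No encryption is performed, and the interval, degree, and helper names (`sigmoid`, `fit_poly_least_squares`, `eval_poly`) are illustrative assumptions.

```python
import math


def sigmoid(x: float) -> float:
    """Reference (non-polynomial) logistic function, evaluated in plaintext."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_poly_least_squares(xs, ys, degree):
    """Return coefficients c[0..degree] minimizing sum_i (sum_k c[k] * x_i**k - y_i)**2.

    Solves the normal equations A c = b by Gaussian elimination with partial pivoting.
    """
    n = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):
        # Move the row with the largest entry in this column into the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        residual = b[i] - sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = residual / A[i][i]
    return coeffs


def eval_poly(coeffs, x):
    """Horner's rule: uses only additions and multiplications, the operations CKKS provides."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


# Usage: fit over an assumed activation range of [-8, 8] (an illustrative choice).
xs = [i / 10 for i in range(-80, 81)]
ys = [sigmoid(x) for x in xs]
coeffs = fit_poly_least_squares(xs, ys, degree=3)
max_err = max(abs(eval_poly(coeffs, x) - y) for x, y in zip(xs, ys))
print("coefficients (c0..c3):", coeffs)
print("max abs error on [-8, 8]:", max_err)
```

A degree-3 fit is coarse, especially near the ends of the interval, and the polynomial diverges quickly outside it, which is one reason inputs are often normalized before encryption. Under CKKS, each Horner step typically consumes one multiplicative level, so raising the degree buys accuracy at the cost of depth.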

📝 Abstract
The use of Machine Learning (ML) for data-driven decision-making often relies on access to sensitive datasets, which introduces privacy challenges. Traditional encryption methods protect data at rest or in transit but cannot protect it during processing, when it must be decrypted and is exposed to unauthorized access. Homomorphic encryption addresses this gap by enabling computations on encrypted data without decryption, thus preserving confidentiality throughout the ML pipeline. This paper addresses the challenge of training ML models on encrypted data while maintaining accuracy and efficiency by proposing a proof-of-concept privacy-preserving framework that uses the Cheon-Kim-Kim-Song (CKKS) scheme for approximate real-number arithmetic. It also demonstrates the feasibility of training K-Nearest Neighbors (KNN) and linear regression models on encrypted data and evaluates encrypted inference for a basic Multilayer Perceptron (MLP) architecture. Experimental results show that models trained under homomorphic encryption achieve performance metrics comparable to those of plaintext-trained models, validating the approach. However, challenges persist, including computational overhead, noise management, and limited support for non-polynomial operations. This work lays the groundwork for broader adoption of privacy-preserving ML in real-world applications, balancing security with computational feasibility.
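To make the abstract's linear-regression claim more concrete: under CKKS, training has to be expressed with additions and multiplications only, with no data-dependent branching and no division by ciphertexts. One common way to do this, assumed here rather than taken from the paper, is batch gradient descent with a fixed iteration count and the factor lr / n folded into a public plaintext constant. The sketch below runs that loop on ordinary Python floats to show the structure only; `train_linear_regression_he_friendly`, its parameters, and the example data are hypothetical, and no encryption is performed.

```python
def train_linear_regression_he_friendly(X, y, lr, iterations):
    """Batch gradient descent on mean squared error, written with additions and
    multiplications only. Runs on plain floats here; no encryption is performed.

    - step = lr / n is a public plaintext constant, so no ciphertext division is needed.
    - The iteration count is fixed up front: there is no data-dependent stopping test,
      which an HE evaluator could not branch on.
    """
    n = len(X)
    d = len(X[0])
    step = lr / n
    w = [0.0] * d
    b = 0.0
    for _ in range(iterations):
        # Residuals r_i = (w . x_i + b) - y_i: products of weights and features.
        r = [sum(wj * xij for wj, xij in zip(w, xi)) + b - yi for xi, yi in zip(X, y)]
        # Gradient of (1 / 2n) * sum r_i^2 is (1/n) * sum_i r_i * x_i; 1/n is folded into step.
        grad_w = [sum(ri * xi[j] for ri, xi in zip(r, X)) for j in range(d)]
        grad_b = sum(r)
        # Update by multiplying with the public constant step.
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b = b - step * grad_b
    return w, b


# Usage on noise-free synthetic data y = 2x + 1 (hypothetical example, not from the paper).
X = [[i / 10] for i in range(10)]
y = [2 * xi[0] + 1 for xi in X]
w, b = train_linear_regression_he_friendly(X, y, lr=0.5, iterations=500)
print("weights:", w, "bias:", b)
```

Under CKKS, each iteration typically consumes several multiplicative levels (the inner product w · x_i, the product r_i · x_i, and the scaling by `step`), so without bootstrapping the number of iterations is bounded by the chosen encryption parameters; the 500 iterations used here are only practical in this plaintext simulation.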
Problem

Research questions and friction points this paper is trying to address.

privacy-preserving machine learning
homomorphic encryption
encrypted data training
data confidentiality
secure computation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Homomorphic Encryption
CKKS
Privacy-Preserving Machine Learning
Encrypted Model Training
Approximate Arithmetic