🤖 AI Summary
Deploying machine learning models on edge devices poses a critical trade-off: unprotected models face high model-stealing risks, while existing Trusted Execution Environment (TEE) architectures impose performance overhead and offer only weak protection guarantees. Method: This paper presents the first systematic evaluation of Arm's Confidential Computing Architecture (CCA) for privacy-performance trade-offs in edge ML inference. We propose a lightweight trusted inference framework that leverages CCA, integrating encrypted model loading, secure execution, and memory isolation to mitigate membership inference attacks. Contribution/Results: Our framework reduces attack success rates by 8.3% while introducing at most 22% inference overhead across diverse tasks, including image classification, speech recognition, and conversational assistants, without compromising model confidentiality. All code and evaluation methodology are publicly released, offering a practical path toward confidential AI deployment on resource-constrained edge platforms.
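The two headline metrics above are simple relative measurements. A minimal sketch of how they can be computed from raw numbers, using purely illustrative values (the latencies and attack rates below are hypothetical, not the paper's data):

```python
def inference_overhead(baseline_ms: float, cca_ms: float) -> float:
    """Relative slowdown (%) of running inference inside the protected
    environment versus an unprotected baseline."""
    return (cca_ms - baseline_ms) / baseline_ms * 100.0

def asr_reduction(asr_unprotected: float, asr_protected: float) -> float:
    """Drop (in percentage points) of a membership-inference adversary's
    success rate once the model runs inside the framework."""
    return (asr_unprotected - asr_protected) * 100.0

# Illustrative measurements only (not taken from the paper):
print(f"{inference_overhead(100.0, 122.0):.1f}% overhead")
print(f"{asr_reduction(0.583, 0.500):.1f} point ASR reduction")
```

With these hypothetical inputs, the sketch reproduces the summary's figures of 22% overhead and an 8.3% reduction in attack success rate.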
📝 Abstract
Deploying machine learning (ML) models on user devices can improve privacy (by keeping data local) and reduce inference latency. Trusted Execution Environments (TEEs) are a practical solution for protecting proprietary models, yet existing TEE solutions have architectural constraints that hinder on-device model deployment. Arm Confidential Computing Architecture (CCA), a new Arm extension, addresses several of these limitations and shows promise as a secure platform for on-device ML. In this paper, we evaluate the performance-privacy trade-offs of deploying models within CCA, highlighting its potential to enable confidential and efficient ML applications. Our evaluations show that CCA incurs an overhead of at most 22% when running models of different sizes and applications, including image classification, voice recognition, and chat assistants. This performance overhead comes with privacy benefits: for example, our framework successfully protects the model against membership inference attacks, reducing the adversary's success rate by 8.3%. To support further research and early adoption, we make our code and methodology publicly available.
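To make the threat model concrete, a membership inference adversary tries to decide whether a given sample was in the model's training set, typically exploiting the fact that models are more confident on training data. The paper's attack setup is not specified here; the sketch below shows only the classic confidence-threshold baseline on synthetic, illustrative data (all distributions and the `0.75` threshold are assumptions for demonstration):

```python
import numpy as np

def threshold_mia(confidences: np.ndarray, threshold: float) -> np.ndarray:
    """Baseline attack: predict 'member' when the model's confidence on
    its predicted class exceeds a fixed threshold."""
    return confidences > threshold

def attack_success_rate(pred_member: np.ndarray, is_member: np.ndarray) -> float:
    """Fraction of membership guesses that are correct."""
    return float(np.mean(pred_member == is_member))

# Synthetic confidences (illustrative only): members tend to receive
# higher confidence than non-members, which is what the attack exploits.
rng = np.random.default_rng(0)
member_conf = rng.beta(8, 2, 1000)      # model seen these at training time
nonmember_conf = rng.beta(5, 3, 1000)   # unseen samples, lower confidence
conf = np.concatenate([member_conf, nonmember_conf])
labels = np.concatenate([np.ones(1000, dtype=bool), np.zeros(1000, dtype=bool)])

asr = attack_success_rate(threshold_mia(conf, 0.75), labels)
```

A defense is considered effective when it pushes this success rate back toward the 50% random-guessing baseline; the paper reports an 8.3% reduction in the adversary's success rate.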