Amulet: Fast TEE-Shielded Inference for On-Device Model Protection

📅 2025-12-08
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Machine learning models deployed on end devices are vulnerable to model extraction attacks, while trusted execution environments (TEEs) incur high inference latency due to frequent memory switching. Method: This paper proposes Amulet, a novel framework that jointly designs information-theoretically secure neural network obfuscation with TEEs. It enables model weights to reside safely in untrusted memory, requiring only two low-overhead TEE interactions per inference. Amulet integrates GPU acceleration and architecture-agnostic optimizations to support diverse models, from ResNet to GPT-2. Results: Experiments show Amulet incurs only 2.8–4.8× higher latency than unprotected baselines, achieves 8–9× speedup over pure-TEE approaches, and outperforms the state-of-the-art obfuscation method by 2.2×, all with negligible accuracy loss. Thus, Amulet simultaneously delivers strong security, high efficiency, and practical deployability.

πŸ“ Abstract
On-device machine learning (ML) introduces new security concerns about model privacy. Storing valuable trained ML models on user devices exposes them to potential extraction by adversaries. The current mainstream solution for on-device model protection is storing the weights and conducting inference within Trusted Execution Environments (TEEs). However, due to limited trusted memory that cannot accommodate the whole model, most existing approaches employ a partitioning strategy, dividing a model into multiple slices that are loaded into the TEE sequentially. This frequent interaction between untrusted and trusted worlds dramatically increases inference latency, sometimes by orders of magnitude. In this paper, we propose Amulet, a fast TEE-shielded on-device inference framework for ML model protection. Amulet incorporates a suite of obfuscation methods specifically designed for common neural network architectures. After obfuscation by the TEE, the entire transformed model can be securely stored in untrusted memory, allowing the inference process to execute directly in untrusted memory with GPU acceleration. For each inference request, only two rounds of minimal-overhead interaction between untrusted and trusted memory are required to process input samples and output results. We also provide theoretical proof from an information-theoretic perspective that the obfuscated model does not leak information about the original weights. We comprehensively evaluated Amulet using diverse model architectures ranging from ResNet-18 to GPT-2. Our approach incurs inference latency only 2.8-4.8x that of unprotected models with negligible accuracy loss, achieving an 8-9x speedup over baseline methods that execute inference entirely within TEEs, and performing approximately 2.2x faster than the state-of-the-art obfuscation-based method.
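To make the workflow described in the abstract concrete, the sketch below shows a toy permutation-based obfuscation of one linear layer. This is an illustrative stand-in only, not Amulet's actual construction (the paper's scheme carries an information-theoretic security proof, which simple permutations alone do not provide): a trusted party samples secret permutations, stores the shuffled weights in untrusted memory, and touches the trusted side only twice per inference, once to transform the input and once to restore the output.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, n))   # secret layer weights (to be protected)
x = rng.standard_normal(n)        # inference input

# TEE-side setup: sample secret row/column permutations (cheap to apply and invert).
P = rng.permutation(n)
Q = rng.permutation(n)
W_obf = W[P][:, Q]                # obfuscated weights, safe to keep in untrusted memory

# Per-inference round 1 (TEE): permute the input to match the shuffled columns.
x_perm = x[Q]

# Untrusted world: run the heavy matrix multiply on obfuscated weights
# (GPU-accelerated in the real system).
y_obf = W_obf @ x_perm

# Per-inference round 2 (TEE): inverse-permute to recover the true output.
y = np.empty(n)
y[P] = y_obf

assert np.allclose(y, W @ x)      # matches the unprotected computation
```

The two TEE rounds cost only O(n) index shuffles, while the O(n²) multiply stays outside the TEE, which mirrors the latency argument in the abstract.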
Problem

Research questions and friction points this paper is trying to address.

Protects on-device ML models from adversarial extraction
Reduces inference latency caused by TEE memory limitations
Securely stores obfuscated models in untrusted memory for GPU acceleration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Obfuscates entire model for untrusted memory storage
Minimizes trusted-untrusted memory interaction to two rounds
Enables GPU acceleration in untrusted memory for inference
🔎 Similar Papers
No similar papers found.
Zikai Mao
Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University
Lingchen Zhao
Associate Professor, School of Cyber Science and Engineering, Wuhan University
Secure Computation, AI Security
Lei Xu
School of Mathematics and Statistics, Nanjing University of Science and Technology
Wentao Dong
Student, Shanghai Jiao Tong University
Reinforcement Learning, Robotics
Shenyi Zhang
Wuhan University
AI Security, Adversarial Machine Learning, Large Language Models
Cong Wang
City University of Hong Kong
Qian Wang
Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University