Amulet: Fast TEE-Shielded Inference for On-Device Model Protection

📅 2025-12-08
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Machine learning models deployed on end devices are vulnerable to model extraction attacks, while trusted execution environments (TEEs) incur high inference latency due to frequent memory switching. Method: This paper proposes Amulet, a novel framework that jointly designs information-theoretically secure neural network obfuscation with TEEs. It enables model weights to reside safely in untrusted memory, requiring only two low-overhead TEE interactions per inference. Amulet integrates GPU acceleration and architecture-agnostic optimizations to support diverse models, from ResNet to GPT-2. Results: Experiments show Amulet incurs only 2.8–4.8× higher latency than unprotected baselines, achieves 8–9× speedup over pure-TEE approaches, and outperforms the state-of-the-art obfuscation method by 2.2×, all with negligible accuracy loss. Thus, Amulet simultaneously delivers strong security, high efficiency, and practical deployability.

πŸ“ Abstract
On-device machine learning (ML) introduces new security concerns about model privacy. Storing valuable trained ML models on user devices exposes them to potential extraction by adversaries. The current mainstream solution for on-device model protection is storing the weights and conducting inference within Trusted Execution Environments (TEEs). However, due to limited trusted memory that cannot accommodate the whole model, most existing approaches employ a partitioning strategy, dividing a model into multiple slices that are loaded into the TEE sequentially. This frequent interaction between untrusted and trusted worlds dramatically increases inference latency, sometimes by orders of magnitude. In this paper, we propose Amulet, a fast TEE-shielded on-device inference framework for ML model protection. Amulet incorporates a suite of obfuscation methods specifically designed for common neural network architectures. After obfuscation by the TEE, the entire transformed model can be securely stored in untrusted memory, allowing the inference process to execute directly in untrusted memory with GPU acceleration. For each inference request, only two rounds of minimal-overhead interaction between untrusted and trusted memory are required to process input samples and output results. We also provide theoretical proof from an information-theoretic perspective that the obfuscated model does not leak information about the original weights. We comprehensively evaluated Amulet using diverse model architectures ranging from ResNet-18 to GPT-2. Our approach incurs inference latency only 2.8-4.8x that of unprotected models with negligible accuracy loss, achieving an 8-9x speedup over baseline methods that execute inference entirely within TEEs, and performing approximately 2.2x faster than the state-of-the-art obfuscation-based method.
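To make the workflow described in the abstract concrete, the sketch below shows a toy permutation-based obfuscation of one linear layer. This is an illustrative stand-in only, not Amulet's actual construction (the paper's scheme carries an information-theoretic security proof, which simple permutations alone do not provide): a trusted party samples secret permutations, stores the shuffled weights in untrusted memory, and touches the trusted side only twice per inference, once to transform the input and once to restore the output.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, n))   # secret layer weights (to be protected)
x = rng.standard_normal(n)        # inference input

# TEE-side setup: sample secret row/column permutations (cheap to apply and invert).
P = rng.permutation(n)
Q = rng.permutation(n)
W_obf = W[P][:, Q]                # obfuscated weights, safe to keep in untrusted memory

# Per-inference round 1 (TEE): permute the input to match the shuffled columns.
x_perm = x[Q]

# Untrusted world: run the heavy matrix multiply on obfuscated weights
# (GPU-accelerated in the real system).
y_obf = W_obf @ x_perm

# Per-inference round 2 (TEE): inverse-permute to recover the true output.
y = np.empty(n)
y[P] = y_obf

assert np.allclose(y, W @ x)      # matches the unprotected computation
```

The two TEE rounds cost only O(n) index shuffles, while the O(n²) multiply stays outside the TEE, which mirrors the latency argument in the abstract.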
Problem

Research questions and friction points this paper is trying to address.

Protects on-device ML models from adversarial extraction
Reduces inference latency caused by TEE memory limitations
Securely stores obfuscated models in untrusted memory for GPU acceleration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Obfuscates entire model for untrusted memory storage
Minimizes trusted-untrusted memory interaction to two rounds
Enables GPU acceleration in untrusted memory for inference
🔎 Similar Papers
No similar papers found.
Zikai Mao
Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University
Lingchen Zhao
Associate Professor, School of Cyber Science and Engineering, Wuhan University
Secure Computation, AI Security
Lei Xu
School of Mathematics and Statistics, Nanjing University of Science and Technology
Wentao Dong
Student, Shanghai Jiao Tong University
Reinforcement Learning, Robotics
Shenyi Zhang
Wuhan University
AI Security, Adversarial Machine Learning, Large Language Models
Cong Wang
City University of Hong Kong
Qian Wang
Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University