Privacy-Preserving Transformers: SwiftKey's Differential Privacy Implementation

πŸ“… 2025-05-08
πŸ€– AI Summary
Deploying differentially private (DP) language models in mobile input methods requires balancing privacy guarantees, model efficiency, and inference latency. Method: This paper introduces the first DP Transformer language model deployed in the commercial SwiftKey keyboard. To reconcile privacy, model compactness, and inference speed, we propose a two-stage training paradigm: general-domain pretraining followed by DP-SGD fine-tuning on real-world typing data. We design a scaled-down GPT-2 architecture optimized for edge devices and integrate an ONNX-based inference engine. Contribution/Results: Our approach achieves consistent improvements in next-word prediction accuracy over the production-grade GRU baseline, with only marginal increases in memory footprint and latency. Privacy is rigorously certified under a tight budget (Ξ΅ ≀ 4). Key contributions include: (1) the first DP-trained Transformer deployed in a commercial mobile input method; (2) a mobile-optimized DP Transformer architecture and end-to-end deployment pipeline; and (3) empirical validation of lightweight Transformers as viable, high-performance models for privacy-sensitive on-device applications.
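The DP-SGD fine-tuning stage works by clipping each example's gradient and adding calibrated Gaussian noise before the parameter update. A minimal NumPy sketch of one such step on a toy linear model (the model, clip norm, and noise multiplier are illustrative, not the paper's actual setup):

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.05, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD step for least-squares loss 0.5*(x . w - y)^2.

    Per-example gradients are clipped to L2 norm clip_norm, averaged,
    and Gaussian noise with std noise_multiplier * clip_norm / batch_size
    is added before the update (Abadi et al.'s DP-SGD recipe).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    # Per-example gradients: g_i = (x_i . w - y_i) * x_i
    residuals = X @ w - y
    grads = residuals[:, None] * X
    # Clip each example's gradient to L2 norm clip_norm
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip_norm)
    # Average and add calibrated Gaussian noise
    noisy_grad = grads.mean(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm / n, size=w.shape)
    return w - lr * noisy_grad
```

In the paper's two-stage paradigm, ordinary SGD on public general-domain data would produce the seed model, and only the fine-tuning on typing data pays the privacy cost tracked against the Ξ΅ budget.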

πŸ“ Abstract
In this paper we train a transformer using differential privacy (DP) for language modeling in SwiftKey. We run multiple experiments to balance the trade-offs between model size, runtime speed, and accuracy. We show small and consistent gains in next-word prediction accuracy, with a graceful increase in memory and latency compared to the production GRU. This is achieved by scaling down a GPT-2 architecture to fit the required size, together with a two-stage training process that builds a seed model on general data and then fine-tunes it with DP on typing data. The transformer is integrated using ONNX, offering both flexibility and efficiency.
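Scaling GPT-2 down to a keyboard-sized budget is largely a parameter-count exercise. A small helper that estimates the size of a GPT-2-style decoder; the "tiny" configuration below is hypothetical, since the paper's exact sizes are not given here:

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Approximate parameter count of a GPT-2-style decoder.

    Counts token and position embeddings plus, per block, the attention
    projections (Q, K, V, output) and the 4x MLP; biases and LayerNorm
    gains are ignored, and the LM head is assumed tied to the embeddings.
    """
    embeddings = vocab_size * n_embd + n_positions * n_embd
    attention = 4 * n_embd * n_embd          # Wq, Wk, Wv, Wo
    mlp = 2 * n_embd * (4 * n_embd)          # up- and down-projection
    return embeddings + n_layer * (attention + mlp)

# GPT-2 small (~124M) vs. a hypothetical keyboard-sized variant (~5M)
full = gpt2_param_count(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12)
tiny = gpt2_param_count(vocab_size=16000, n_positions=64, n_embd=192, n_layer=4)
```

At 8-bit quantization the hypothetical variant would occupy only a few megabytes, which is the regime a mobile keyboard model must fit.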
Problem

Research questions and friction points this paper is trying to address.

Balancing model size, speed, and accuracy trade-offs
Improving next-word-prediction with differential privacy
Integrating compact transformer via ONNX efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer trained with differential privacy
Scaled-down GPT2 architecture for efficiency
Two-stage training with ONNX integration
Authors

Abdelrahman Abouelenin (Microsoft)
Mohamed Abdelrehim (Microsoft)
Raffy Fahim (Microsoft)
Amr Hendy (Microsoft)
Mohamed Afify (Microsoft ATLC)