Privacy-Preserving Transformers: SwiftKey's Differential Privacy Implementation

πŸ“… 2025-05-08
πŸ€– AI Summary
Deploying differentially private (DP) language models in mobile input methods requires balancing privacy guarantees, model efficiency, and inference latency. Method: This paper introduces the first DP Transformer language model deployed in the commercial SwiftKey keyboard. To reconcile privacy, model compactness, and inference speed, we propose a two-stage training paradigm: general-domain pretraining followed by DP-SGD fine-tuning on real-world typing data. We design a scaled-down GPT-2 architecture optimized for edge devices and integrate an ONNX-based inference engine. Contribution/Results: Our approach achieves consistent improvements in next-word prediction accuracy over the production-grade GRU baseline, with only marginal increases in memory footprint and latency. Privacy is rigorously certified under a tight budget (Ξ΅ ≀ 4). Key contributions include: (1) the first DP-trained Transformer deployed in a commercial mobile input method; (2) a mobile-optimized DP Transformer architecture and end-to-end deployment pipeline; and (3) empirical validation of lightweight Transformers as viable, high-performance models for privacy-sensitive on-device applications.
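The DP-SGD fine-tuning stage works by clipping each example's gradient and adding calibrated Gaussian noise before the parameter update. A minimal NumPy sketch of one such step on a toy linear model (the model, clip norm, and noise multiplier are illustrative, not the paper's actual setup):

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.05, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD step for least-squares loss 0.5*(x . w - y)^2.

    Per-example gradients are clipped to L2 norm clip_norm, averaged,
    and Gaussian noise with std noise_multiplier * clip_norm / batch_size
    is added before the update (Abadi et al.'s DP-SGD recipe).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    # Per-example gradients: g_i = (x_i . w - y_i) * x_i
    residuals = X @ w - y
    grads = residuals[:, None] * X
    # Clip each example's gradient to L2 norm clip_norm
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip_norm)
    # Average and add calibrated Gaussian noise
    noisy_grad = grads.mean(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm / n, size=w.shape)
    return w - lr * noisy_grad
```

In the paper's two-stage paradigm, ordinary SGD on public general-domain data would produce the seed model, and only the fine-tuning on typing data pays the privacy cost tracked against the Ξ΅ budget.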

πŸ“ Abstract
In this paper we train a transformer using differential privacy (DP) for language modeling in SwiftKey. We run multiple experiments to balance the trade-offs between model size, runtime speed, and accuracy. We show small and consistent gains in next-word prediction accuracy, with a graceful increase in memory and latency compared to the production GRU. This is achieved by scaling down a GPT-2 architecture to fit the required size, together with a two-stage training process that builds a seed model on general data and then fine-tunes it with DP on typing data. The transformer is integrated using ONNX, offering both flexibility and efficiency.
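Scaling GPT-2 down to a keyboard-sized budget is largely a parameter-count exercise. A small helper that estimates the size of a GPT-2-style decoder; the "tiny" configuration below is hypothetical, since the paper's exact sizes are not given here:

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Approximate parameter count of a GPT-2-style decoder.

    Counts token and position embeddings plus, per block, the attention
    projections (Q, K, V, output) and the 4x MLP; biases and LayerNorm
    gains are ignored, and the LM head is assumed tied to the embeddings.
    """
    embeddings = vocab_size * n_embd + n_positions * n_embd
    attention = 4 * n_embd * n_embd          # Wq, Wk, Wv, Wo
    mlp = 2 * n_embd * (4 * n_embd)          # up- and down-projection
    return embeddings + n_layer * (attention + mlp)

# GPT-2 small (~124M) vs. a hypothetical keyboard-sized variant (~5M)
full = gpt2_param_count(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12)
tiny = gpt2_param_count(vocab_size=16000, n_positions=64, n_embd=192, n_layer=4)
```

At 8-bit quantization the hypothetical variant would occupy only a few megabytes, which is the regime a mobile keyboard model must fit.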
Problem

Research questions and friction points this paper is trying to address.

Balancing model size, speed, and accuracy trade-offs
Improving next-word-prediction with differential privacy
Integrating compact transformer via ONNX efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer trained with differential privacy
Scaled-down GPT2 architecture for efficiency
Two-stage training with ONNX integration
Authors

Abdelrahman Abouelenin (Microsoft)
Mohamed Abdelrehim (Microsoft)
Raffy Fahim (Microsoft)
Amr Hendy (Microsoft)
Mohamed Afify (Microsoft ATLC)