🤖 AI Summary
Memory safety vulnerabilities have long plagued pointer-intensive languages like C/C++. Existing compiler- and ISA-level mitigations face deployment barriers due to prohibitive performance overhead or limited applicability. This paper presents a production-oriented Memory Tagging Extension (MTE) deployment for datacenter workloads on AmpereOne (ARM AArch64), the first datacenter processor to support MTE. Its MTE implementation incurs no memory capacity overhead for tag storage. In synchronous tag-checking mode, it deterministically detects sequential buffer overflows and probabilistically detects use-after-free bugs, with a single-digit performance impact across a broad range of datacenter workloads. The paper analyzes the complete hardware-software stack and identifies application memory management as the dominant remaining source of overhead. This points to software optimization as a practical path toward production-grade MTE adoption in large-scale datacenter environments.
📝 Abstract
Memory-safety escapes continue to serve as the launching pad for a wide range of security attacks, especially for the substantial base of deployed software written in pointer-based languages such as C/C++. Although compiler and Instruction Set Architecture (ISA) extensions have been introduced to address elements of this issue, their overhead and/or lack of comprehensive applicability have limited broad production deployment. The Memory Tagging Extension (MTE) to the ARM AArch64 Instruction Set Architecture is a valuable tool to address memory-safety escapes; when used in synchronous tag-checking mode, MTE provides deterministic detection and prevention of sequential buffer overflow attacks, and probabilistic detection and prevention of exploits resulting from temporal use-after-free pointer programming bugs. The AmpereOne processor, launched in 2024, is the first datacenter processor to support MTE. Its optimized MTE implementation uniquely incurs no memory capacity overhead for tag storage and provides synchronous tag checking with a single-digit performance impact across a broad range of datacenter-class workloads. Furthermore, this paper analyzes the complete hardware-software stack, identifying application memory management as the primary remaining source of overhead and highlighting clear opportunities for software optimization. The combination of an efficient hardware foundation and a clear path for software improvement makes the MTE implementation of the AmpereOne processor highly attractive for deployment in production cloud environments.
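The difference between the "deterministic" overflow guarantee and the "probabilistic" use-after-free guarantee can be illustrated with a toy software model. This is a minimal sketch under stated assumptions, not the AmpereOne implementation and not an Arm or Linux API. It assumes 16-byte granules and 4-bit tags carried in pointer bits 56 to 59. `ToyTaggedHeap` is a hypothetical bump allocator that never gives an object the same tag as its left neighbour, reserves tag 0 for never-allocated memory, and retags freed memory with a uniformly random tag. Under those assumptions a linear overflow into the next object always faults, while a dangling pointer slips through with probability 1/16. The model also computes the 4 / (16 × 8) = 3.125% of capacity that a separate tag-storage carve-out would consume, the overhead the abstract says AmpereOne avoids.

```python
import random

GRANULE = 16                 # bytes covered by one allocation tag (assumed)
TAG_BITS = 4                 # 4-bit tags -> 16 possible values
NUM_TAGS = 1 << TAG_BITS
TAG_SHIFT = 56               # logical tag held in pointer bits 56..59
ADDR_MASK = (1 << TAG_SHIFT) - 1


class TagCheckFault(Exception):
    """Models the precise fault raised by a synchronous tag-check mismatch."""


def conventional_tag_overhead() -> float:
    # Fraction of memory a separate carve-out would need: 4 tag bits per 128 data bits.
    return TAG_BITS / (GRANULE * 8)


class ToyTaggedHeap:
    """Hypothetical bump allocator; tag 0 marks never-allocated memory."""

    def __init__(self, size: int, seed: int = 0):
        self.tags = [0] * (size // GRANULE)   # one allocation tag per granule
        self.sizes = {}                       # addr -> rounded allocation size
        self.top = 0                          # bump pointer (next free byte)
        self.rng = random.Random(seed)

    def malloc(self, nbytes: int) -> int:
        nbytes = -(-nbytes // GRANULE) * GRANULE      # round up to whole granules
        addr, first = self.top, self.top // GRANULE
        if first + nbytes // GRANULE > len(self.tags):
            raise MemoryError("toy heap exhausted")
        prev = self.tags[first - 1] if first > 0 else 0
        # Exclude 0 and the left neighbour's tag, so a linear overflow out of
        # the previous object always lands on a granule with a different tag.
        tag = self.rng.choice([t for t in range(1, NUM_TAGS) if t != prev])
        for g in range(first, first + nbytes // GRANULE):
            self.tags[g] = tag
        self.sizes[addr] = nbytes
        self.top += nbytes
        return (tag << TAG_SHIFT) | addr

    def free(self, ptr: int) -> None:
        addr = ptr & ADDR_MASK
        nbytes = self.sizes.pop(addr)
        # Retag with a uniformly random tag: a stale pointer still matches
        # with probability 1/16, hence only probabilistic UAF detection.
        new_tag = self.rng.randrange(NUM_TAGS)
        for g in range(addr // GRANULE, (addr + nbytes) // GRANULE):
            self.tags[g] = new_tag

    def check(self, ptr: int, nbytes: int = 1) -> None:
        """Synchronous check: compare the pointer tag with every granule touched."""
        ptag = (ptr >> TAG_SHIFT) & (NUM_TAGS - 1)
        addr = ptr & ADDR_MASK
        for g in range(addr // GRANULE, (addr + nbytes - 1) // GRANULE + 1):
            if self.tags[g] != ptag:
                raise TagCheckFault(f"tag mismatch at granule {g}")


def uaf_detection_rate(trials: int, seed: int = 0) -> float:
    """Fraction of use-after-free accesses that fault in this toy model."""
    heap = ToyTaggedHeap(GRANULE * trials, seed)
    hits = 0
    for _ in range(trials):
        p = heap.malloc(GRANULE)
        heap.free(p)
        try:
            heap.check(p)             # access through the dangling pointer
        except TagCheckFault:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    heap = ToyTaggedHeap(1 << 12, seed=1)
    a, b = heap.malloc(32), heap.malloc(32)
    heap.check(a, 32)                 # in-bounds access passes
    try:
        heap.check(a + 32)            # first byte past `a` lands in `b`
    except TagCheckFault as err:
        print("overflow caught:", err)
    print("UAF detection rate:", uaf_detection_rate(10_000))
```

In a real system the tagging policy lives in the allocator and kernel configuration; the neighbour-exclusion and retag-on-free rules here are simplifications chosen only to make the two guarantees visible. The measured use-after-free detection rate should hover near 15/16 ≈ 0.94.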