🤖 AI Summary
As transistor dimensions approach atomic scales, aging mechanisms—including bias temperature instability (BTI), hot carrier injection (HCI), time-dependent dielectric breakdown (TDDB), electromigration (EM), and random variations—severely compromise long-term IC reliability, particularly in high-reliability, long-duration applications such as AI training and autonomous driving. To address this, we systematically analyze the underlying physics of aging and propose, for the first time, a unified aging model capturing circuit-specific degradation behaviors across digital, analog, and SRAM circuits. We further introduce a cross-layer aging-aware design methodology spanning device, circuit, architecture, and software levels—encompassing physics-based aging modeling, on-chip monitoring circuits, resilience-aware architectures, and EDA-level optimization algorithms. Finally, we establish the first comprehensive, full-stack aging management technology roadmap, bridging microscopic degradation mechanisms to system-level lifetime prediction via an engineering-practical pathway. This work provides both theoretical foundations and practical paradigms for extending chip lifetime and reducing total cost of ownership.
📝 Abstract
Reliability has become an increasing concern in modern computing. Integrated circuits (ICs) are the backbone of modern computing devices across industries, including artificial intelligence (AI), consumer electronics, healthcare, automotive, industrial, and aerospace. Moore's Law has driven the semiconductor IC industry toward smaller dimensions, improved performance, and greater energy efficiency. However, as transistors shrink to atomic scales, aging-related degradation mechanisms such as Bias Temperature Instability (BTI), Hot Carrier Injection (HCI), Time-Dependent Dielectric Breakdown (TDDB), Electromigration (EM), and stochastic aging-induced variations have become major reliability threats. From an application perspective, workloads such as AI training and autonomous driving require continuous, sustained operation to minimize recovery costs and enhance safety. Additionally, the high cost of chip replacement and reproduction underscores the need for longer lifespans. These factors highlight the urgency of designing more reliable ICs. This survey addresses the critical aging issues in ICs, focusing on fundamental degradation mechanisms and mitigation strategies. It provides a comprehensive overview of the impact of aging and the methods to counter it, starting with the root causes of aging and summarizing key monitoring techniques at both the circuit and system levels. A detailed analysis of circuit-level mitigation strategies highlights the distinct aging characteristics of digital, analog, and SRAM circuits, emphasizing the need for tailored solutions. The survey also explores emerging software approaches in design automation, aging characterization, and mitigation, which are transforming traditional reliability optimization. Finally, it outlines the challenges and future directions for improving aging management and ensuring the long-term reliability of ICs across diverse applications.