🤖 AI Summary
This work addresses the limitations of existing hardware fault injection tools, which often lack the efficiency and flexibility needed to evaluate the reliability and fault tolerance of computing systems systematically. To overcome these limitations, the authors present the first modular, open-source, and highly configurable fault injection framework integrated into the gem5 simulator. The framework enables precise injection of both hardware and software faults at multiple architectural levels, from registers to caches, and supports sophisticated fault models coupled with fine-grained triggering mechanisms. By offering detailed control over where and when faults are injected, together with scalability, the infrastructure makes it easier to assess fault-tolerance mechanisms and resilience strategies, providing a flexible experimental platform for research on high-reliability, high-performance computing systems.
📝 Abstract
Fault injectors are essential tools for evaluating the reliability and resilience of computing systems. They enable the simulation of hardware and software faults to analyze system behavior under error conditions and assess its ability to operate correctly despite disruptions. Such analysis is critical for identifying vulnerabilities and improving system robustness. CHAOS is a modular, open-source, and fully configurable fault injection framework designed for the gem5 simulator. It facilitates precise and systematic fault injection across multiple architectural levels, supporting comprehensive evaluations of fault tolerance mechanisms and resilience strategies. Its high configurability and seamless integration with gem5 allow researchers to explore a wide range of fault models and complex scenarios, making CHAOS a valuable tool for advancing research in dependable and high-performance computing systems.
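To make the ideas of "fault models" and "triggering" concrete, here is a minimal Python sketch. It is not CHAOS's API or gem5 code. The names `FaultModel`, `Fault`, `apply_fault`, and `RegisterFile` are hypothetical, and the register file is a toy dictionary rather than a simulated CPU. The sketch assumes three common fault models (a transient single-bit flip, plus permanent stuck-at-0 and stuck-at-1 faults) and a trigger expressed as a simulated tick.

```python
from dataclasses import dataclass
from enum import Enum


class FaultModel(Enum):
    BIT_FLIP = "bit_flip"      # transient: invert one bit, once
    STUCK_AT_0 = "stuck_at_0"  # permanent: keep forcing the bit to 0
    STUCK_AT_1 = "stuck_at_1"  # permanent: keep forcing the bit to 1


@dataclass
class Fault:
    target: str        # hypothetical location name, e.g. a register "r1"
    bit: int           # bit position to corrupt
    model: FaultModel
    trigger_tick: int  # first simulated tick at which the fault is active


def apply_fault(value: int, fault: Fault, width: int = 64) -> int:
    """Return `value` with `fault` applied, truncated to `width` bits."""
    mask = 1 << fault.bit
    if fault.model is FaultModel.BIT_FLIP:
        result = value ^ mask
    elif fault.model is FaultModel.STUCK_AT_0:
        result = value & ~mask
    else:  # FaultModel.STUCK_AT_1
        result = value | mask
    return result & ((1 << width) - 1)


class RegisterFile:
    """Toy register file (hypothetical) that applies scheduled faults per tick."""

    def __init__(self, names):
        self.regs = {name: 0 for name in names}
        self.faults = []
        self.fired = set()  # indices of transient faults already injected

    def schedule(self, fault: Fault) -> None:
        self.faults.append(fault)

    def tick(self, now: int) -> None:
        for i, fault in enumerate(self.faults):
            if now < fault.trigger_tick:
                continue  # trigger condition not yet met
            if fault.model is FaultModel.BIT_FLIP:
                if i in self.fired:
                    continue  # transient faults fire only once
                self.fired.add(i)
            self.regs[fault.target] = apply_fault(self.regs[fault.target], fault)


rf = RegisterFile(["r1", "r2"])
rf.regs["r1"] = 0b1010
rf.schedule(Fault("r1", 0, FaultModel.BIT_FLIP, trigger_tick=5))
rf.schedule(Fault("r2", 3, FaultModel.STUCK_AT_1, trigger_tick=2))
for t in range(10):
    rf.tick(t)
print(rf.regs)  # {'r1': 11, 'r2': 8}
```

The sketch separates *what* is corrupted (the fault model) from *when* it is corrupted (the trigger). Transient faults are applied exactly once. Permanent faults are reapplied on every tick after their trigger, so they override any later writes to the same bit. This is one simple way to model the distinction and is not necessarily how CHAOS implements it.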