🤖 AI Summary
To address fault-tolerance challenges in safety-critical parallel floating-point accelerators, this paper proposes RedMulE-FT, a runtime-configurable fault-tolerant matrix multiplication architecture. Duplication-based approaches carry high area and power overhead, and error-correcting codes (ECC) cannot trade robustness against performance; the design therefore combines replication with error-detecting codes on the data path and, in its full-protection configuration, extends protection to the control signals. Building on RedMulE, it selects the fault-tolerance mode through a shadowed context register file before each task executes. Reported results: an 11× reduction in uncorrected faults at 2.3% area overhead with data-path protection; no functional errors after one million fault injections under full protection; and a total area overhead of 25.2% while sustaining a 500 MHz operating frequency in a 12 nm technology.
📝 Abstract
As safety-critical applications increasingly rely on data-parallel floating-point computations, there is a growing need for flexible and configurable fault tolerance in parallel floating-point accelerators such as tensor engines. Replication-based methods ensure reliability but incur high area and power costs, while error correction codes lack the flexibility to trade off robustness against performance. This work presents RedMulE-FT, a runtime-configurable fault-tolerant extension of the RedMulE matrix multiplication accelerator that balances fault tolerance, area overhead, and performance impact. The fault tolerance mode is configured in a shadowed context register file before task execution. By combining replication with error-detecting codes to protect the data path, RedMulE-FT achieves an 11x reduction in uncorrected faults with only 2.3% area overhead. Full protection extends to control signals, resulting in no functional errors after 1M injections in our extensive fault injection simulation campaign, with a total area overhead of 25.2% while maintaining a 500 MHz frequency in a 12 nm technology.
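
The idea of choosing a fault-tolerance mode before a task runs, and of pairing replication with an error-detecting code, can be illustrated with a small software toy model. This is only a sketch under stated assumptions and not the RedMulE-FT hardware: the mode names, the single parity bit per operand, the retry-on-mismatch policy, and the `inject` fault hook are all hypothetical choices made for illustration, and the operands are non-negative integers rather than floating-point values.

```python
from enum import Enum


class Mode(Enum):
    PERFORMANCE = "performance"  # hypothetical name: single computation, no check
    REDUNDANT = "redundant"      # hypothetical name: compute twice and compare


def parity(word: int) -> int:
    """Return the even-parity bit of a non-negative integer."""
    return bin(word).count("1") % 2


def encode(matrix):
    """Attach a parity bit to every operand (a simple error-detecting code)."""
    return [[(v, parity(v)) for v in row] for row in matrix]


def decode(coded):
    """Check every parity bit and strip it; raise if an operand is corrupted."""
    plain = []
    for row in coded:
        for value, bit in row:
            if parity(value) != bit:
                raise ValueError("parity error detected on operand")
        plain.append([value for value, _ in row])
    return plain


def matmul(a, b):
    """Plain matrix product on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def run_task(coded_a, coded_b, mode, inject=None, max_retries=2):
    """Run one matmul task under a mode that is fixed before execution.

    `inject(result, attempt, replica)` is a hypothetical hook that may corrupt
    a replica's result in place to emulate a transient fault.
    Returns (result, attempts_used).
    """
    a, b = decode(coded_a), decode(coded_b)
    replicas = 2 if mode is Mode.REDUNDANT else 1
    for attempt in range(1, max_retries + 2):
        results = []
        for replica in range(replicas):
            out = matmul(a, b)
            if inject is not None:
                inject(out, attempt, replica)
            results.append(out)
        # With one replica this check is trivially true, so faults pass through.
        if all(r == results[0] for r in results):
            return results[0], attempt
    raise RuntimeError("replica mismatch persisted after retries")
```

In this model, a fault injected into one replica in redundant mode causes a mismatch and a rerun (an assumed recovery policy), while the same fault in performance mode silently corrupts the result. That contrast is the robustness-versus-performance trade-off that a runtime-selectable mode exposes.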