FLARE: One-Shot PE-Level Fault Localization in Systolic Arrays via Algebraic Test Vectors

📅 2026-05-08

🤖 AI Summary
This work addresses row-level (PE-level) fault localization in systolic arrays. Prior methods detect faulty columns efficiently with uniform test patterns, but pinpointing the faulty processing element (PE) within a column has required hardware redundancy. The authors propose a lightweight algebraic testing scheme that fills the test vector with pairwise coprime integers, so that the deviation caused by a permanent weight-register fault carries a divisibility signature identifying the faulty row. Under a general bounded error model, a single test pass localizes the faulty row with high probability and without hardware redundancy; when one pass is ambiguous, a second pass using a ratio computation gives exact localization, and for single-bit errors odd coprime entries guarantee exact localization in one pass. Under INT16 arithmetic, a single test pass covers array sizes up to 256×256 with localization probability above 0.98, at a test cost under 1% of one inference GEMM tile.
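The divisibility idea can be illustrated with a small Python sketch. It is not the authors' implementation, and it rests on several assumptions that are not in the source: one array column is modeled as a plain dot product, the fault is an additive error on a single stored weight, Python's unbounded integers stand in for INT16 arithmetic (no wraparound), and the test entries are simply the first few primes. All function names are hypothetical.

```python
from math import gcd


def first_primes(n):
    """Return the first n primes; distinct primes are pairwise coprime."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def is_pairwise_coprime(values):
    """Check that every pair of entries has gcd 1."""
    return all(gcd(a, b) == 1
               for i, a in enumerate(values) for b in values[i + 1:])


def column_output(weights, inputs):
    """Assumed model of one column: a dot product of weights and test inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def candidate_rows(deviation, inputs):
    """Rows whose test entry divides the observed deviation."""
    if deviation == 0:
        return []  # no observable fault in this column
    return [i for i, x in enumerate(inputs) if deviation % x == 0]


def localize(weights, inputs, faulty_row, error):
    """Inject an additive weight error (assumed fault model) and list candidates."""
    faulty = list(weights)
    faulty[faulty_row] += error
    deviation = column_output(faulty, inputs) - column_output(weights, inputs)
    return candidate_rows(deviation, inputs)


inputs = first_primes(8)                # [2, 3, 5, 7, 11, 13, 17, 19]
weights = [3, -1, 4, 1, -5, 9, 2, -6]   # arbitrary illustrative weights

print(localize(weights, inputs, faulty_row=5, error=-1))  # [5]
print(localize(weights, inputs, faulty_row=5, error=7))   # [3, 5]
```

In this model the deviation equals the error times the faulty row's test entry, so that entry always divides it. The second call shows why a single pass is only probabilistic: when the error itself is a multiple of another row's entry, that row also appears as a candidate.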
📝 Abstract
Systolic arrays are the dominant compute fabric for neural network inference. Prior work has addressed column-level fault detection efficiently with uniform test patterns, but row-level (PE-level) fault localization within a faulty column remains open without resorting to hardware redundancy. The fundamental obstacle is that uniform test inputs destroy per-row signatures: any test that activates every row equally cannot distinguish which row is the source of an observed deviation. In this paper, we propose a lightweight, purely algorithmic remedy based on coprime test vectors. By assigning pairwise coprime integers as test-input entries, a permanent weight-register fault produces a deviation whose divisibility signature uniquely identifies the faulty row. Under a general bounded error model, a single test pass localizes the faulty row with high probability. This error model covers a broader class of faults than what prior dataflow-aware testing work has primarily emphasized. When one round is insufficient, a second pass using a ratio computation achieves exact localization; for the special case of single-bit errors, odd coprime entries guarantee exact localization in one round. For INT16 arithmetic, a single test pass covers array sizes up to $256{\times}256$ with localization probability above $0.98$, at a test cost under $1\%$ of one inference GEMM tile.
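The single-bit special case can be sketched in the same spirit. The following Python fragment is an illustration under assumptions not stated in the abstract: a single-bit fault is modeled as a weight change of plus or minus a power of two, the accumulator is assumed wide enough that no overflow occurs, and odd primes serve as the odd coprime entries. The helper names are hypothetical.

```python
def odd_part(n):
    """Strip all factors of two from |n|."""
    n = abs(n)
    while n and n % 2 == 0:
        n //= 2
    return n


def locate_single_bit_fault(deviation, inputs):
    """Return the faulty row for a deviation of the form +/- 2**k * inputs[row].

    Assumes every test entry is odd and distinct, so the odd part of the
    deviation equals the faulty row's entry. Returns None if nothing matches.
    """
    if deviation == 0:
        return None
    row_of = {x: i for i, x in enumerate(inputs)}
    return row_of.get(odd_part(deviation))


# Odd primes as an assumed choice of odd, pairwise coprime test entries.
inputs = [3, 5, 7, 11, 13, 17, 19, 23]

# Flipping bit k of one weight changes it by +2**k or -2**k (assumed model),
# so the column deviation is that change times the row's test entry.
deviation = -(2 ** 9) * inputs[4]
print(locate_single_bit_fault(deviation, inputs))  # 4
```

Because every entry is odd, the power of two contributed by the bit flip can be divided out completely, leaving exactly the faulty row's entry. Under these assumptions one pass suffices, with no ambiguity.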
Problem

Research questions and friction points this paper is trying to address.

fault localization
systolic arrays
PE-level
one-shot testing
algebraic test vectors
Innovation

Methods, ideas, or system contributions that make the work stand out.

systolic arrays
fault localization
coprime test vectors
algebraic testing
one-shot diagnosis