Combining Fault Tolerance Techniques and COTS SoC Accelerators for Payload Processing in Space

📅 2022-10-03

🏛️ IEEE/IFIP International Conference on Very Large Scale Integration of System-on-Chip

📈 Citations: 3

✨ Influential: 0

🤖 AI Summary

To meet the demand for high-throughput, low-latency, and reliable on-board processing under space radiation, this work targets task interruptions caused by single-event upsets (SEUs) in two commercial off-the-shelf accelerators: the Zynq FPGA and the Myriad VPU. It explores and evaluates fault-tolerance techniques at several levels. On the FPGA side, it combines memory scrubbing, partial reconfiguration, triple modular redundancy (TMR), and watchdogs. On the VPU side, it detects and corrects errors in the instruction and data memories and applies redundancy at the processor level (SHAVE cores). For FPGA-VPU co-processing, it adds a fault-tolerant interface based on the CIF/LCD protocols and a custom CRC error-detecting code. Both device families are being integrated into industrial space avionics boards such as Ubotica's CogniSat and Xiphos' Q7S.

📝 Abstract

The ever-increasing demand for computational power and I/O throughput in space applications is transforming the landscape of on-board computing. A variety of Commercial-Off-The-Shelf (COTS) accelerators is emerging as an attractive solution for payload processing that can outperform traditional radiation-hardened devices. To increase the reliability of such COTS accelerators, this paper explores and evaluates fault-tolerance techniques for the Zynq FPGA and the Myriad VPU, two device families that are being integrated into industrial space avionics architectures and boards, such as Ubotica’s CogniSat, Xiphos’ Q7S, and Cobham Gaisler’s GR-VPX-XCKU060. On the FPGA side, we combine techniques such as memory scrubbing, partial reconfiguration, triple modular redundancy, and watchdogs. On the VPU side, we detect and correct errors in the instruction and data memories, and we apply redundancy at the processor level (SHAVE cores). For FPGA with VPU co-processing, we also develop a fault-tolerant interface between the two devices based on the CIF/LCD protocols and our custom CRC error-detecting code.
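
To make the idea of a CRC-protected link between the two devices more concrete, here is a minimal sketch of appending a checksum to each transferred frame and checking it on the receiving side. The abstract does not specify the authors' custom CRC, so this example assumes the standard CRC-16/CCITT-FALSE parameters (polynomial 0x1021, initial value 0xFFFF) as a stand-in; the function names `crc16_ccitt`, `append_crc`, and `frame_is_valid` are hypothetical and not taken from the paper.

```python
def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT-FALSE (assumed stand-in, not the paper's custom code)."""
    crc = init
    for byte in data:
        crc ^= byte << 8                      # feed the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:                  # top bit set: shift and apply the polynomial
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def append_crc(payload: bytes) -> bytes:
    """Sender side: append the 16-bit CRC (big-endian) to the payload."""
    return payload + crc16_ccitt(payload).to_bytes(2, "big")


def frame_is_valid(frame: bytes) -> bool:
    """Receiver side: recompute the CRC over the payload and compare with the trailer."""
    if len(frame) < 2:
        return False
    payload, trailer = frame[:-2], frame[-2:]
    return crc16_ccitt(payload) == int.from_bytes(trailer, "big")


if __name__ == "__main__":
    frame = append_crc(bytes(range(16)))      # hypothetical 16-byte image line
    corrupted = bytearray(frame)
    corrupted[3] ^= 0x10                      # emulate a single bit flip on the link
    print(frame_is_valid(frame), frame_is_valid(bytes(corrupted)))
```

A CRC of this kind only detects corruption. What happens after a failed check (for example, requesting a retransmission of the frame) is a separate design decision that the abstract does not describe.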

Problem

Research questions and friction points this paper is trying to address.

Enhancing reliability of COTS accelerators for space payload processing

Evaluating fault-tolerance techniques for Zynq FPGA and Myriad VPU

Developing fault-tolerant interface between FPGA and VPU co-processing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines fault-tolerance techniques with COTS SoC accelerators (Zynq FPGA and Myriad VPU)

Uses memory scrubbing, partial reconfiguration, and triple modular redundancy on the FPGA

Develops fault-tolerant FPGA-VPU interface
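
The redundancy and scrubbing ideas listed above can be illustrated with a small software analogy. The sketch below assumes three replicas of a word-addressed memory, uses a bitwise majority vote (the core operation of triple modular redundancy), and runs a scrubbing pass that rewrites any replica word that disagrees with the voted value. It is not the authors' FPGA implementation, where scrubbing acts on configuration memory; the names `majority` and `scrub` are hypothetical.

```python
from typing import List


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 vote: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def scrub(replicas: List[List[int]]) -> int:
    """Vote word by word over three replicas and repair disagreeing copies in place.

    Returns the number of replica words that were rewritten.
    """
    if len(replicas) != 3:
        raise ValueError("this sketch assumes exactly three replicas")
    r0, r1, r2 = replicas
    repaired = 0
    for i in range(len(r0)):
        voted = majority(r0[i], r1[i], r2[i])
        for replica in replicas:
            if replica[i] != voted:
                replica[i] = voted            # overwrite the upset word with the voted value
                repaired += 1
    return repaired


if __name__ == "__main__":
    memory = [[0xAA, 0x55, 0xFF] for _ in range(3)]
    memory[1][0] ^= 0x04                      # emulate a single bit flip in one replica
    print(scrub(memory), memory[0] == memory[1] == memory[2])
```

Voting masks a single faulty replica at read time, while the periodic scrubbing pass keeps upsets from accumulating until two replicas disagree on the same bit, at which point the vote could no longer recover the correct value.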

🔎 Similar Papers

Trikarenos: Design and Experimental Characterization of a Fault-Tolerant 28nm RISC-V-based SoC