Combining Fault Tolerance Techniques and COTS SoC Accelerators for Payload Processing in Space

📅 2022-10-03
🏛️ IEEE/IFIP International Conference on Very Large Scale Integration of System-on-Chip
📈 Citations: 3
Influential: 0
🤖 AI Summary
Space applications increasingly demand high-throughput, low-latency on-board processing, but commercial accelerators are exposed to radiation effects such as single-event upsets (SEUs). This work explores and evaluates fault-tolerance techniques for two COTS device families, the Zynq FPGA and the Myriad VPU. On the FPGA side, it combines memory scrubbing, partial reconfiguration, triple modular redundancy (TMR), and watchdogs. On the VPU side, it detects and corrects errors in the instruction and data memories and applies redundancy at the SHAVE-core level. For FPGA-VPU co-processing, it adds a fault-tolerant interface based on the CIF/LCD protocols and a custom CRC error-detecting code. Both device families are being integrated into industrial space avionics boards such as Ubotica’s CogniSat and Xiphos’ Q7S. The goal is to raise the reliability of COTS accelerators, which outperform traditional radiation-hardened devices but are not themselves radiation-hardened.

📝 Abstract
The ever-increasing demand for computational power and I/O throughput in space applications is transforming the landscape of on-board computing. A variety of Commercial-Off-The-Shelf (COTS) accelerators is emerging as an attractive solution for payload processing, outperforming traditional radiation-hardened devices. To increase the reliability of such COTS accelerators, this paper explores and evaluates fault-tolerance techniques for the Zynq FPGA and the Myriad VPU, two device families being integrated into industrial space avionics architectures/boards such as Ubotica’s CogniSat, Xiphos’ Q7S, and Cobham Gaisler’s GR-VPX-XCKU060. On the FPGA side, we combine techniques such as memory scrubbing, partial reconfiguration, triple modular redundancy, and watchdogs. On the VPU side, we detect and correct errors in the instruction and data memories, and we apply redundancy at the processor level (SHAVE cores). For FPGA-VPU co-processing, we also develop a fault-tolerant interface between the two devices based on the CIF/LCD protocols and our custom CRC error-detecting code.
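The CRC-protected FPGA-VPU link mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not specify the custom CRC (polynomial, width, or frame layout), so this example substitutes the standard CRC-32 from Python's `zlib` module, and the function names `frame_with_crc`, `check_frame`, and `flip_bit` are hypothetical helpers introduced only for illustration.

```python
import zlib
from typing import Tuple


def frame_with_crc(payload: bytes) -> bytes:
    """Sender side: append a 4-byte CRC-32 to the payload.

    Assumption: standard CRC-32 stands in for the paper's custom CRC.
    """
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + crc.to_bytes(4, "big")


def check_frame(frame: bytes) -> Tuple[bool, bytes]:
    """Receiver side: recompute the CRC and compare it with the trailer.

    Returns (ok, payload). A False result would typically trigger a
    retransmission request in a real link protocol.
    """
    if len(frame) < 4:
        return False, b""
    payload = frame[:-4]
    received = int.from_bytes(frame[-4:], "big")
    ok = (zlib.crc32(payload) & 0xFFFFFFFF) == received
    return ok, payload


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Emulate a single bit flip on the link (fault injection helper)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


if __name__ == "__main__":
    frame = frame_with_crc(b"image tile")
    print(check_frame(frame)[0])               # intact frame passes
    print(check_frame(flip_bit(frame, 5))[0])  # corrupted frame is rejected
```

A CRC only detects corruption; recovery (for example, resending the frame) has to be handled by the surrounding protocol.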
Problem

Research questions and friction points this paper is trying to address.

Enhancing reliability of COTS accelerators for space payload processing
Evaluating fault-tolerance techniques for Zynq FPGA and Myriad VPU
Developing fault-tolerant interface between FPGA and VPU co-processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines fault-tolerance techniques with COTS SoC accelerators (Zynq FPGA, Myriad VPU)
Uses memory scrubbing, partial reconfiguration, triple modular redundancy, and watchdogs on the FPGA
Develops fault-tolerant FPGA-VPU interface
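As a rough software illustration of two techniques listed above, triple modular redundancy and memory scrubbing, the sketch below keeps three copies of a small memory, votes bit by bit, and rewrites all copies with the voted value. It is not the paper's FPGA implementation (which operates on hardware logic and configuration memory); the names `majority_vote` and `scrub` are hypothetical.

```python
from typing import List


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority: each output bit matches at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def scrub(replicas: List[List[int]]) -> int:
    """Vote every word across three replicas and rewrite all copies.

    Returns the number of word positions where at least one replica
    disagreed with the voted value. Assumes exactly three replicas of
    equal length.
    """
    repaired = 0
    for i in range(len(replicas[0])):
        voted = majority_vote(replicas[0][i], replicas[1][i], replicas[2][i])
        if any(r[i] != voted for r in replicas):
            repaired += 1
        for r in replicas:
            r[i] = voted
    return repaired


if __name__ == "__main__":
    memory = [[0b1010, 7], [0b1010, 7], [0b1010, 7]]
    memory[1][0] ^= 0b0100   # emulate a single-event upset in one replica
    print(scrub(memory))     # number of repaired word positions
    print(memory[1][0] == 0b1010)
```

Voting masks a fault in any single replica; periodic scrubbing matters because it removes that fault before a second upset can hit the same bit in another replica and defeat the vote.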
Vasileios Leon
National Technical University of Athens, School of Electrical & Computer Engineering
HW Acceleration, Digital IC Design, Embedded Systems, SoC/FPGA, Space Avionics
E. Papatheofanous
Department of Physics, National and Kapodistrian University of Athens, 15772 Athens, Greece
G. Lentaris
School of Electrical and Computer Engineering, National Technical University of Athens, 15780 Athens, Greece
Charalampos Bezaitis
Department of Physics, National and Kapodistrian University of Athens, 15772 Athens, Greece
N. Mastorakis
School of Electrical and Computer Engineering, National Technical University of Athens, 15780 Athens, Greece
Georgios Bampilis
School of Electrical and Computer Engineering, National Technical University of Athens, 15780 Athens, Greece
D. Reisis
Department of Physics, National and Kapodistrian University of Athens, 15772 Athens, Greece
D. Soudris
School of Electrical and Computer Engineering, National Technical University of Athens, 15780 Athens, Greece