CRIS Current Research Information System

Space Cyber-Physical Systems (S-CPS) such as spacecraft and satellites strongly rely on the reliability of onboard computers to guarantee the success of their missions. Relying solely on radiation-hardened technologies is extremely expensive, and developing inlexible architectural and microarchitectural modiications to introduce modular redundancy within a system leads to signiicant area increase and performance degradation. To mitigate the overheads of traditional radiation hardening and modular redundancy approaches, we present a novel Hybrid Modular Redundancy (HMR) approach, a redundancy scheme that features a cluster of RISC-V processors with a lexible on-demand dual-core and triple-core lockstep grouping of computing cores with runtime split-lock capabilities. Further, we propose two recovery approaches, software-based and hardware-based, trading of performance and area overhead. Running at 430 MHz, our fault-tolerant cluster achieves up to 1160 MOPS on a matrix multiplication benchmark when conigured in non-redundant mode and 617 and 414 MOPS in dual and triple mode, respectively. A software-based recovery in triple mode requires 363 clock cycles and occupies 0.612 mm2, representing a 1.3% area overhead over a non-redundant 12-core RISC-V cluster. As a high-performance alternative, a new hardware-based method provides rapid fault recovery in just 24 clock cycles and occupies 0.660 mm2, namely ∼9.4% area overhead over the baseline non-redundant RISC-V cluster. The cluster is also enhanced with split-lock capabilities to enter one of the available redundant modes with minimum performance loss, allowing execution of a mission-critical portion of code when in independent mode, or a performance section when in a reliability mode, with <400 clock cycles overhead for entry and exit. The proposed system is the irst to integrate these functionalities on an open-source RISC-V-based compute device, enabling inely tunable reliability vs. performance trade-ofs.

Rogenmoser, M., Tortorella, Y., Rossi, D., Conti, F., Benini, L. (2023). Hybrid Modular Redundancy: Exploring Modular Redundancy Approaches in RISC-V Multi-Core Computing Clusters for Reliable Processing in Space. ACM TRANSACTIONS ON CYBER-PHYSICAL SYSTEMS, 9(1), 1-29 [10.1145/3635161].

Hybrid Modular Redundancy: Exploring Modular Redundancy Approaches in RISC-V Multi-Core Computing Clusters for Reliable Processing in Space

Michael Rogenmoser;Yvan Tortorella;Davide Rossi;Francesco Conti;Luca Benini

2023

Abstract

Space Cyber-Physical Systems (S-CPS) such as spacecraft and satellites strongly rely on the reliability of onboard computers to guarantee the success of their missions. Relying solely on radiation-hardened technologies is extremely expensive, and developing inlexible architectural and microarchitectural modiications to introduce modular redundancy within a system leads to signiicant area increase and performance degradation. To mitigate the overheads of traditional radiation hardening and modular redundancy approaches, we present a novel Hybrid Modular Redundancy (HMR) approach, a redundancy scheme that features a cluster of RISC-V processors with a lexible on-demand dual-core and triple-core lockstep grouping of computing cores with runtime split-lock capabilities. Further, we propose two recovery approaches, software-based and hardware-based, trading of performance and area overhead. Running at 430 MHz, our fault-tolerant cluster achieves up to 1160 MOPS on a matrix multiplication benchmark when conigured in non-redundant mode and 617 and 414 MOPS in dual and triple mode, respectively. A software-based recovery in triple mode requires 363 clock cycles and occupies 0.612 mm2, representing a 1.3% area overhead over a non-redundant 12-core RISC-V cluster. As a high-performance alternative, a new hardware-based method provides rapid fault recovery in just 24 clock cycles and occupies 0.660 mm2, namely ∼9.4% area overhead over the baseline non-redundant RISC-V cluster. The cluster is also enhanced with split-lock capabilities to enter one of the available redundant modes with minimum performance loss, allowing execution of a mission-critical portion of code when in independent mode, or a performance section when in a reliability mode, with <400 clock cycles overhead for entry and exit. The proposed system is the irst to integrate these functionalities on an open-source RISC-V-based compute device, enabling inely tunable reliability vs. performance trade-ofs.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2023
			
	Rivista
	
				ACM TRANSACTIONS ON CYBER-PHYSICAL SYSTEMS
			
	Codice DOI
	
				https://dx.doi.org/10.1145/3635161
			
	Citazione
	
				Rogenmoser, M., Tortorella, Y., Rossi, D., Conti, F., Benini, L. (2023). Hybrid Modular Redundancy: Exploring Modular Redundancy Approaches in RISC-V Multi-Core Computing Clusters for Reliable Processing in Space. ACM TRANSACTIONS ON CYBER-PHYSICAL SYSTEMS, 9(1), 1-29 [10.1145/3635161].
			
	Tutti gli autori
	
						Rogenmoser, Michael; Tortorella, Yvan; Rossi, Davide; Conti, Francesco; Benini, Luca
					
	Appare nelle tipologie:
	
				1.01 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
3635161.pdf accesso aperto Tipo: Versione (PDF) editoriale / Version Of Record Licenza: Licenza per accesso libero gratuito Dimensione 1.84 MB Formato Adobe PDF Visualizza/Apri	1.84 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/953255

Citazioni

ND

12

6

social impact