Rogenmoser, M., Wistoff, N., Vogel, P., Gurkaynak, F., Benini, L. (2022). On-Demand Redundancy Grouping: Selectable Soft-Error Tolerance for a Multicore Cluster [10.1109/ISVLSI54635.2022.00089].

On-Demand Redundancy Grouping: Selectable Soft-Error Tolerance for a Multicore Cluster

Benini L.
2022

Abstract

As technology nodes shrink and parallel processor clusters are deployed in hostile and critical environments such as space, radiation-induced run-time faults have become a serious cross-cutting concern that also affects architectural design. This paper introduces an architectural approach to run-time configurable soft-error tolerance at the core level, augmenting a six-core open-source RISC-V cluster with a novel On-Demand Redundancy Grouping (ODRG) scheme. ODRG allows the cluster to operate either as two fault-tolerant cores or as six individual cores for high performance, with limited overhead when switching between these modes at run time. The ODRG unit adds less than 11% of a core's area for a three-core group, or a total of 1% of the cluster area, and shows a negligible timing increase. This compares favorably to a commercial state-of-the-art implementation, against which fault-recovery re-synchronization is 2.5× faster. Furthermore, when redundancy is not necessary, ODRG allows the redundant cores to be used for independent computation, yielding up to a 2.96× increase in performance for selected applications.
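
The grouping idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: it assumes that a three-core group is checked by a bitwise 2-of-3 majority vote over the cores' outputs, which the abstract does not state, and that at most one core in a group is faulty at a time. The names `majority_vote`, `faulty_cores`, and `logical_cores` are hypothetical.

```python
CORES = 6        # cluster size stated in the abstract
GROUP_SIZE = 3   # three-core redundancy group stated in the abstract


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three words (assumed voting scheme)."""
    return (a & b) | (a & c) | (b & c)


def faulty_cores(outputs: tuple[int, int, int]) -> list[int]:
    """Indices of cores whose output differs from the voted value.

    Meaningful only under the single-fault assumption named above.
    """
    voted = majority_vote(*outputs)
    return [i for i, word in enumerate(outputs) if word != voted]


def logical_cores(redundant: bool) -> int:
    """Number of cores software sees in each mode: two groups or six cores."""
    return CORES // GROUP_SIZE if redundant else CORES


if __name__ == "__main__":
    # Core 2 suffers a single bit flip; the vote masks it and flags the core.
    outputs = (0b1010, 0b1010, 0b0010)
    print(bin(majority_vote(*outputs)), faulty_cores(outputs))
    print(logical_cores(True), logical_cores(False))
```

In this model, a flagged core would be the one that needs re-synchronization, while switching `redundant` off makes all six cores available for independent work.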
2022 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2022, pp. 398–401
Rogenmoser, M.; Wistoff, N.; Vogel, P.; Gurkaynak, F.; Benini, L.
File in this record: ISVLSI_On_Demand_Redundancy_Grouping_Final.pdf (open access; type: postprint; license: Creative Commons Attribution (CC BY); format: Adobe PDF; size: 180.48 kB)


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/907557