Design of an open-source bridge between non-coherent burst-based and coherent cache-line-based memory systems

Cavalcante, M.; Kurth, A.; Schuiki, F.; Benini, L.

doi:10.1145/3387902.3392631

In heterogeneous computer architectures, the serial part of an application is coupled with domain-specific accelerators that promise high computing throughput and efficiency across a wide range of applications. In such systems, the serial part of a program is executed on a Central Processing Unit (CPU) core optimized for single-thread performance, while parallel sections are offloaded to Programmable Manycore Accelerators (PMCAs). This heterogeneity requires CPU cores and PMCAs to share data in memory efficiently, although CPUs rely on a coherent memory system where data is transferred in cache lines, while PMCAs are based on non-coherent scratchpad memories where data is transferred in bursts by DMA engines. In this paper, we tackle the challenges and hardware complexity of bridging the gap from a non-coherent, burst-based memory hierarchy to a coherent, cache-line-based one. We design and implement an open-source hardware module that reaches 97% peak throughput over a wide range of realistic linear algebra kernels and is suited for a wide spectrum of memory architectures. Implemented in a state-of-the-art 22 nm FD-SOI technology, our module bridges up to 650 Gbps at 130 fJ/bit and has a complexity of less than 1 kGE/Gbps.
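To make the burst-to-cache-line mismatch described in the abstract concrete, the following minimal sketch splits one DMA-style burst into per-cache-line accesses. It is an illustration only, not the paper's hardware design: the function name `split_burst`, the tuple layout, and the 64-byte line size are all assumptions, since the abstract states none of them.

```python
from typing import Iterator, Tuple

# Assumed cache-line size in bytes; the abstract does not specify one.
CACHE_LINE_BYTES = 64


def split_burst(
    addr: int, length: int, line: int = CACHE_LINE_BYTES
) -> Iterator[Tuple[int, int, int]]:
    """Yield (line_base, offset, nbytes) for every cache line a burst touches.

    Hypothetical helper: addr is the burst start address in bytes and
    length is the burst size in bytes.
    """
    if length <= 0:
        return
    end = addr + length
    while addr < end:
        base = addr - (addr % line)              # start of the enclosing line
        offset = addr - base                     # position of the burst inside it
        nbytes = min(line - offset, end - addr)  # bytes of this line the burst covers
        yield base, offset, nbytes
        addr += nbytes


if __name__ == "__main__":
    # A 100-byte burst starting 48 bytes into a line.
    for base, offset, nbytes in split_burst(0x1030, 100):
        print(f"line 0x{base:x}: offset {offset}, {nbytes} bytes")
```

In this sketch, an unaligned burst produces a partial access at its head and tail and full-line accesses in between, which hints at why such a bridge has to handle alignment in addition to raw bandwidth.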

Cavalcante M., Kurth A., Schuiki F., Benini L. (2020). Design of an open-source bridge between non-coherent burst-based and coherent cache-line-based memory systems. Association for Computing Machinery, Inc [10.1145/3387902.3392631].

Year: 2020
Volume title: 17th ACM International Conference on Computing Frontiers 2020, CF 2020 - Proceedings
Pages: 81-88
DOI: https://dx.doi.org/10.1145/3387902.3392631
Publication type: 4.01 Contribution in conference proceedings


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/798347
