Optimizing Data Flow in Binary Neural Networks

Vorabbi, L.; Maltoni, D.; Santi, S.

2024

Abstract

Binary neural networks (BNNs) can substantially accelerate a neural network's inference time by substituting costly floating-point arithmetic with bit-wise operations. Nevertheless, state-of-the-art approaches reduce the efficiency of the data flow in BNN layers by introducing intermediate conversions from 1 to 16/32 bits. We propose a novel training scheme, denoted BNN-Clip, that improves the parallelism and data flow of the BNN pipeline; specifically, we introduce a clipping block that reduces the data width from 32 bits to 8. Furthermore, we shrink the internal accumulator of a binary layer, usually kept at 32 bits to prevent overflow, with no loss of accuracy. Moreover, we propose an optimization of the batch normalization layer that reduces latency and simplifies deployment. Finally, we present an optimized implementation of the binary direct convolution for the ARM NEON instruction set. Our experiments show consistent inference latency speed-ups (up to 1.3× and 2.4× compared with two state-of-the-art BNN frameworks) while reaching accuracy comparable with state-of-the-art approaches on the CIFAR-10, SVHN, and ImageNet datasets.
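To make the data-flow ideas above concrete, here is a minimal, illustrative C sketch — ours, not the paper's code — of the binary dot product at the core of a BNN layer, with a reduced-width accumulator and an 8-bit clipping step. The function names and the 16-bit bound are our own assumptions; BNN-Clip's actual clipping block may be defined differently.

    /* bnn_clip_sketch.c -- illustrative only; names and numeric bounds
     * are our assumptions, not BNN-Clip's actual code. */
    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Clip a wide accumulator into the int8 range, mirroring the role
     * of the clipping block: downstream layers can then run on an
     * 8-bit datapath instead of 32 bits. */
    static inline int8_t clip_to_int8(int32_t x)
    {
        if (x >  127) return  127;
        if (x < -128) return -128;
        return (int8_t)x;
    }

    /* Binary dot product over bit-packed {-1,+1} vectors:
     * for n total bits, dot = n - 2 * popcount(a XOR w).
     * The result lies in [-n, n], so a 16-bit accumulator suffices
     * whenever n <= 32767 -- the kind of bound that lets the internal
     * accumulator shrink below 32 bits without overflow. */
    static int16_t binary_dot16(const uint64_t *a, const uint64_t *w,
                                size_t words)
    {
        int32_t pop = 0;
        for (size_t i = 0; i < words; i++)
            pop += __builtin_popcountll(a[i] ^ w[i]);
        return (int16_t)((int32_t)(64 * words) - 2 * pop);
    }

    int main(void)
    {
        uint64_t a[2] = { 0xF0F0F0F0F0F0F0F0ull, 0x0123456789ABCDEFull };
        uint64_t w[2] = { 0x0F0F0F0F0F0F0F0Full, 0xFEDCBA9876543210ull };
        int16_t dot = binary_dot16(a, w, 2);   /* -> -128 here */
        printf("dot=%d clipped=%d\n", dot, clip_to_int8(dot));
        return 0;
    }

The abstract also mentions a binary direct convolution optimized for ARM NEON. The hot loop of such a kernel is typically an XNOR (here, XOR on bit-packed data) followed by a vector popcount and a reduction; the sketch below shows only that generic reduction, not the paper's actual kernel.

    #if defined(__aarch64__)
    #include <arm_neon.h>

    /* Popcount of (a XOR w) over `bytes` bytes with NEON: the inner
     * reduction of a binary convolution. `bytes` must be a multiple
     * of 16 in this sketch; a real kernel also handles the tail. */
    static uint32_t xnor_popcount_neon(const uint8_t *a, const uint8_t *w,
                                       size_t bytes)
    {
        uint16x8_t acc = vdupq_n_u16(0);
        for (size_t i = 0; i < bytes; i += 16) {
            uint8x16_t x = veorq_u8(vld1q_u8(a + i), vld1q_u8(w + i));
            /* per-byte popcount, widened pairwise into 16-bit lanes;
             * each lane grows by at most 16 per iteration, so lanes
             * stay safe for several thousand iterations */
            acc = vaddq_u16(acc, vpaddlq_u8(vcntq_u8(x)));
        }
        return vaddlvq_u16(acc);   /* widening horizontal sum of lanes */
    }
    #endif

The batch normalization optimization is not sketched here; in BNN deployments, batch normalization followed by the sign function is often folded into a per-channel integer threshold comparison, which is one plausible way such a layer can be simplified at inference time.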
Vorabbi, L., Maltoni, D., & Santi, S. (2024). Optimizing Data Flow in Binary Neural Networks. Sensors, 24(15), 1–15. https://doi.org/10.3390/s24154780

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1003799
