


Efficient Parallel Processing of Big Data on Supercomputers for Industrial IoT Environments

Isam Mashhour Al Jawarneh; Lorenzo Rosa; Riccardo Venanzi; Luca Foschini; Paolo Bellavista

2025

Abstract

The integration of distributed big data analytics into modern industrial environments has become increasingly critical, particularly with the rise of data-intensive applications and the need for real-time processing at the edge. While High-Performance Computing (HPC) systems offer robust petabyte-scale capabilities for efficient big data analytics, the performance of big data frameworks, especially on ARM-based HPC systems, remains underexplored. This paper presents an extensive experimental study on deploying Apache Spark 3.0.2, the de facto standard in-memory processing system, on an ARM-based HPC system. This study conducts a comprehensive performance evaluation of Apache Spark through representative big data workloads, including K-means clustering, to assess the effects of latency variations, such as those induced by network delays, memory bottlenecks, or computational overheads, on application performance in industrial IoT and edge computing environments. Our findings contribute to an understanding of how big data frameworks like Apache Spark can be effectively deployed and optimized on ARM-based HPC systems, particularly when leveraging vectorized instruction sets such as SVE, contributing to the broader goal of enhancing the integration of cloud–edge computing paradigms in modern industrial environments. We also discuss potential improvements and strategies for leveraging ARM-based architectures to support scalable, efficient, and real-time data processing in Industry 4.0 and beyond.


Year: 2025
Journal: ELECTRONICS
Citation: Al Jawarneh, I.M.H., Rosa, L., Venanzi, R., Foschini, L., Bellavista, P. (2025). Efficient Parallel Processing of Big Data on Supercomputers for Industrial IoT Environments. ELECTRONICS, 14(13), 1-25.
All authors: Al Jawarneh, Isam Mashhour Hasan; Rosa, Lorenzo; Venanzi, Riccardo; Foschini, Luca; Bellavista, Paolo
Publication type: 1.01 Journal article

Files in this record:

File	Size	Format
electronics-14-02626.pdf (open access; type: publisher's version (PDF) / Version of Record; license: Creative Commons Attribution (CC BY))	1.61 MB	Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1018637
