CRIS Current Research Information System

The ability to ingest, process, and analyze large datasets within minimal timeframes is a milestone of big data applications. In the realm of High Energy Physics (HEP) at CERN, this capability is especially critical as the upcoming high-luminosity phase of the LHC will generate vast amounts of data, reaching scales of approximately 100 PB/year. Recent advancements in resource management and software development have enabled more flexible and dynamic data access, alongside the integration with open-source tools like Jupyter, Dask, and HTCondor. These advancements facilitate a shift from a traditional “batch-like” processing to an interactive, high-throughput platform that utilizes a distributed, parallel back-end architecture. This approach is further supported by the DataLake model developed by the Italian National Center for “High-Performance Computing, Big Data, and Quantum Computing Research Centre” (ICSC). This contribution highlights the transition of various data analysis applications, from legacy batch processing to a more interactive, declarative paradigm using tools like ROOT RDataFrame. These applications are executed on the aforementioned cloud-based infrastructure, with workflows distributed across multiple worker nodes and results consolidated into a unified interface. Additionally, the performance of this approach is evaluated through speed-up benchmarks and scalability tests using distributed resources. Such analyses could help identify potential bottlenecks or limitations of the high-throughput interactive model, providing insights that will guide its further development and implementation within the Italian National Center.

Diotalevi, T., Gravili, F. (2025). Quasi interactive analysis of High Energy Physics big data with high throughput [10.22323/1.488.0027].

Quasi interactive analysis of High Energy Physics big data with high throughput

Diotalevi, Tommaso;Gravili, Francesco

2025

Abstract

The ability to ingest, process, and analyze large datasets within minimal timeframes is a milestone of big data applications. In the realm of High Energy Physics (HEP) at CERN, this capability is especially critical as the upcoming high-luminosity phase of the LHC will generate vast amounts of data, reaching scales of approximately 100 PB/year. Recent advancements in resource management and software development have enabled more flexible and dynamic data access, alongside the integration with open-source tools like Jupyter, Dask, and HTCondor. These advancements facilitate a shift from a traditional “batch-like” processing to an interactive, high-throughput platform that utilizes a distributed, parallel back-end architecture. This approach is further supported by the DataLake model developed by the Italian National Center for “High-Performance Computing, Big Data, and Quantum Computing Research Centre” (ICSC). This contribution highlights the transition of various data analysis applications, from legacy batch processing to a more interactive, declarative paradigm using tools like ROOT RDataFrame. These applications are executed on the aforementioned cloud-based infrastructure, with workflows distributed across multiple worker nodes and results consolidated into a unified interface. Additionally, the performance of this approach is evaluated through speed-up benchmarks and scalability tests using distributed resources. Such analyses could help identify potential bottlenecks or limitations of the high-throughput interactive model, providing insights that will guide its further development and implementation within the Italian National Center.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2025
			
	Titolo del volume
	
				Proceedings of Science (PoS)
			
	Pagina iniziale
	
				1
			
	Pagina finale
	
				8
			
	Rivista
	
				POS PROCEEDINGS OF SCIENCE
			
	Codice DOI
	
				https://dx.doi.org/10.22323/1.488.0027
			
	Citazione
	
				Diotalevi, T., Gravili, F. (2025). Quasi interactive analysis of High Energy Physics big data with high throughput [10.22323/1.488.0027].
			
	Tutti gli autori
	
						Diotalevi, Tommaso; Gravili, Francesco
					
	Appare nelle tipologie:
	
				4.01 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
ISGC2025_027.pdf accesso aperto Tipo: Versione (PDF) editoriale / Version Of Record Licenza: Licenza per Accesso Aperto. Creative Commons Attribuzione - Non commerciale - Non opere derivate (CCBYNCND) Dimensione 421.82 kB Formato Adobe PDF Visualizza/Apri	421.82 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/1050690

Citazioni

ND

0

ND

ND

social impact