Exploiting Big Data solutions for CMS computing operations analytics

Simone Gasperini; Simone Rossi Tisbeni; Daniele Bonacorsi; David Lange

2022

Abstract

Computing operations at the Large Hadron Collider (LHC) at CERN rely on the Worldwide LHC Computing Grid (WLCG) infrastructure, designed to enable efficient storage, access, and processing of data at the pre-exascale level. A close and detailed study of the computing systems used for the LHC physics mission is an increasingly crucial step in the roadmap of High Energy Physics (HEP) towards the exascale regime. In this context, the Compact Muon Solenoid (CMS) experiment has over the last few years been collecting and storing a large set of heterogeneous non-collision data (e.g. metadata about replica placement, transfer operations, and actual user access to physics datasets). This wealth of data currently resides on a distributed Hadoop cluster and is organized so that running fast, arbitrary queries with the Spark analytics framework is a viable approach to Big Data mining. Using a data-driven approach to the analysis of this metadata, which derives from several CMS computing services such as DBS (Data Bookkeeping Service) and MCM (Monte Carlo Management system), we started to focus on data storage and data access over the WLCG infrastructure, and we drafted an embryonic software toolkit to investigate recurrent patterns and provide indicators of physics dataset popularity. As a long-term goal, this work aims to contribute to the overall design of a predictive/adaptive system that would eventually reduce the cost and complexity of CMS computing operations while taking into account the stringent requirements of the physics analyst community.

Year	2022
Volume title	International Symposium on Grids & Clouds 2022 (ISGC2022)
First page	1
Last page	11
DOI	https://dx.doi.org/10.22323/1.415.0006
Citation	Simone Gasperini, Simone Rossi Tisbeni, Daniele Bonacorsi, David Lange (2022). Exploiting Big Data solutions for CMS computing operations analytics [10.22323/1.415.0006].
All authors	Simone Gasperini; Simone Rossi Tisbeni; Daniele Bonacorsi; David Lange
Appears in categories	4.01 Conference proceedings contribution

Files in this item:

File	Size	Format
ISGC2022_006.pdf (open access; type: publisher's version (PDF); license: Open Access License, Creative Commons Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND))	932.08 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/916944
