The LCG File Catalog is a key component of the LHC Computing Grid middleware [1], as it contains the mapping between Logical File Names and Physical File Names on the Grid. The Atlas computing model foresees multiple local LFC housed in each Tier-1 and Tier-0, containing all information about files stored in the regional cloud. As the local LFC contents are presently not replicated anywhere, this turns out in a dangerous single point of failure for all of the Atlas regional clouds. In order to solve this problem we propose a novel solution for high availability (HA) of Oracle based Grid services, obtained by composing an Oracle Data Guard deployment and a series of application level scripts. This approach has the advantage of being very easy to deploy and maintain, and represents a good candidate solution for all Tier-2s which are usually little centres with little manpower dedicated to service operations. We also present the results of a wide range of functionality and performance tests run on a test-bed having characteristics similar to the ones required for production. The test-bed consists of a failover deployment between the Italian LHC Tier-1 (INFN - CNAF) and an Atlas Tier-2 located at INFN - Roma1. Moreover, we explain how the proposed strategy can be deployed on the present Grid infrastructure, without requiring any change to the middleware and in a way that is totally transparent to end users and applications.

A lightweight high availability strategy for Atlas LCG File Catalogs

RINALDI, LORENZO;
2010

Abstract

The LCG File Catalog is a key component of the LHC Computing Grid middleware [1], as it contains the mapping between Logical File Names and Physical File Names on the Grid. The Atlas computing model foresees multiple local LFC housed in each Tier-1 and Tier-0, containing all information about files stored in the regional cloud. As the local LFC contents are presently not replicated anywhere, this turns out in a dangerous single point of failure for all of the Atlas regional clouds. In order to solve this problem we propose a novel solution for high availability (HA) of Oracle based Grid services, obtained by composing an Oracle Data Guard deployment and a series of application level scripts. This approach has the advantage of being very easy to deploy and maintain, and represents a good candidate solution for all Tier-2s which are usually little centres with little manpower dedicated to service operations. We also present the results of a wide range of functionality and performance tests run on a test-bed having characteristics similar to the ones required for production. The test-bed consists of a failover deployment between the Italian LHC Tier-1 (INFN - CNAF) and an Atlas Tier-2 located at INFN - Roma1. Moreover, we explain how the proposed strategy can be deployed on the present Grid infrastructure, without requiring any change to the middleware and in a way that is totally transparent to end users and applications.
Barbara Martelli;Alessandro de Salvo;Daniela Anzellotti;Lorenzo Rinaldi;Alessandro Cavalli;Stefano dal Pra;Luca dell'Agnello;Daniele Gregori;Andrea Prosperini;Pier Paolo Ricci;Vladimir Sapunenko
File in questo prodotto:
Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/397727
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact