The LCG File Catalog is a key component of the LHC Computing Grid middleware [1], as it contains the mapping between Logical File Names and Physical File Names on the Grid. The Atlas computing model foresees multiple local LFC housed in each Tier-1 and Tier-0, containing all information about files stored in the regional cloud. As the local LFC contents are presently not replicated anywhere, this turns out in a dangerous single point of failure for all of the Atlas regional clouds. In order to solve this problem we propose a novel solution for high availability (HA) of Oracle based Grid services, obtained by composing an Oracle Data Guard deployment and a series of application level scripts. This approach has the advantage of being very easy to deploy and maintain, and represents a good candidate solution for all Tier-2s which are usually little centres with little manpower dedicated to service operations. We also present the results of a wide range of functionality and performance tests run on a test-bed having characteristics similar to the ones required for production. The test-bed consists of a failover deployment between the Italian LHC Tier-1 (INFN - CNAF) and an Atlas Tier-2 located at INFN - Roma1. Moreover, we explain how the proposed strategy can be deployed on the present Grid infrastructure, without requiring any change to the middleware and in a way that is totally transparent to end users and applications.
Barbara Martelli, Alessandro de Salvo, Daniela Anzellotti, Lorenzo Rinaldi, Alessandro Cavalli, Stefano dal Pra, et al. (2010). A lightweight high availability strategy for Atlas LCG File Catalogs. JOURNAL OF PHYSICS. CONFERENCE SERIES, 219, 042014-042023 [10.1088/1742-6596/219/4/042014].
A lightweight high availability strategy for Atlas LCG File Catalogs
RINALDI, LORENZO;
2010
Abstract
The LCG File Catalog is a key component of the LHC Computing Grid middleware [1], as it contains the mapping between Logical File Names and Physical File Names on the Grid. The Atlas computing model foresees multiple local LFC housed in each Tier-1 and Tier-0, containing all information about files stored in the regional cloud. As the local LFC contents are presently not replicated anywhere, this turns out in a dangerous single point of failure for all of the Atlas regional clouds. In order to solve this problem we propose a novel solution for high availability (HA) of Oracle based Grid services, obtained by composing an Oracle Data Guard deployment and a series of application level scripts. This approach has the advantage of being very easy to deploy and maintain, and represents a good candidate solution for all Tier-2s which are usually little centres with little manpower dedicated to service operations. We also present the results of a wide range of functionality and performance tests run on a test-bed having characteristics similar to the ones required for production. The test-bed consists of a failover deployment between the Italian LHC Tier-1 (INFN - CNAF) and an Atlas Tier-2 located at INFN - Roma1. Moreover, we explain how the proposed strategy can be deployed on the present Grid infrastructure, without requiring any change to the middleware and in a way that is totally transparent to end users and applications.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.