The LHCb Farm Monitoring and Control System
GALLI, DOMENICO; GREGORI, DANIELE; CARBONE, ANGELO; PECO, GIANLUCA; VAGNONI, VINCENZO MARIA
2007
Abstract
The LHCb experiment at CERN will have an online trigger farm composed of up to 2000 PCs. To monitor and control each PC and to supervise the overall status of the farm, a Farm Monitoring and Control (FMC) application was developed. The FMC is based on DIM (Distributed Information Management) and is accessible through both a command-line interface and a PVSS graphical interface. The FMC consists of a Logger, which collects application messages and can work in either no-drop or congestion-proof mode, with filtering and duplicate suppression; an IPMI Power Manager, which switches the farm nodes on and off and monitors their physical parameters; a Task Manager, which starts and stops processes and can manage real-time schedulers, notify process termination in real time, and redirect application stdout/stderr to the FMC Logger; a Process Controller, which manages automatic process respawn; and a detailed but lightweight Monitoring system. The FMC is an integral part of LHCb's Experiment Control System, which is in charge of monitoring and controlling all online components: it uses the same tools (DIM, PVSS, FSM, etc.) to guarantee complete integration and a coherent look and feel throughout the control system.