The analysis of event sequence data that contain system failures is becoming increasingly important in the design of service and maintenance policies. This paper presents a systematic methodology for constructing a statistical prediction model for failure events based on event sequence data. First, frequent failure signatures, defined as groups of events/errors that repeatedly occur together, are identified automatically from the event sequence using an efficient algorithm. Then, the Cox proportional hazard model, which is extensively used in biomedical survival analysis, is used to provide a statistically rigorous prediction of system failures based on the time-to-failure data extracted from the event sequences. The identified failure signatures are used to select significant covariates for the Cox model; that is, only the events and/or event combinations in the signatures are treated as explanatory variables in the Cox model fitting. By combining the failure signature and Cox model approaches, the proposed method can effectively handle long event sequences with a large number of event types. Its effectiveness is illustrated by a numerical study and an analysis of real-world data. The proposed method can help diagnose machine faults proactively, with sufficient lead time before actual system failures for preventive maintenance to be scheduled, thereby reducing downtime costs.
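The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the signature step below is plain co-occurrence counting over a fixed look-back window, standing in for the efficient algorithm the abstract mentions, and the Cox step only evaluates the standard partial log-likelihood for a single binary covariate and maximizes it by a crude grid search. All names (`failure_windows`, `frequent_signatures`, `cox_partial_loglik`), the `"FAIL"` marker, the window length, the support threshold, and the toy data are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def failure_windows(events, failure="FAIL", window=10.0):
    """Collect the set of event types seen in the `window` time units before each failure."""
    windows = []
    for t_fail, kind in events:
        if kind != failure:
            continue
        seen = {k for t, k in events
                if t_fail - window <= t < t_fail and k != failure}
        windows.append(frozenset(seen))
    return windows


def frequent_signatures(windows, min_support=0.6, max_size=2):
    """Return event sets (up to max_size) found in at least min_support of the windows."""
    counts = Counter()
    for w in windows:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(w), size):
                counts[combo] += 1
    n = len(windows)
    return {sig: c / n for sig, c in counts.items() if c / n >= min_support}


def cox_partial_loglik(beta, durations, observed, x):
    """Cox partial log-likelihood for one covariate (no correction for tied times)."""
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(durations, observed)):
        if not d_i:
            continue  # censored intervals enter only through the risk sets
        risk = sum(math.exp(beta * x[j])
                   for j, t_j in enumerate(durations) if t_j >= t_i)
        ll += beta * x[i] - math.log(risk)
    return ll


# Toy event sequence of (time, event type); "FAIL" marks a system failure.
events = [(1, "A"), (2, "B"), (5, "FAIL"),
          (20, "C"), (21, "A"), (23, "B"), (25, "FAIL"),
          (40, "A"), (42, "C"), (45, "FAIL")]
signatures = frequent_signatures(failure_windows(events))

# Toy time-to-failure data: x[i] = 1 if a chosen signature was seen in interval i.
durations = [5, 8, 12, 3, 9]
observed = [1, 1, 0, 1, 1]   # 0 marks a censored interval (no failure observed)
x = [1, 1, 0, 0, 1]

ll0 = cox_partial_loglik(0.0, durations, observed, x)
grid = [b / 100 for b in range(-300, 301)]
beta_hat = max(grid, key=lambda b: cox_partial_loglik(b, durations, observed, x))
```

In this sketch the signatures act as a filter: only event sets that pass the support threshold would be turned into covariates such as `x`, which keeps the Cox model small even when the sequence contains many event types. A positive `beta_hat` indicates that intervals containing the signature tend to end in failure sooner.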
Fronza I, Sillitti A, Succi G, Vlasenko J (2011). Failure Prediction based on Log Files Using the Cox Proportional Hazard Model.