Sala, C., Zaghi, A., Rocchi, E., Derus, N. R., De Cesare, A., Castellani, G. (2022). Evaluation of Machine Learning models for the detection of Antimicrobial Resistance based on Synthetic Data.
Evaluation of Machine Learning models for the detection of Antimicrobial Resistance based on Synthetic Data
Claudia Sala (first author); Adriano Zaghi; Ettore Rocchi; Nicolas R. Derus; Alessandra De Cesare; Gastone Castellani (last author)
2022
Abstract
Antimicrobial Resistance (AMR) is a global health problem estimated to cause ~10 million deaths every year by 2050. The ability to detect antimicrobial-resistant genes and bacteria in environmental and biological samples is crucial for detecting and monitoring AMR, as well as for identifying effective strategies to counter it. To this end, a promising approach combines high-throughput technologies (e.g. shotgun sequencing) with bioinformatics and Machine Learning. However, the high complexity of real metagenomic samples makes validating the results a challenging task. To evaluate the capability of Machine Learning models to predict the presence of AMR in shotgun sequencing samples, we used a modified version of the CAMISIM simulator to generate synthetic data with different resistance profiles, starting from annotated genomes retrieved from the PATRIC database. Our approach allowed us to compare the performance of different bioinformatic and Machine Learning pipelines.