Cetatean, R., Ambretti, S. (2025). Using Machine Learning for predicting antibiotic resistance in bloodstream infections.
Using Machine Learning for predicting antibiotic resistance in bloodstream infections
Cetatean Raul; Ambretti Simone
2025
Abstract
Antibiotic resistance poses a significant challenge in modern medicine, as current susceptibility tests take 24-48 hours to return results. In critical cases this delay can be life-threatening, so a rapid and accurate method for predicting antibiotic resistance is essential. The purpose of this study is to develop and compare several machine learning and deep learning models that predict antibiotic resistance or susceptibility from epidemiological data. Because large datasets are available for them, most studies focus on urinary tract infections (UTIs) or on all infections combined; few specifically address bloodstream infections, despite their clinical importance. Models were trained on hospital data from the Bologna metropolitan area, covering patients with positive blood cultures between January 2024 and December 2024. The following pathogen-antibiotic combinations were considered: S. aureus-oxacillin, E. faecium-vancomycin, E. coli-cefotaxime, E. coli-ceftazidime, K. pneumoniae-cefotaxime, and K. pneumoniae-meropenem. Input features included patient demographics (age, sex), the date and time of day the blood culture was taken, the identified species, the antibiotic used, and antimicrobial resistance (AMR) rates at the different hospitals. Several data types that have been shown to be informative, such as prior antibiotic exposure and past medical history, were not available for this study. Categorical features were one-hot encoded. The prediction target was binary: susceptible or resistant. Models were built in Python using the scikit-learn (sklearn) and PyTorch libraries. The machine learning models were logistic regression (LR), random forest (RF), and XGBoost (XGB); the deep learning model was a multilayer perceptron (MLP). Logistic regression yielded area under the receiver operating characteristic curve (ROC-AUC) scores ranging from 0.60 to 0.73. The random forest classifier performed slightly better, with ROC-AUC scores between 0.62 and 0.82.
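As an illustration of the preprocessing and evaluation steps described above, the following minimal sketch one-hot encodes categorical features and compares logistic regression with a random forest by ROC-AUC using scikit-learn. It assumes hypothetical column names (`age`, `sex`, `hour`, `hospital`, `hospital_amr_rate`) and hypothetical helper functions, and it runs on randomly generated data, so it reproduces none of the study's results; the study's actual feature set, data split, and model settings are not described in this abstract.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature names; the study's actual column names are not given.
CATEGORICAL = ["sex", "hospital"]
NUMERIC = ["age", "hour", "hospital_amr_rate"]
AMR_RATE = {"H1": 0.10, "H2": 0.20, "H3": 0.35}  # made-up per-hospital AMR rates


def make_synthetic_data(n=600, seed=0):
    """Random stand-in for one pathogen-antibiotic dataset (NOT study data)."""
    rng = np.random.default_rng(seed)
    age = rng.integers(18, 95, size=n)
    hospital = rng.choice(["H1", "H2", "H3"], size=n)
    amr = np.array([AMR_RATE[h] for h in hospital])
    X = pd.DataFrame({
        "age": age,
        "sex": rng.choice(["F", "M"], size=n),
        "hour": rng.integers(0, 24, size=n),
        "hospital": hospital,
        "hospital_amr_rate": amr,
    })
    # Label 1 = resistant; made more likely for older patients and high-AMR hospitals.
    logit = -2.0 + 0.02 * (age - 60) + 5.0 * amr
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    return X, y


def build_pipeline(model):
    """One-hot encode categorical columns, standardize numeric ones, then fit `model`."""
    preprocess = ColumnTransformer([
        ("onehot", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("scale", StandardScaler(), NUMERIC),
    ])
    return Pipeline([("preprocess", preprocess), ("model", model)])


def compare_models(X, y, seed=0):
    """Return held-out ROC-AUC for each model on a stratified train/test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    models = {
        "LR": LogisticRegression(max_iter=1000),
        "RF": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        pipe = build_pipeline(model).fit(X_train, y_train)
        # ROC-AUC is computed from the predicted probability of class 1 (resistant).
        scores[name] = roc_auc_score(y_test, pipe.predict_proba(X_test)[:, 1])
    return scores


if __name__ == "__main__":
    X, y = make_synthetic_data()
    print(compare_models(X, y))
```

Wrapping the encoder and the model in a single `Pipeline` means the one-hot categories are learned from the training split only, and `handle_unknown="ignore"` encodes a category seen only at test time (for example, a hospital absent from the training data) as all zeros instead of raising an error.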
The XGBoost model fell between logistic regression and random forest, with ROC-AUC scores ranging from 0.65 to 0.78. Hyperparameter optimization and cross-validation were performed for all models. At the time of writing, results for the multilayer perceptron are not yet available, as it is still being trained and tested. Overall, the predictive value of the machine learning models is modest, with ROC-AUC values from 0.60 to 0.82 depending on the pathogen-antibiotic combination, which is comparable to results reported in the literature. Dimensionality reduction has not yet been applied to the data; applying it is planned as future work and could further improve model performance and interpretability.
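Cross-validated hyperparameter optimization, combined with the planned dimensionality reduction, could be sketched as follows. This is an illustration under stated assumptions rather than the study's procedure: it uses scikit-learn's `GridSearchCV` with stratified 5-fold cross-validation scored by ROC-AUC, a random forest as the tuned model, PCA as a stand-in dimensionality reduction technique (the abstract does not name one), and hypothetical columns on random data. An `XGBClassifier` from the `xgboost` package, which follows the scikit-learn estimator interface, could take the random forest's place; it is left out to keep the sketch free of extra dependencies.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature names (not taken from the study).
CATEGORICAL = ["sex", "hospital"]
NUMERIC = ["age", "hour"]


def synthetic_data(n=400, seed=1):
    """Random stand-in data for illustration only; NOT the study's dataset."""
    rng = np.random.default_rng(seed)
    age = rng.integers(18, 95, size=n)
    hospital = rng.choice(["H1", "H2", "H3"], size=n)
    X = pd.DataFrame({
        "age": age,
        "sex": rng.choice(["F", "M"], size=n),
        "hour": rng.integers(0, 24, size=n),
        "hospital": hospital,
    })
    # Label 1 = resistant; made more likely for older patients and hospital H3.
    logit = -1.5 + 0.03 * (age - 60) + 1.0 * (hospital == "H3")
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    return X, y


def tune(X, y, seed=1):
    """Grid-search model hyperparameters and an optional PCA step with stratified CV."""
    preprocess = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
         ("scale", StandardScaler(), NUMERIC)],
        sparse_threshold=0.0,  # force dense output so that PCA can follow
    )
    pipe = Pipeline([
        ("preprocess", preprocess),
        ("reduce", "passthrough"),  # no reduction by default; the grid may swap in PCA
        ("model", RandomForestClassifier(random_state=seed)),
    ])
    grid = {
        "reduce": ["passthrough", PCA(n_components=3)],
        "model__n_estimators": [50, 150],
        "model__max_depth": [3, None],
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(pipe, grid, scoring="roc_auc", cv=cv)
    return search.fit(X, y)


if __name__ == "__main__":
    search = tune(*synthetic_data())
    print(search.best_params_, round(search.best_score_, 3))
```

Treating the `reduce` step as a searchable option lets cross-validation decide whether PCA helps instead of fixing that choice in advance, and because PCA sits inside the pipeline it is fit on the training folds only.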


