Objectives: To assess the performance of machine learning (ML) algorithms in predicting the presence of germline BRCA1/2 pathogenic variants in patients with ovarian cancer (OC) based on clinical-pathological features. Methods: Clinical-pathological features of 648 patients with OC tested for BRCA1/2 were analysed using three supervised ML algorithms: random forest, boosting and support vector machine. Results: In the 'test' sample, boosting proved to be the most effective algorithm (accuracy: 84.5%; precision: 80.0%; recall: 3.1%; area under the curve (AUC): 78.8%), followed by support vector machine (accuracy: 81.4%; precision: 72.7%; recall: 27.6%; AUC: 62.3%) and random forest (accuracy: 74.4%; precision: 55.6%; recall: 14.7%; AUC: 71.3%). In the 'validation' sample, accuracy was 79.8% for boosting, 81.7% for support vector machine and 80.8% for random forest. In the most effective algorithm (boosting), family history of OC showed the highest relative influence (52.9), followed by histotype (19.5), personal history of breast cancer (BC) (17.1), age at diagnosis (8.4) and family history of BC (2.2), while International Federation of Gynecology and Obstetrics (FIGO) stage had no influence. Discussion: We identified the predictive algorithm that best estimates the a priori likelihood of being a carrier of germline BRCA1/2 pathogenic variants in patients with OC. These findings support a role for ML approaches in predicting BRCA1/2 status in patients with OC, but accuracy and precision are still suboptimal for clinical use, suggesting the need for additional research. Conclusions: Results support the selection of relevant clinical features for predictive purposes, which could have significant implications for the clinical management of patients with OC.
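The abstract reports accuracy, precision and recall for each classifier. As a minimal sketch of how these three metrics follow from a binary confusion matrix, the Python snippet below uses a hypothetical helper, `classification_metrics`, and hypothetical counts that are not taken from the study. It only illustrates the standard definitions and why accuracy can stay high while recall is low when non-carriers are the majority class.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision and recall from confusion-matrix counts.

    tp: carriers predicted as carriers
    fp: non-carriers predicted as carriers
    fn: carriers predicted as non-carriers
    tn: non-carriers predicted as non-carriers
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts for illustration only (NOT data from the study):
# 100 patients, 20 of whom are carriers; the model flags only 5 as carriers.
metrics = classification_metrics(tp=4, fp=1, fn=16, tn=79)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these illustrative counts most non-carriers are classified correctly, so accuracy is dominated by the majority class, while recall reflects only the fraction of carriers that the model actually detects.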
Innella, G., Erini, G., De Leo, A., Godino, L., Caramanna, L., Ferrari, S., et al. (2025). Machine learning prediction of germline BRCA1/2 pathogenic variants in patients with ovarian cancer. BMJ HEALTH & CARE INFORMATICS, 32(1), 1-5 [10.1136/bmjhci-2025-101751].
Machine learning prediction of germline BRCA1/2 pathogenic variants in patients with ovarian cancer
Innella, Giovanni;Erini, Giulia;Godino, Lea;Caramanna, Luca;Miccoli, Sara;Perrone, Anna Myriam;De Iaco, Pierandrea;Turchetti, Daniela;Rucci, Paola
2025


