Application of artificial intelligence in livestock genomics: combining random forest and Boruta algorithm to identify informative single nucleotide polymorphisms across pig breeds

G. Schiavo; S. Bovo; F. Bertolini; M. Bolner; A. Ribani; V. Taurisano; G. Galimberti; L. Fontanesi
2025

Abstract

Autochthonous pig breeds are a valuable source of genetic diversity, and conserving that diversity at the genomic level is crucial to management programs across Europe. High-density single nucleotide polymorphism (SNP) genotyping is a cost-effective tool for capturing genetic variation, but processing the resulting data can be computationally demanding. Feature selection methods must therefore balance computational feasibility against the retention of biologically relevant markers. This work applies artificial intelligence approaches that combine random forest with the Boruta algorithm, a wrapper around random forest, to identify informative SNPs using data from over 1,100 pigs representing 10 Italian pig breeds. These pigs were genotyped with the GGP 70k Porcine SNP array. The randomForest and Boruta R packages were used with up to 100,000 iterations to ensure robust feature selection. The Boruta algorithm identified approximately 2,000 stable SNPs, which were then split into several subsets and ranked by out-of-bag (OOB) error. Annotation of the selected SNPs using Ensembl BioMart revealed genomic regions associated with genes that could play significant roles in breed differentiation and adaptation. The smallest subset, which had an OOB error below 1%, consisted of fewer than 200 SNPs. This work demonstrates the advantages of combining high-density SNP data with machine learning techniques and highlights the potential of AI-driven approaches to identify key genetic markers that explain breed-specific genetic features.

Funded by the European Union – NextGenerationEU under the National Recovery and Resilience Plan (PNRR) – Mission 4 Education and research – Component 2 From research to business – Investment 1.1 Notice PRIN 2022 – DD N. 104 of 2/2/2022, proposal code 202238NP9N – CUP J53D23009570001.
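The Boruta step mentioned in the abstract rests on a simple decision rule, which can be sketched as follows. In each iteration a feature scores a "hit" when its importance exceeds that of the best shadow feature (a permuted copy of a real feature), and a binomial test on the hit counts then labels each feature as confirmed, rejected, or tentative. This is a minimal Python illustration, not the authors' R pipeline: the importance values are invented toy numbers that would normally come from a random forest, the SNP names and the `boruta_decisions` helper are hypothetical, and the sketch tests once at the end and omits the multiple-testing correction and the progressive removal of rejected features that the real algorithm performs.

```python
from math import comb


def binom_tail_ge(hits: int, n: int) -> float:
    """P(X >= hits) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(hits, n + 1)) / 2 ** n


def binom_tail_le(hits: int, n: int) -> float:
    """P(X <= hits) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(0, hits + 1)) / 2 ** n


def boruta_decisions(real_imp, shadow_imp, alpha=0.01):
    """Count hits against the best shadow feature and classify each feature.

    real_imp[i][j]  importance of real feature j in iteration i
    shadow_imp[i]   importances of the shadow features in iteration i
    """
    n_iter = len(real_imp)
    hits = [0] * len(real_imp[0])
    for real_row, shadow_row in zip(real_imp, shadow_imp):
        threshold = max(shadow_row)  # best importance reached by chance
        for j, value in enumerate(real_row):
            if value > threshold:
                hits[j] += 1

    decisions = []
    for h in hits:
        if binom_tail_ge(h, n_iter) < alpha:
            decisions.append("confirmed")   # beats the shadows too often for chance
        elif binom_tail_le(h, n_iter) < alpha:
            decisions.append("rejected")    # beats the shadows too rarely for chance
        else:
            decisions.append("tentative")   # not enough evidence either way
    return hits, decisions


# Toy input (hypothetical values): 20 iterations, 3 SNPs, 3 shadow features.
snp_names = ["snp_a", "snp_b", "snp_c"]
real_imp = [[5.0, 0.1, 1.5 if i % 2 == 0 else 0.5] for i in range(20)]
shadow_imp = [[0.8, 1.0, 0.6] for _ in range(20)]

hits, decisions = boruta_decisions(real_imp, shadow_imp)
for name, h, d in zip(snp_names, hits, decisions):
    print(f"{name}: {h}/20 hits -> {d}")
```

Comparing against the maximum shadow importance, rather than a fixed cutoff, lets the threshold adapt to how much importance pure noise can reach in a given data set.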
Book of Abstracts of the 1st EAAP Conference on Artificial Intelligence 4 Animal Science, 2025, p. 30
Schiavo, G., Bovo, S., Bertolini, F., Bolner, M., Ribani, A., Taurisano, V., et al. (2025). Application of artificial intelligence in livestock genomics: combining random forest and Boruta algorithm to identify informative single nucleotide polymorphisms across pig breeds.
Schiavo, G.; Bovo, S.; Bertolini, F.; Bolner, M.; Ribani, A.; Taurisano, V.; Galimberti, G.; Gallo, M.; Fontanesi, L.
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1017559