Giuseppina Schiavo, S.B. (2022). Boruta algorithm implementation for the classification and allocation of European pig breeds based on high density single nucleotide polymorphisms.

Boruta algorithm implementation for the classification and allocation of European pig breeds based on high density single nucleotide polymorphisms.

Giuseppina Schiavo;Samuele Bovo;Anisa Ribani;Valeria Taurisano;Luca Fontanesi
2022

Abstract

Autochthonous pig breeds are important reservoirs of genetic variability that can be exploited to identify alleles that may confer adaptation to specific production environments and climatic conditions. The genetic integrity of these breeds is a concern for many conservation programs in European countries. High-throughput genotyping of single nucleotide polymorphisms (SNPs) is an efficient approach to capture differences among breeds and, in turn, to evaluate optimal management strategies for conserving this diversity. Several classification methods based on feature selection are available for SNP data, but their application is often limited by computational resources. To overcome this bottleneck, selecting a small number of genetic features has been proposed, even though this approach may discard some biologically relevant markers. The aim of this work was to explore different marker panels to find the right balance between these two aspects (i.e. computational burden and selection of informative SNPs) and to identify suitable classification methods to differentiate several European autochthonous pig breeds. Because Random Forests (RF) gave promising results in previous studies, the Boruta algorithm (an RF wrapper method) was tested to select the most stable features (i.e. SNPs) over a series of iterations. The Boruta algorithm was applied to a total of 1154 pigs from 23 European breeds (48 per breed), genotyped with the GGP 70k Porcine array. Quality control and filtering were performed with PLINK 1.9, retaining 16107 SNPs for feature selection. Analyses were carried out with the randomForest and Boruta R packages. Different numbers of iterations, from 1000 to 100000, were tested, and the markers labelled as "Confirmed" by the Boruta analyses were retained. Several SNP subsets were then obtained by recursively splitting the panel of Confirmed SNPs, and their predictive performance was evaluated with the out-of-bag (OOB) error.
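The Boruta selection step can be illustrated with a minimal Python sketch of its decision logic. Python stands in here for the R packages the authors actually used. As we understand the algorithm from the Boruta R package, it works as follows:

- Every SNP gets a shuffled "shadow" copy.
- In each run, a Random Forest is fitted on the real and shadow features together.
- A SNP scores a hit when its importance beats the best shadow of that run.
- After all runs, a Bonferroni-corrected binomial test with p = 0.5 labels the SNP Confirmed, Rejected or Tentative.

Several parts of this sketch are assumptions:

- The forest fit and importance scores are inputs, not computed here.
- The 0.01 threshold and the Bonferroni correction reflect what we believe are the package defaults.
- The study's 1000 to 100000 iterations are taken to correspond to the number of runs.
- The real package decides and drops features progressively during the runs, whereas this sketch tests once at the end.

```python
import math
import random


def make_shadows(columns, seed=None):
    """Shadow features: a shuffled copy of every SNP column. Shuffling keeps
    each SNP's genotype frequencies but breaks any link with breed."""
    rng = random.Random(seed)
    shadows = []
    for col in columns:
        copy = list(col)
        rng.shuffle(copy)
        shadows.append(copy)
    return shadows


def count_hits(real_imp, shadow_imp):
    """real_imp[r][j]: importance of real SNP j in run r (from an RF fit that
    is NOT part of this sketch); shadow_imp[r]: importances of that run's
    shadows. A SNP scores a hit when it beats the best shadow of the run."""
    hits = [0] * len(real_imp[0])
    for real, shadow in zip(real_imp, shadow_imp):
        best_shadow = max(shadow)
        for j, value in enumerate(real):
            if value > best_shadow:
                hits[j] += 1
    return hits


def binom_upper_tail(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5), summed in log space so that large
    run counts do not overflow."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_half_n = n * math.log(0.5)
    logs = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1) + log_half_n
        for i in range(k, n + 1)
    ]
    top = max(logs)
    return min(1.0, math.exp(top) * sum(math.exp(v - top) for v in logs))


def label_features(hits, runs, p_value=0.01):
    """Bonferroni-corrected binomial tests over all SNPs (assumed defaults):
    significantly many hits gives Confirmed, significantly few gives
    Rejected, anything else stays Tentative."""
    m = len(hits)
    labels = []
    for h in hits:
        p_accept = min(1.0, m * binom_upper_tail(h, runs))         # P(X >= h)
        p_reject = min(1.0, m * binom_upper_tail(runs - h, runs))  # P(X <= h), symmetric for p = 0.5
        if p_accept < p_value:
            labels.append("Confirmed")
        elif p_reject < p_value:
            labels.append("Rejected")
        else:
            labels.append("Tentative")
    return labels
```

For instance, `label_features([100, 0, 50], runs=100)` returns `['Confirmed', 'Rejected', 'Tentative']`. A SNP that always beats the best shadow is kept, one that never does is discarded, and one that wins half the time stays undecided.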
A total of 2471 features were initially labelled as stable. The highest number of Confirmed SNPs (193) was on porcine chromosome (SSC) 8 and the lowest (41) on SSC1. The smallest subset reaching an OOB error below 1% included 171 SNPs. Genomic regions encompassing the selected SNPs were annotated with Ensembl BioMart. Based on the functions of the genes in these regions, the selected markers may reflect differences among breeds arising from natural or artificial selection.
Acknowledgements: This work received funding from the University of Bologna RFO programs and from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 634476 (project acronym TREASURE).
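The panel-reduction step and its OOB evaluation can be sketched in the same hedged spirit. The abstract does not say how the Confirmed panel was split. This example therefore assumes the SNPs are ranked by importance and the top half is kept repeatedly.

Each nested panel is scored with a caller-supplied error function. That function is hypothetical and stands in for refitting a forest on the panel. The smallest panel with an error below 1% is returned.

The `oob_error` helper shows what that error measures: each animal is classified only by the trees whose bootstrap sample did not contain it. All function names are illustrative and are not part of the randomForest or Boruta packages.

```python
from collections import Counter
import math


def oob_error(labels, tree_predictions, in_bag):
    """Out-of-bag (OOB) error of a forest.

    labels[i]: true breed of animal i
    tree_predictions[t][i]: breed predicted by tree t for animal i
    in_bag[t]: set of animal indices in tree t's bootstrap sample
    Each animal gets a majority vote from the trees that did NOT see it
    during training; ties go to the first class counted (a simplification).
    """
    wrong = scored = 0
    for i, truth in enumerate(labels):
        votes = Counter(
            preds[i] for preds, bag in zip(tree_predictions, in_bag) if i not in bag
        )
        if not votes:  # animal drawn into every bootstrap sample: no OOB vote
            continue
        scored += 1
        if votes.most_common(1)[0][0] != truth:
            wrong += 1
    return wrong / scored if scored else math.nan


def halving_panels(ranked_snps):
    """Nested panels from repeatedly keeping the top half of the ranking
    (an ASSUMED splitting rule): n, n//2, n//4, ..., 1 SNPs."""
    panels, panel = [], list(ranked_snps)
    while panel:
        panels.append(panel)
        panel = panel[: len(panel) // 2]
    return panels


def smallest_panel(ranked_snps, error_of, threshold=0.01):
    """Smallest nested panel whose error (e.g. the OOB error of a forest
    refitted on it, via the hypothetical error_of callback) is below the
    threshold; None if no panel qualifies."""
    passing = [p for p in halving_panels(ranked_snps) if error_of(p) < threshold]
    return min(passing, key=len) if passing else None
```

Suppose there are ten ranked SNPs and a toy error function that is zero only for panels of at least four SNPs. In that case `smallest_panel` returns the top five, because halving jumps from five to two. Halving 2471 SNPs never yields 171, so a different splitting rule would be needed to reach panel sizes like the one reported here.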
Book of Abstracts of the XI International Symposium on the Mediterranean Pig, 2022, pp. 27-28
Giuseppina Schiavo, Samuele Bovo, Anisa Ribani, Valeria Taurisano, Maria Muñoz, Cristina Óvilo, TREASURE Consortium, Maurizio Gallo, Luca Fontanesi...
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/996757