Bridging biodiversity gaps: Assessing R tools for harmonising vascular plant records

Santovito, Diletta; Chiarucci, Alessandro; Rocchini, Duccio; Santi, Francesco; Cortès Lobos, Rocìo Beatriz; Testolin, Riccardo
2026

Abstract

Biodiversity databases provide unprecedented opportunities to use species occurrence data in large-scale biodiversity analyses. However, these records often contain taxonomic uncertainties that can affect the outcomes of downstream analyses. Although several tools have been developed to address these issues, there is limited guidance on how to use and integrate them efficiently. Here, we present a reproducible workflow for handling vascular plant occurrence data and provide the first comparative analysis of R packages for the taxonomic harmonisation of vascular plant names. Our goal is to assess differences in performance across the tested tools and to highlight best practices for leveraging large biodiversity databases. We first downloaded occurrence data for vascular plants in Italy from the Botanical Information and Ecology Network (BIEN) and the Global Biodiversity Information Facility (GBIF). We then compared seven R packages for taxonomic harmonisation, evaluating their ability to resolve names to accepted taxa and their overall performance. Our results highlight heterogeneity in the number of names resolved by the different tools: packages that rely on plant-specific databases and implement fuzzy matching outperformed those that rely on generalist databases and offer no fuzzy matching. These findings underscore that the choice of both packages and taxonomic authorities can strongly influence data-cleaning outcomes.
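To illustrate why fuzzy matching can change the number of names resolved, here is a minimal sketch in Python using only the standard library. It is not taken from the paper and does not reproduce any of the seven R packages. The reference list `ACCEPTED`, the sample `records`, the helper names and the `cutoff` value are all hypothetical and chosen for illustration.

```python
import difflib

# Hypothetical reference list of accepted names (illustration only,
# not one of the taxonomic authorities used in the paper).
ACCEPTED = ["Quercus ilex", "Fagus sylvatica", "Pinus pinea"]


def resolve_exact(name, accepted=ACCEPTED):
    """Return the name if it matches an accepted name exactly, else None."""
    return name if name in accepted else None


def resolve_fuzzy(name, accepted=ACCEPTED, cutoff=0.85):
    """Return the most similar accepted name at or above `cutoff`, else None.

    The cutoff of 0.85 is an assumed value, not a recommendation.
    """
    matches = difflib.get_close_matches(name, accepted, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical occurrence records: one correct name, two misspellings,
# and one name that is absent from the reference list.
records = ["Quercus ilex", "Fagus silvatica", "Pinus pinnea", "Abies alba"]

exact = [resolve_exact(r) for r in records]
fuzzy = [resolve_fuzzy(r) for r in records]

print("exact:", sum(e is not None for e in exact), "of", len(records))
print("fuzzy:", sum(f is not None for f in fuzzy), "of", len(records))
```

In this toy example, exact matching resolves only the correctly spelled record, while fuzzy matching also recovers the two misspelled names. The name missing from the reference list stays unresolved under both approaches, which shows how the choice of reference database also limits what any matching strategy can resolve.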
Santovito, D., Chiarucci, A., Rocchini, D., Santi, F., Cortès Lobos, R.B., Testolin, R. (2026). Bridging biodiversity gaps: Assessing R tools for harmonising vascular plant records. Ecological Informatics, 93(103543), 1-10 [10.1016/j.ecoinf.2025.103543].
Santovito, Diletta; Chiarucci, Alessandro; Rocchini, Duccio; Santi, Francesco; Cortès Lobos, Rocìo Beatriz; Testolin, Riccardo

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1033019