Wikipedia Citations in Wikidata

Silvio Peroni
In press

Abstract

We propose to develop a codebase to enrich Wikidata with citations to scholarly publications (journal articles and books) that are currently referenced in the English Wikipedia. This codebase will build on previous work, such as wikiciteparser, and integrate new components, notably: i) a classifier to distinguish citations by the type of cited source (books, journal articles and other online content); ii) a look-up module to equip citations with identifiers retrieved from Crossref or other APIs. In so doing, Wikipedia Citations extends prior work, such as mwcites, which focused only on citations already equipped with identifiers. Our goal is to develop four software modules in Python (hereafter, the codebase) that can be easily reused by developers in the Wikidata community:
[extractor] a module to extract citation and bibliographic information from articles in the English Wikipedia;
[converter] a module to convert the extracted information into a CSV-based format compliant with a shareable bibliographic data model, e.g., the OpenCitations Data Model;
[enricher] a module to reconcile the bibliographic resources and people obtained in step 2 (by the converter) with entities available in Wikidata via their persistent identifiers (primarily DOIs, QIDs, ORCIDs and VIAF IDs), then also persons, places and organisations if time allows;
[pusher] a module to disambiguate, deduplicate and load citation and bibliographic data into Wikidata, reusing code already developed by the Wikidata community as much as possible.
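The four-stage design can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration and is not the project's code. The function names (`extract`, `convert`, `enrich`, `plan_push`) are hypothetical. The regular-expression template parser is a toy, whereas the project itself builds on wikiciteparser. The classifier simply keys off the citation template name. The three-column CSV is a simplified stand-in, not the actual OpenCitations Data Model serialisation. A plain dictionary replaces a real DOI-to-QID lookup against Wikidata. The Crossref look-up for citations lacking identifiers is omitted, and the pusher stage only returns a dry-run plan rather than writing to Wikidata.

```python
import csv
import io
import re

# Hypothetical sketch of the extractor -> converter -> enricher -> pusher flow.
# Matches simple, non-nested {{cite <kind> | ... }} templates only (assumption).
CITE_RE = re.compile(r"\{\{\s*cite\s+(\w+)\s*\|(.*?)\}\}", re.IGNORECASE | re.DOTALL)
CSV_COLUMNS = ["id", "title", "type"]  # simplified, assumed column set


def extract(wikitext):
    """[extractor] Parse citation templates out of raw wikitext into dicts."""
    citations = []
    for kind, body in CITE_RE.findall(wikitext):
        fields = {}
        for part in body.split("|"):
            if "=" in part:
                key, value = part.split("=", 1)
                fields[key.strip().lower()] = value.strip()
        # Toy classifier: the template name decides the cited source type.
        fields["type"] = {"journal": "journal article", "book": "book"}.get(
            kind.lower(), "other")
        citations.append(fields)
    return citations


def convert(citations):
    """[converter] Serialise citations to CSV text, normalising DOIs to lowercase."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for c in citations:
        ident = f"doi:{c['doi'].lower()}" if c.get("doi") else ""
        writer.writerow({"id": ident, "title": c.get("title", ""), "type": c["type"]})
    return buffer.getvalue()


def enrich(csv_text, doi_to_qid):
    """[enricher] Attach a Wikidata QID to each row whose DOI is known.

    `doi_to_qid` is a caller-supplied dict standing in for a real Wikidata query.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        doi = row["id"][len("doi:"):] if row["id"].startswith("doi:") else None
        row["qid"] = doi_to_qid.get(doi, "") if doi else ""
    return rows


def plan_push(rows):
    """[pusher] Deduplicate rows and return a dry-run plan; nothing is written."""
    seen = set()
    plan = []
    for row in rows:
        key = row["id"] or row["title"].lower()  # DOI if present, else title
        if key in seen:
            continue  # the same work cited more than once
        seen.add(key)
        if row["qid"]:
            plan.append(("reuse", row["qid"]))     # item already in Wikidata
        else:
            plan.append(("create", row["title"]))  # would need a new item
    return plan


if __name__ == "__main__":
    page = (
        "Text.<ref>{{cite journal |title=Example Paper |doi=10.1000/XYZ123}}</ref> "
        "More.<ref>{{cite book |title=Example Book}}</ref> "
        "Again.<ref>{{cite journal |title=Example Paper |doi=10.1000/xyz123}}</ref>"
    )
    rows = enrich(convert(extract(page)), {"10.1000/xyz123": "Q_PLACEHOLDER"})
    print(plan_push(rows))
    # prints [('reuse', 'Q_PLACEHOLDER'), ('create', 'Example Book')]
```

Keeping each stage a plain function with plain inputs and outputs is one way to honour the aim of reusable modules. Under this arrangement any stage could, in principle, be replaced by a real implementation (for instance, a Wikidata query in place of the dictionary, or a Crossref look-up before the enricher) without changing the others.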
2021
Use this identifier to cite or link to this document: http://hdl.handle.net/11585/810021