An appraisal of publication embedding techniques in the context of conventional bibliometric relatedness measures

Colavizza G.
2021

Abstract

Modern natural language processing has given rise to embedding techniques that represent documents based on their content or context, and several papers have operationalized these techniques to perform bibliometric tasks. The relationship between these embeddings and conventional citation-based or title-and-abstract-based mappings remains unclear. Unlike citation-based or term-based relatedness, embedding-based relatedness is not immediately interpretable. We consider four embedding-derived publication relatedness measures, based on (1) word2vec embeddings of citation labels, (2) sentence embeddings using BERT, (3) sentence embeddings using SciBERT, and (4) title and abstract embeddings using SPECTER, and compare them with conventional bibliometric publication relatedness measures derived from citation relations and from title and abstract noun phrases. We show that these embedding-derived relatedness measures overlap more strongly with citation-based relatedness than with title and abstract noun phrase-based relatedness, and that they outperform conventional techniques when used to cluster publications cited with the same citation intent.
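The abstract does not say how relatedness is computed from the embeddings or how overlap between two relatedness measures is quantified. As a rough illustration only, the sketch below assumes cosine similarity between publication vectors as the embedding-derived relatedness and the mean Jaccard overlap of top-k neighbour sets as the overlap measure. The function names, the toy vectors, and the citation scores are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(relatedness, pub, k):
    """Return the k publications most related to `pub` (ties broken by id)."""
    scores = relatedness[pub]
    ranked = sorted(scores, key=lambda other: (-scores[other], other))
    return set(ranked[:k])


def neighbour_overlap(rel_a, rel_b, k):
    """Mean Jaccard overlap of the top-k neighbour sets under two measures."""
    pubs = sorted(set(rel_a) & set(rel_b))
    total = 0.0
    for pub in pubs:
        a = top_k(rel_a, pub, k)
        b = top_k(rel_b, pub, k)
        total += len(a & b) / len(a | b)
    return total / len(pubs)


# Hypothetical 2-dimensional publication embeddings (real ones are much larger).
embeddings = {
    "p1": [1.0, 0.0],
    "p2": [0.9, 0.1],
    "p3": [0.0, 1.0],
    "p4": [0.1, 0.9],
}

# Embedding-derived relatedness: cosine similarity to every other publication.
embedding_rel = {
    p: {q: cosine(embeddings[p], embeddings[q]) for q in embeddings if q != p}
    for p in embeddings
}

# Hypothetical citation-based relatedness scores for the same publications.
citation_rel = {
    "p1": {"p2": 3.0, "p3": 0.0, "p4": 1.0},
    "p2": {"p1": 3.0, "p3": 1.0, "p4": 0.0},
    "p3": {"p1": 0.0, "p2": 1.0, "p4": 2.0},
    "p4": {"p1": 1.0, "p2": 0.0, "p3": 2.0},
}

for k in (1, 2):
    print(k, neighbour_overlap(embedding_rel, citation_rel, k))
```

A value near 1 means the two measures rank almost the same neighbours for each publication, and a value near 0 means they rank almost entirely different ones. The same function could compare the embedding-derived measure against a noun phrase-based measure, again as an assumption about how such a comparison might be set up.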
18th International Conference on Scientometrics and Informetrics, ISSI 2021, pp. 633-638
Lamers W.S., van Eck N.J., Colavizza G. (2021). An appraisal of publication embedding techniques in the context of conventional bibliometric relatedness measures. International Society for Scientometrics and Informetrics.
Lamers W.S.; van Eck N.J.; Colavizza G.
Files in this record:
File: Colavizza An appraisal of publication embedding techniques.pdf
Access: open access
Description: Contribution in conference proceedings
Type: Publisher's version (PDF)
License: Open access license. Creative Commons Attribution (CC BY)
Size: 5.49 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/948699