CRIS Current Research Information System

The literature on coronaviruses counts more than 300,000 publications. Finding relevant papers concerning arbitrary queries is essential to discovery helpful knowledge. Current best information retrieval (IR) use deep learning approaches and need supervised train sets with labeled data, namely to know a priori the queries and their corresponding relevant papers. Creating such labeled datasets is time-expensive and requires prominent experts’ efforts, resources insufficiently available under a pandemic time pressure. We present a new self-supervised solution, called SUBLIMER, that does not require labels to learn to search on corpora of scientific papers for most relevant against arbitrary queries. SUBLIMER is a novel efficient IR engine trained on the unsupervised COVID-19 Open Research Dataset (CORD19), using deep metric learning. The core point of our self-supervised approach is that it uses no labels, but exploits the bibliography citations from papers to create a latent space where their spatial proximity is a metric of semantic similarity; for this reason, it can also be applied to other domains of papers corpora. SUBLIMER, despite is self-supervised, outperforms the Precision@5 (P@5) and Bpref of the state-of-the-art competitors on CORD19, which, differently from our approach, require both labeled datasets and a number of trainable parameters that is an order of magnitude higher than our.

Moro G., Valgimigli L. (2021). Efficient self-supervised metric information retrieval: A bibliography based method applied to covid literature. SENSORS, 21(19), 1-20 [10.3390/s21196430].

Efficient self-supervised metric information retrieval: A bibliography based method applied to covid literature

Moro G.^Primo;Valgimigli L.^Secondo

2021

Abstract

The literature on coronaviruses counts more than 300,000 publications. Finding relevant papers concerning arbitrary queries is essential to discovery helpful knowledge. Current best information retrieval (IR) use deep learning approaches and need supervised train sets with labeled data, namely to know a priori the queries and their corresponding relevant papers. Creating such labeled datasets is time-expensive and requires prominent experts’ efforts, resources insufficiently available under a pandemic time pressure. We present a new self-supervised solution, called SUBLIMER, that does not require labels to learn to search on corpora of scientific papers for most relevant against arbitrary queries. SUBLIMER is a novel efficient IR engine trained on the unsupervised COVID-19 Open Research Dataset (CORD19), using deep metric learning. The core point of our self-supervised approach is that it uses no labels, but exploits the bibliography citations from papers to create a latent space where their spatial proximity is a metric of semantic similarity; for this reason, it can also be applied to other domains of papers corpora. SUBLIMER, despite is self-supervised, outperforms the Precision@5 (P@5) and Bpref of the state-of-the-art competitors on CORD19, which, differently from our approach, require both labeled datasets and a number of trainable parameters that is an order of magnitude higher than our.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2021
			
	Rivista
	
				SENSORS
			
	Codice DOI
	
				https://dx.doi.org/10.3390/s21196430
			
	Citazione
	
				Moro G.,  Valgimigli L. (2021). Efficient self-supervised metric information retrieval: A bibliography based method applied to covid literature. SENSORS, 21(19), 1-20 [10.3390/s21196430].
			
	Tutti gli autori
	
						Moro G.; Valgimigli L.
					
	Appare nelle tipologie:
	
				1.01 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
sensors-21-06430-v2.pdf accesso aperto Tipo: Versione (PDF) editoriale / Version Of Record Licenza: Licenza per Accesso Aperto. Creative Commons Attribuzione (CCBY) Dimensione 474.42 kB Formato Adobe PDF Visualizza/Apri	474.42 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/850570

Citazioni

4

28

24

30

social impact