Temporal features of text have been shown to improve clustering and organization of documents, text classification, visualization, and ranking. Temporal ranking models consider the temporal expressions found in text (e.g., “in 2021” or “last year”) as time units, rather than as keywords, to define a temporal relevance and improve ranking. This paper introduces a new class of ranking models called Temporal Metric Space Models (TMSM), based on a new domain for representing temporal information found in documents and queries, where each temporal expression is represented as a time interval. Furthermore, we introduce a new frequency-based baseline called Temporal BM25 (TBM25). We evaluate the effectiveness of each proposed metric against a purely textual baseline, as well as several variations of the metrics themselves, where we change the aggregate function, the time granularity and the combination weight. Our extensive experiments on five test collections show statistically significant improvements of TMSM and TBM25 over state-of-the-art temporal ranking models. Combining the temporal similarity scores with the text similarity scores always improves the results, when the combination weight is between 2% and 6% for the temporal scores. This is true also for test collections where only 5% of queries contain explicit temporal expressions.

Ranking Models for the Temporal Dimension of Text / Rizzo, Stefano Giovanni; Brucato, Matteo; Montesi, Danilo. - In: ACM TRANSACTIONS ON INFORMATION SYSTEMS. - ISSN 1046-8188. - ELETTRONICO. - 41:2(2022), pp. 49.1-49.34. [10.1145/3565481]

Ranking Models for the Temporal Dimension of Text

Rizzo, Stefano Giovanni;Montesi, Danilo
2022

Abstract

Temporal features of text have been shown to improve clustering and organization of documents, text classification, visualization, and ranking. Temporal ranking models consider the temporal expressions found in text (e.g., “in 2021” or “last year”) as time units, rather than as keywords, to define a temporal relevance and improve ranking. This paper introduces a new class of ranking models called Temporal Metric Space Models (TMSM), based on a new domain for representing temporal information found in documents and queries, where each temporal expression is represented as a time interval. Furthermore, we introduce a new frequency-based baseline called Temporal BM25 (TBM25). We evaluate the effectiveness of each proposed metric against a purely textual baseline, as well as several variations of the metrics themselves, where we change the aggregate function, the time granularity and the combination weight. Our extensive experiments on five test collections show statistically significant improvements of TMSM and TBM25 over state-of-the-art temporal ranking models. Combining the temporal similarity scores with the text similarity scores always improves the results, when the combination weight is between 2% and 6% for the temporal scores. This is true also for test collections where only 5% of queries contain explicit temporal expressions.
2022
Ranking Models for the Temporal Dimension of Text / Rizzo, Stefano Giovanni; Brucato, Matteo; Montesi, Danilo. - In: ACM TRANSACTIONS ON INFORMATION SYSTEMS. - ISSN 1046-8188. - ELETTRONICO. - 41:2(2022), pp. 49.1-49.34. [10.1145/3565481]
Rizzo, Stefano Giovanni; Brucato, Matteo; Montesi, Danilo
File in questo prodotto:
File Dimensione Formato  
Ranking_Models_for_the_Temporal_Dimension_of_Text__Final_.pdf

accesso aperto

Tipo: Postprint
Licenza: Licenza per accesso libero gratuito
Dimensione 1.37 MB
Formato Adobe PDF
1.37 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/895602
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 1
social impact