Evaluating Multilingual Sentence Representation Models in a Real Case Scenario

Tripodi, Rocco; Rexhina, Blloshmi; Levis Sullam Simon,

In this paper, we present an evaluation of sentence representation models on the paraphrase detection task. The evaluation is designed to simulate a real-world problem of plagiarism and is based on one of the most important cases of forgery in modern history: the so-called {``}Protocols of the Elders of Zion{''}. The sentence pairs for the evaluation are taken from the infamous forged text {``}Protocols of the Elders of Zion{''} (Protocols) by unknown authors; and by {``}Dialogue in Hell between Machiavelli and Montesquieu{''} by Maurice Joly. Scholars have demonstrated that the first text plagiarizes from the second, indicating all the forged parts on qualitative grounds. Following this evidence, we organized the rephrased texts and asked native speakers to quantify the level of similarity between each pair. We used this material to evaluate sentence representation models in two languages: English and French, and on three tasks: similarity correlation, paraphrase identification, and paraphrase retrieval. Our evaluation aims at encouraging the development of benchmarks based on real-world problems, as a means to prevent problems connected to AI hypes, and to use NLP technologies for social good. Through our evaluation, we are able to confirm that the infamous Protocols are actually a plagiarized text but, as we will show, we encounter several problems connected with the convoluted nature of the task, that is very different from the one reported in standard benchmarks of paraphrase detection and sentence similarity. Code and data available at https://github.com/roccotrip/protocols.

Tripodi Rocco, B.R. (2022). Evaluating Multilingual Sentence Representation Models in a Real Case Scenario. European Language Resources Association.