MultiSemCor: an English/Italian aligned corpus word-annotated with a shared inventory of senses
Bentivogli, Luisa; Ranieri, Marcello
2005
Abstract
MultiSemCor is an English/Italian parallel corpus, aligned at the word level and annotated with PoS, lemma and word sense. The parallel corpus has been created by exploiting the SemCor corpus, in which content words are lemmatized and sense-tagged with reference to the WordNet lexical database. The main hypothesis underlying this methodology is that, given a text and its translation into another language, the semantic information is mostly preserved during the translation process. Therefore, if the texts in one language have been semantically annotated and their translations have not, annotations can be transferred from the source language to the target using word alignment as a bridge.
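The transfer step described above can be illustrated with a minimal sketch. It assumes, as the title suggests, that both languages share one sense inventory, so an English sense label can be copied unchanged to its aligned Italian word. The `Token` class, the `transfer_senses` function, the `(source_index, target_index)` alignment format, and the sense labels are all hypothetical and for illustration only. To stay conservative, the sketch transfers only across one-to-one links. It is not the procedure actually used to build MultiSemCor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    form: str
    sense: Optional[str] = None  # hypothetical sense label; None if untagged


def transfer_senses(source: list[Token], target: list[Token],
                    alignment: list[tuple[int, int]]) -> list[Token]:
    """Copy sense labels from source tokens to the target tokens aligned to them.

    `alignment` holds (source_index, target_index) pairs (an assumed format).
    Only one-to-one links are used: a token linked more than once on either
    side is skipped, because its correct sense is ambiguous.
    """
    # Count how many links each source and target position takes part in.
    src_links: dict[int, int] = {}
    tgt_links: dict[int, int] = {}
    for s, t in alignment:
        src_links[s] = src_links.get(s, 0) + 1
        tgt_links[t] = tgt_links.get(t, 0) + 1

    # Work on copies so the caller's target tokens are not mutated.
    projected = [Token(tok.form, tok.sense) for tok in target]
    for s, t in alignment:
        if src_links[s] == 1 and tgt_links[t] == 1 and source[s].sense is not None:
            projected[t].sense = source[s].sense
    return projected


if __name__ == "__main__":
    en = [Token("The"), Token("cat", "cat.n.01"), Token("sleeps", "sleep.v.01")]
    it = [Token("Il"), Token("gatto"), Token("dorme")]
    result = transfer_senses(en, it, [(0, 0), (1, 1), (2, 2)])
    print([(t.form, t.sense) for t in result])
    # → [('Il', None), ('gatto', 'cat.n.01'), ('dorme', 'sleep.v.01')]
```

Function words such as "The" carry no sense label, so their aligned Italian counterparts stay untagged. Restricting transfer to one-to-one links trades coverage for precision; a fuller approach would need a policy for multiword expressions and many-to-one alignments.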