English-Spanish large statistical dictionary of inflectional forms

Barron-Cedeno A.
2010

Abstract

The paper presents an approach to constructing a weighted bilingual dictionary of inflectional forms that takes a traditional bilingual dictionary, rather than parallel corpora, as input. We develop an algorithm that generates all possible morphological (inflectional) forms and weights them using the distribution of the corresponding grammar sets (grammatical information) in large corpora of each language. The algorithm also takes into account the compatibility of grammar sets within a language pair. For example, a past-tense verb in language L is normally expected to be translated by a past-tense verb in language L'. We believe the method is universal, i.e., it can be applied to any pair of languages. The resulting dictionary is freely available and can be used in several NLP tasks, for example, statistical machine translation.
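The abstract does not state the weighting formula, so the following is only a minimal sketch under explicit assumptions. It expands one lemma-level dictionary entry (walk -> caminar) into all form-level pairs, keeps only pairs whose grammar sets are compatible, and scores each pair by the product of the two grammar-set frequencies, normalized per entry. The inflection tables, the frequency values (invented toy numbers, not corpus data), the tag format, the `compatible` rule (part of speech and tense must agree) and the helper `weighted_pairs` are all hypothetical and are not the authors' implementation.

```python
from itertools import product

# Hypothetical toy data for a single dictionary entry (walk -> caminar).
# Grammar sets are written as "POS.TENSE[.PERSON]"; real tag sets are richer.
EN_FORMS = {"walk": [("walk", "V.PRES.NON3SG"), ("walks", "V.PRES.3SG"),
                     ("walked", "V.PAST")]}
ES_FORMS = {"caminar": [("camino", "V.PRES.1SG"), ("camina", "V.PRES.3SG"),
                        ("caminó", "V.PAST.3SG")]}

# Made-up relative corpus frequencies of each grammar set (NOT real data).
EN_TAG_FREQ = {"V.PRES.NON3SG": 0.5, "V.PRES.3SG": 0.2, "V.PAST": 0.3}
ES_TAG_FREQ = {"V.PRES.1SG": 0.1, "V.PRES.3SG": 0.6, "V.PAST.3SG": 0.3}


def compatible(tag_src: str, tag_tgt: str) -> bool:
    """Assumed rule: two grammar sets are compatible if POS and tense agree."""
    return tag_src.split(".")[:2] == tag_tgt.split(".")[:2]


def weighted_pairs(src_lemma: str, tgt_lemma: str) -> dict[tuple[str, str], float]:
    """Expand one lemma-level entry into weighted form-level translation pairs."""
    scores = {}
    for (f_src, g_src), (f_tgt, g_tgt) in product(EN_FORMS[src_lemma],
                                                  ES_FORMS[tgt_lemma]):
        if compatible(g_src, g_tgt):
            # Assumed score: product of the two grammar-set frequencies.
            scores[(f_src, f_tgt)] = EN_TAG_FREQ[g_src] * ES_TAG_FREQ[g_tgt]
    total = sum(scores.values())
    # Normalize so that the weights of one dictionary entry sum to 1.
    return {pair: s / total for pair, s in scores.items()}


if __name__ == "__main__":
    # Print the form-level pairs from highest to lowest weight.
    for (en, es), w in sorted(weighted_pairs("walk", "caminar").items(),
                              key=lambda kv: -kv[1]):
        print(f"{en:7s} {es:7s} {w:.3f}")
```

With these toy values, incompatible pairs such as walk/caminó are never generated, and the frequent Spanish third-person present form dominates the weights. The product of frequencies is just one plausible way to combine the two distributions; a real system would plug in a morphological generator and tag counts from large tagged corpora.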
Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010 (2010), pp. 277-281.
English-Spanish large statistical dictionary of inflectional forms / Sidorov G.; Barron-Cedeno A.; Rosso P. - Electronic. - (2010), pp. 277-281. (Paper presented at the 7th International Conference on Language Resources and Evaluation, LREC 2010, held at the Mediterranean Conference Centre, Malta, in 2010.)
Sidorov G.; Barron-Cedeno A.; Rosso P.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/709308
Citations
  • PubMed Central: not available
  • Scopus: 3
  • Web of Science: 0