The C-value/NC-value algorithm, a hybrid approach to automatic term recognition, has been originally developed to extract multiword term candidates from specialised documents written in English. Here, we present three main modifications to this algorithm that affect how the obtained output is refined. The first modification aims to maximise the number of real terms in the list of candidates with a new approach for the stop-list application process. The second modification adapts the C-value calculation formula in order to consider single word terms. The third modification changes how the term candidates are grouped, exploiting a lemmatised version of the input corpus. Additionally, size of candidate's context window is variable. We also show the necessary linguistic modifications to apply this algorithm to the recognition of term candidates in Spanish. © Springer-Verlag Berlin Heidelberg 2009.

An improved automatic term recognition method for spanish

Barron-Cedeno A.;Drouin P.;
2009

Abstract

The C-value/NC-value algorithm, a hybrid approach to automatic term recognition, has been originally developed to extract multiword term candidates from specialised documents written in English. Here, we present three main modifications to this algorithm that affect how the obtained output is refined. The first modification aims to maximise the number of real terms in the list of candidates with a new approach for the stop-list application process. The second modification adapts the C-value calculation formula in order to consider single word terms. The third modification changes how the term candidates are grouped, exploiting a lemmatised version of the input corpus. Additionally, size of candidate's context window is variable. We also show the necessary linguistic modifications to apply this algorithm to the recognition of term candidates in Spanish. © Springer-Verlag Berlin Heidelberg 2009.
2009
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
125
136
Barron-Cedeno A.; Sierra G.; Drouin P.; Ananiadou S.
File in questo prodotto:
File Dimensione Formato  
Barrón-Cedeño2009_Chapter_AnImprovedAutomaticTermRecogni.pdf

accesso riservato

Tipo: Versione (PDF) editoriale
Licenza: Licenza per accesso riservato
Dimensione 194.88 kB
Formato Adobe PDF
194.88 kB Adobe PDF   Visualizza/Apri   Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/709285
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 39
  • ???jsp.display-item.citation.isi??? 28
social impact