BootCaT: Bootstrapping corpora and terms from the web

Baroni, Marco; Bernardini, Silvia

2004

Abstract

This paper introduces the BootCaT toolkit, a suite of Perl programs implementing an iterative procedure to bootstrap specialized corpora and terms from the web. The procedure requires only a small set of seed terms as input. The seeds are used to build a corpus via automated Google queries, and more terms are extracted from this corpus. In turn, these new terms are used as seeds to build a larger corpus via automated queries, and so forth. The corpus and the unigram terms are then used to extract multi-word terms. An experimental evaluation of the tools, applied to the construction of English and Italian corpora and term lists from the domain of psychiatry, illustrates their potential usefulness.
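
The iterative loop described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the toolkit's own Perl code. Everything named here is an assumption made for the example: the function names, the pairing of seeds into query tuples, the frequency-ratio ranking against a reference word list, and the `toy_search` function, which stands in for the automated web queries by searching a small in-memory list of pages. The multi-word term step is not covered.

```python
import itertools
import random
from collections import Counter


def build_queries(seeds, tuple_size=2, n_queries=5, rng=None):
    """Combine seeds into random tuples; each tuple is one query (assumed strategy)."""
    rng = rng or random.Random(0)
    combos = list(itertools.combinations(sorted(seeds), tuple_size))
    rng.shuffle(combos)
    return combos[:n_queries]


def extract_terms(corpus, reference_counts, top_n=5):
    """Rank unigrams by relative frequency in the corpus versus a reference
    word list (an assumed measure, chosen only for illustration)."""
    counts = Counter(w for doc in corpus for w in doc.lower().split())
    total = sum(counts.values())
    ref_total = sum(reference_counts.values())

    def score(word):
        p_corpus = counts[word] / total
        # Add-one smoothing so words missing from the reference do not divide by zero.
        p_ref = (reference_counts.get(word, 0) + 1) / (ref_total + 1)
        return p_corpus / p_ref

    ranked = sorted(counts, key=lambda w: (-score(w), w))
    return ranked[:top_n]


def bootstrap(seeds, search, reference_counts, iterations=2):
    """Alternate between corpus building and term extraction."""
    corpus = []
    terms = set(seeds)
    for _ in range(iterations):
        seen = set(corpus)
        for query in build_queries(terms):
            for doc in search(query):
                if doc not in seen:  # skip pages already collected
                    seen.add(doc)
                    corpus.append(doc)
        # The extracted terms become the seeds of the next round.
        terms = set(extract_terms(corpus, reference_counts))
    return corpus, terms


# Hypothetical stand-in for a search engine: a tiny in-memory "web".
TOY_WEB = [
    "schizophrenia is a psychiatric disorder treated with antipsychotic medication",
    "depression and anxiety are common psychiatric conditions",
    "the football match ended with a late goal",
]


def toy_search(query):
    """Return the pages that contain every term of the query."""
    return [page for page in TOY_WEB if all(t in page.split() for t in query)]


reference = Counter({"the": 10, "is": 10, "a": 10, "with": 10, "and": 10, "are": 10})
corpus, terms = bootstrap({"psychiatric", "disorder"}, toy_search, reference)
print(len(corpus), sorted(terms))
```

In a real setting `search` would wrap whatever query interface is available, and the ranking step would use a proper reference corpus. The point of the sketch is only the alternation between querying and term extraction.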

Year: 2004
Volume title: Proceedings of Fourth International Conference on Language Resources and Evaluation, LREC 2004
First page: 1313
Last page: 1316
Citation: Baroni M., Bernardini S. (2004). BootCaT: Bootstrapping corpora and terms from the web. Lisbon: ELDA.
All authors: Baroni M.; Bernardini S.
Publication type: 4.01 Contribution in conference proceedings

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/4929
