Abstract: In this paper we introduce ukWaC, a large corpus of English constructed by crawling the .uk Internet domain. The corpus contains more than 2 billion tokens and is one of the largest freely available linguistic resources for English. The paper describes the tools and methodology used in the construction of the corpus and provides a qualitative evaluation of its contents, carried out through a vocabulary-based comparison with the BNC. We conclude by giving practical information about the availability and format of the corpus.

A. Ferraresi, E. Zanchetta, M. Baroni, S. Bernardini (2008). Introducing and evaluating ukWaC, a very large Web-derived corpus of English. Marrakech: s.n.

Introducing and evaluating ukWaC, a very large Web-derived corpus of English

Ferraresi, Adriano; Zanchetta, Eros; Baroni, Marco; Bernardini, Silvia
2008

Proceedings of the 4th Web as Corpus (WAC-4) "Can we beat Google?", pp. 47-54

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/64955