Building Distributed Language Resources by Grid Computing

TAMBURINI, FABIO
2004

Abstract

The increasing demand for linguistic resources consisting of substantial amounts of data, such as large corpora, presents the challenge of building computational infrastructures capable of handling unprecedented amounts of information. One possible solution is the sharing of high-level, linguistically motivated and carefully balanced corpora for building one large language resource accessible worldwide. The most feasible way of integrating such widely distributed resources seems to be the construction of an infrastructure to connect various sites by interfacing local presentation formats, access methods and policies in a global network to automatically manage access procedures to widely distributed and diversified materials. Grid computing systems are designed to meet these requirements. This paper presents work in progress on an experiment for building a distributed corpus structure prototype. A small web portal was designed to perform global queries in the distributed corpus and collect the results of the same query applied to each local corpus forming part of the grid. Moreover, other computational services such as an online POS tagger and a morphological analyser/generator were inserted into the grid to show the feasibility of such a scenario.
Proceedings of the 4th International Conference on Language Resources and Evaluation - LREC 2004, pp. 1217-1220
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/4919