Building Distributed Language Resources by Grid Computing

Tamburini, Fabio

The increasing demand for linguistic resources consisting of substantial amounts of data, such as large corpora, presents the challenge of building computational infrastructures capable of handling unprecedented amounts of information. One possible solution is to share high-level, linguistically motivated and carefully balanced corpora in order to build one large language resource accessible worldwide. The most feasible way of integrating such widely distributed resources seems to be the construction of an infrastructure that connects the various sites, interfacing their local presentation formats, access methods and policies in a global network that automatically manages access to widely distributed and diversified materials. Grid computing systems are designed to meet these requirements. This paper presents work in progress on an experiment in building a prototype of a distributed corpus structure. A small web portal was designed to perform global queries on the distributed corpus: the same query is applied to each local corpus forming part of the grid, and the results are collected. Moreover, other computational services, such as an online POS tagger and a morphological analyser/generator, were inserted into the grid to show the feasibility of such a scenario.
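The global query described above can be illustrated with a minimal sketch. It is not the prototype's implementation: the names `LocalCorpus`, `Hit` and `global_query` are hypothetical, each grid node is simulated by an in-memory token list rather than a remote service, and a thread pool stands in for the grid middleware that would dispatch the query to each site.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One match, tagged with the site that produced it (hypothetical type)."""
    site: str
    position: int
    context: str


class LocalCorpus:
    """Stand-in for one grid node; a real node would wrap its own
    format, access method and policy behind the same interface."""

    def __init__(self, site, text):
        self.site = site
        self.tokens = text.split()

    def search(self, word, window=2):
        """Return every occurrence of `word` with `window` tokens of context."""
        hits = []
        for i, tok in enumerate(self.tokens):
            if tok.lower() == word.lower():
                start = max(0, i - window)
                context = " ".join(self.tokens[start:i + window + 1])
                hits.append(Hit(self.site, i, context))
        return hits


def global_query(corpora, word):
    """Apply the same query to each local corpus and collect the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(corpora))) as pool:
        # pool.map keeps the results in the same order as `corpora`.
        per_site = list(pool.map(lambda corpus: corpus.search(word), corpora))
    return [hit for hits in per_site for hit in hits]
```

A portal built along these lines would call `global_query` once per user request and present the merged hits, leaving each site free to keep its own storage and access rules behind `search`.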