Tamburini Fabio (2017). Semgrex-Plus (v1.0).
Semgrex-Plus (v1.0)
Tamburini Fabio
2017
Abstract
Creating a treebank, that is, annotating each sentence with its syntactic structure, is a time-consuming and error-prone task. For these reasons, treebanks often require maintenance and revision, either to correct mistakes or to adapt them to different needs. In large projects such as Universal Dependencies (UD, http://universaldependencies.org/), guideline updates caused by the addition of new languages, changes in the theoretical treatment of a specific phenomenon, annotation mistakes, or other changes often call for dedicated tools that automate, as far as possible, the rewriting of treebank substructures. Moreover, treebanks developed for a specific language often need to be completely converted to adhere to other standards, for example to comply with the UD specifications and conventions. With the Semgrex-Plus tool, scholars can define appropriate sets of rules to convert dependency treebanks into different formats. The tool provides a way to define formal rules for rewriting dependencies and token tags, together with a treebank-rewriting algorithm that avoids rule interference during the conversion process. The tool is publicly available (https://github.com/ftamburin/Semgrex-Plus.git).
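To make the notion of rule interference concrete, here is a minimal Python sketch. It does not use Semgrex-Plus's rule syntax and is not claimed to be its algorithm: the Token and Rule classes, the two function names, and the toy labels are all hypothetical, chosen only for illustration. It contrasts naive sequential rule application, where the output of one rule can be picked up by the next, with one simple way to avoid that problem: matching every rule against the original tree first and applying the rewrites afterwards.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # word form
    tag: str      # part-of-speech tag
    head: int     # index of the governor, 0 for the root
    deprel: str   # dependency relation to the head


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape: match on a relation label, rewrite label and tag.
    match_deprel: str
    new_deprel: str
    new_tag: str


def apply_rule(tok, rule):
    """Return a rewritten copy of tok with the rule's new label and tag."""
    return replace(tok, deprel=rule.new_deprel, tag=rule.new_tag)


def rewrite_sequential(sentence, rules):
    """Naive strategy: each rule sees the output of the previous rules."""
    for rule in rules:
        sentence = [apply_rule(t, rule) if t.deprel == rule.match_deprel else t
                    for t in sentence]
    return sentence


def rewrite_two_phase(sentence, rules):
    """Match all rules on the ORIGINAL tokens, then apply the planned edits."""
    planned = {}
    for tok in sentence:
        for rule in rules:
            if tok.deprel == rule.match_deprel:
                planned.setdefault(tok.idx, rule)  # first matching rule wins
    return [apply_rule(t, planned[t.idx]) if t.idx in planned else t
            for t in sentence]


# Toy sentence and a pair of rules that swap two labels (illustrative only).
sentence = [
    Token(1, "dogs", "N", 2, "SUBJ"),
    Token(2, "chase", "V", 0, "ROOT"),
    Token(3, "cats", "N", 2, "OBJ"),
]
rules = [Rule("SUBJ", "OBJ", "NOUN"), Rule("OBJ", "SUBJ", "NOUN")]

sequential = [t.deprel for t in rewrite_sequential(sentence, rules)]
two_phase = [t.deprel for t in rewrite_two_phase(sentence, rules)]
print(sequential)  # the second rule undoes the first: both nouns end up SUBJ
print(two_phase)   # the labels are swapped as intended
```

In this toy case the sequential strategy collapses both nouns onto the same label, because the second rule also matches tokens that the first rule has just rewritten. Matching against the unmodified tree keeps each rule's effect independent of rule order.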