CorDis is a large, XML, TEI-conformant, POS-tagged, multimodal, multigenre corpus representing a significant portion of the political and media discourse on the 2003 Iraqi conflict. It was generated from different sub-corpora which had been assembled by various research groups, ranging from official transcripts of Parliamentary sessions, both in the US and the UK, to the transcripts of the Hutton Inquiry, from American and British newspaper coverage of the conflict to White House press briefings and to transcriptions of American and British TV news programmes. The heterogeneity of the data, the specificity of the genres and the diverse discourse analytical purposes of different groups had led to a wide range of coding strategies being employed to make textual and meta-textual information retrievable. The main purpose of this paper is to show the process of harmonisation and integration whereby a loose collection of texts has become a stable architecture.
Marchi Anna, Cirillo Letizia, Venuti Marco (2007). The CorDis Corpus: Mark-up and related issues.
The CorDis Corpus: Mark-up and related issues.
Marchi Anna (co-first author); 2007