Corpus Linguistics

Bernardini, Silvia; Ferraresi, Adriano

doi:10.4324/9781315158945

A corpus is a collection of authentic, non-elicited texts selected and assembled to study language. Thanks to software applications designed specifically for searching through corpora, known as concordancers or corpus query tools, it is possible to obtain information about patterns, occurring in a single text or across sets of texts, that would almost certainly escape us if we simply read the texts. At their simplest, corpus methods allow users to find out which words are used most frequently in a given corpus (wordlists), or more frequently in one corpus than in another that acts as a baseline (keywords); users can also search for words that tend to go together more often than would be expected by chance (collocations) or for repeated word sequences (variously called clusters, n-grams, or lexical bundles). By providing information about word frequencies and about syntagmatic relationships established on the level of discourse (Saussure 1971/1916: 170ff.), corpora have revolutionised linguistics, allowing researchers to tap into a major new source of linguistic evidence, thus relaxing ‘the stranglehold of intuition’ (Sinclair 1991: 7) and the exclusive focus on the abstract paradigms of traditional grammar. The first modern corpora were developed in the 1970s and 1980s as a reaction against the methods of so-called ‘armchair linguistics’ (Fillmore 1992), which relied on the linguist’s intuition, or on the intuition of a few informants, to describe aspects of language. At the time, linguistics was heavily influenced by generativist views (e.g. Chomsky 1986), and language as an object of study was largely synonymous with a speaker’s linguistic competence. This in turn referred to knowledge of the grammaticality of a given construction, for which the intuition of a competent speaker was considered an adequate source of evidence.
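
As a rough illustration of the basic corpus methods listed above (wordlists, keywords, collocations and n-grams), the following Python sketch shows how each could be computed over a toy corpus. It is not how any particular concordancer works: the function names, the naive tokenizer, the smoothed frequency ratio used for keywords and the pointwise mutual information (PMI) score used for collocations are all assumptions made for the example, and real corpus query tools typically offer several alternative statistics.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (deliberately naive)."""
    return re.findall(r"[a-z']+", text.lower())


def wordlist(tokens):
    """Wordlist: (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def ngrams(tokens, n):
    """Count word sequences of length n (clusters / n-grams / lexical bundles)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def keywords(study_tokens, reference_tokens):
    """Rank words that are more frequent in the study corpus than in the baseline.

    Assumption: a ratio of add-one-smoothed relative frequencies is used here
    purely for simplicity.
    """
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    n_study, n_ref = len(study_tokens), len(reference_tokens)
    scores = {
        word: ((count + 1) / (n_study + 1)) / ((ref[word] + 1) / (n_ref + 1))
        for word, count in study.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def collocations(tokens, min_count=2):
    """Score adjacent word pairs by PMI (an illustrative association measure).

    A positive score means the pair co-occurs more often than the individual
    word frequencies alone would predict.
    """
    unigrams = Counter(tokens)
    bigrams = ngrams(tokens, 2)
    total = len(tokens)  # approximation: also used as the number of bigrams
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count >= min_count:
            expected = (unigrams[w1] / total) * (unigrams[w2] / total)
            scores[(w1, w2)] = math.log2((count / total) / expected)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented for this example only.
study = tokenize("strong tea and strong coffee; she drinks strong tea every day")
reference = tokenize("the day was long and the coffee was cold")

print(wordlist(study)[:3])
print(keywords(study, reference)[:3])
print(collocations(study))
print(ngrams(study, 2).most_common(2))
```

With corpora of realistic size, the same four functions would be applied to millions of tokens, and thresholds such as `min_count` would need to be raised to filter out chance co-occurrences.
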
With the growing importance accorded to pragmatics and sociolinguistics, a shift occurred from linguistic competence to communicative competence, that is, knowledge of the contextual adequacy of language choices (Hymes 1972). More recently, usage-based linguistic approaches have become mainstream. These postulate that ‘usage events define and continuously redefine the language system in a dynamic way’ (Tummers et al. 2005: 228). In these approaches, language performance, in the form of actual samples of authentic language usage, has become the main object of linguistic analysis. Anticipating and accompanying these theoretical developments, corpus methods have grown in importance over the last 50 years and nowadays occupy a central position in linguistics. In the words of Stubbs (2009: 117), ‘[c]orpora are just data and quantitative methods are just methods, but their combination has led to a major shift in theory’. The applied branches of the discipline, such as first- and second-language acquisition, terminology and lexicography, and indeed the study of translation, have in turn discovered corpora, and are currently using them as a fundamental resource for studying the products of these activities and for obtaining indirect evidence about their underlying processes.

Corpus Linguistics / Bernardini Silvia; Ferraresi Adriano. - STAMPA. - (2022), pp. 207-222. [10.4324/9781315158945]