As digital convergence marks the transition from print to screen culture, translation plays an increasingly important role in the production and dissemination of the news. The translation of information in the news media is a pervasive set of practices that affects the daily consumption of the news and a topic of relevance to scholars in several areas of the humanities and the social sciences. This book provides a wide-ranging and accessible introduction to research in news media translation practices, products and processes, illustrating and discussing historical, theoretical and descriptive perspectives. Inter- and multi-disciplinary research spans fields such as Translation Studies, Linguistics, Journalism and Media Studies, and includes approaches from Critical Discourse Analysis and narrative theory to Systemic Functional Linguistics and Corpus Linguistics. The book also offers first-hand analyses of news texts in English and Italian, approaching news translation from an ethnomethodological perspective.
Mark Van Mol provides a critical review of the issues involved in the construction of usable Arabic corpora and the solutions that programmers have attempted in resolving them. One such issue is whether a corpus is made freely available or is placed behind a paywall. This distinction often translates into corpus size, as well, with freely available corpora generally being larger and untagged for parts of speech (POS) and those hidden behind paywalls being smaller and POS-tagged. The reason for this is clear: POS tagging requires large amounts of painstaking labour; on the other hand, scouring large amounts of text from the Internet with web scraper applications can be done in seconds. As for corpus size, the different units in which it is reported make comparison difficult. Size may be expressed in the number of articles, hours, tokens, kilobytes, megabytes, sentences, words, and sometimes paragraphs that the corpus encompasses. One of the reasons for this is that defining the searchable units of Arabic texts presents complications. Such considerations pertain directly to questions of corpus representativeness. With that arises the question of the nature of the phenomenon under scrutiny, whether the corpora are intended to represent Classical Arabic, modern written Arabic, or Arabic dialects.
The authors examine the application of electronically searchable corpora, from their own experience, in addressing questions pertinent to linguistics as a whole and to matters internal to Arabic, all the while lamenting that the field of Arabic linguistics, in its theoretical and applied orientations alike, has not made use of the rich data source that searchable electronic corpora represent. They show how corpora can be used easily to falsify common assumptions and assertions about the human language capacity in general just as they can be used efficiently to query assumptions and assertions about Arabic itself. So, too, do they hold implications for applied uses such as teaching Arabic as a foreign language and translation between Arabic and other languages. In any of these applications, the use of corpora in the analysis of all varieties of Arabic remains underdeveloped compared to their use in the analysis of other languages, especially English.
This Element provides a basic introduction to sentiment analysis, aimed at helping students and professionals in corpus linguistics to understand what sentiment analysis is, how it is conducted, and where it can be applied. It begins with a definition of sentiment analysis and a discussion of the domains where sentiment analysis is conducted and used the most. Then, it introduces two main methods that are commonly used in sentiment analysis known as supervised machine-learning and unsupervised learning (or lexicon-based) methods, followed by a step-by-step explanation of how to perform sentiment analysis with R. The Element then provides two detailed examples or cases of sentiment and emotion analysis, with one using an unsupervised method and the other using a supervised learning method.
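The lexicon-based (unsupervised) method mentioned above can be illustrated with a minimal sketch. The Element itself works in R; Python is used here only for illustration, and the tiny lexicon, the function names `lexicon_sentiment` and `label`, and the scoring rule (a plain sum of word polarities) are all assumptions, not the Element's own procedure.

```python
import re

# Hypothetical toy lexicon for illustration only; real studies rely on
# published sentiment lexicons with thousands of scored entries.
TOY_LEXICON = {
    "good": 1, "great": 2, "happy": 1,
    "bad": -1, "terrible": -2, "sad": -1,
}


def lexicon_sentiment(text, lexicon=TOY_LEXICON):
    """Sum the polarity scores of every lexicon word found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    sample = "The food was great but the service was terrible and sad"
    score = lexicon_sentiment(sample)
    print(score, label(score))
```

A supervised approach would instead train a classifier on texts that have already been labelled for sentiment, so it needs annotated data where the lexicon-based approach needs only a word list.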
Which normally transitive verbs can omit their objects in English (I ate), and why? This article explores three factors suggested to facilitate object omission: (i) how strongly a verb selects its object (Resnik 1993); (ii) a verb's frequency (Goldberg 2005); (iii) the extent to which the verb is associated with a routine – a recognized, conventional series of actions within a community (Lambrecht & Lemoine 2005; Ruppenhofer & Michaelis 2010; Levin & Rapaport Hovav 2014; Martí 2010, 2015). To operationalize (iii), this article compares the writings of different communities to offer corpus and experimental evidence that verbs omit their objects more readily in the communities in which they are more strongly associated with a routine. More broadly, the article explores how the meaning and syntactic potential of verbs are shaped by the practices of the people who use them.
Corpus pragmatics is an emerging area of research with a growing number of specialist publications. Research in corpus pragmatics draws on empirical language samples captured in language corpora to explore a wide variety of key topics in pragmatics, such as discourse markers, speech acts and (im)politeness. However, the majority of research to date in corpus pragmatics is based on textual (transcribed) renderings of spoken discourse, and there is a notable lack of corpus pragmatic studies that also adopt a multimodal approach, investigating the potential contribution of multiple modes (including speech, gestures and facial expressions) to utterance functions. The current chapter highlights the affordances of using a multimodal corpus pragmatic approach in exploring the role of speech and gesture in meaning making. We illustrate this approach with the example of speech-gesture functional profiles arising from a multimodal analysis of multiword expressions (e.g. ‘do you know/see what I mean’). The chapter provides an overview of key corpus methods that have been used in sociopragmatic research and pragmatic research more generally before presenting our multimodal corpus pragmatic research on ‘do you know/see what I mean’.
In this last chapter I discuss the potential impact of the intersubjective gradience model on research in Cognitive Linguistics and Pragmatics. I explicitly refer to intersubjective gradience as a schematic mechanism. Abstract representations of immediate and extended interaction contribute to the formulation of linguistic acts as much as image schemata (i.a. Lakoff 1990; Mandler 1992; Di Maggio 1997) are hypothesised to trigger metaphorical thinking and determine the morpho-syntactical structure of grammatical constructions. I then show the applicability of the gradience model in Autistic Spectrum Disorder (ASD) research and the way this usage-based framework can inform a fine-grained assessment of individuals’ ability to overtly express their awareness of an addressee’s potential reactions to what is being said. I finally summarise the main assumptions of the gradience model of intersubjectivity of this book.
This chapter is centred on interlocutors’ ability to spontaneously construe intersubjectified utterances throughout ontogeny and first language acquisition. The diachronic continuum illustrated in Chapter 3 is also at stake throughout children’s ontogenetic development. Children acquire the capacity to spontaneously express immediate intersubjectified (I-I) polysemies of a lexeme before they develop the skills to convey extended intersubjectified usages (E-I) of the same form. In Section 4.1 I introduce the application of the gradience model in first language acquisition and theory of mind research with reference to children's usage of the Italian construction guarda ‘look’. Section 4.2 focuses on the Mandarin construction 你看 nǐkàn ‘look, you see’ and the way new intersubjectified polysemies of 你看 nǐkàn (hinging on mirativity and opinion elicitation) emerge at significantly later stages of development than directive usages of the same form. In Section 4.3 I discuss the acquisition of the Mandarin post-verbal 过 guo. Interpersonal evidential polysemies of 过 guo are spontaneously mastered by children around the seventh year of age, i.e. comparatively later than other usages of the same form. Section 4.4 is finally dedicated to the first language acquisition of the pre-nominal such and children’s progressive ability to express generic reference to objects, entities and events to which they ascribe collective recognition. This extended intersubjective function of such emerges later than other polysemies of the same form aimed at merely establishing joint attention.
This chapter ‘puts the gradience model into play’ through a corpus-based application of the framework to semantic-pragmatic change in a number of constructions in American English, British English, Mandarin Chinese and other world languages. Each section is centred on a different construction diachronically acquiring new extended intersubjective (E-I) polysemies that progressively arise out of original literal usages. In a number of cases, an intermediate immediate intersubjective (I-I) stage of reanalysis can be formally identified in the sequence of changes of a construction. In some other instances, E-I polysemies may arise directly from literal usages of the construction. Section 3.1 touches upon the universality of intersubjectification as a ubiquitous process of change in the world's languages. In Section 3.2, I then illustrate the continuum from immediate to extended intersubjectification of the 干嘛 ganma construction in Mandarin. Among the extended intersubjectified linguistic acts that I analyse in the chapter are the American English common-sense assertions of [you don't want X] (Section 3.3) and the attention-getting functions of the chunk believe it or not (Section 3.4). Extended intersubjectivity also intersects with evidential statements of shared knowledge through the usage of the Mandarin 过 guo construction (Section 3.5) and in assertions of expected agreement with the Mandarin sentence final particle 吧 ba (Section 3.6). Finally, I discuss the existential construction [there is no X] in British English, which diachronically developed a new intersubjectified function to pre-emptively address what a speaker imagines a specific or generic interlocutor will say as a result of a current turn-taking.
Paralleling the summing problem associated with identifying a single intention of a multimember lawmaking body, the semantic summing problem appears when there are competing potential meanings for constitutional words or phrases. This chapter addresses the question of whether the new digital tools used in corpus linguistics searches have the potential to offer a “Big Data” solution to the problem. By examining the nature of the digital collections being searched, as well as the data analysis tools being employed, this chapter shows that corpus linguistics will not solve the semantic summing problem, and may well exacerbate it.
This chapter reviews the quantitative corpus linguistic literature on development of cohesion in first- and second-language writing. It first provides a theoretical and methodological context for such work by discussing the two main frameworks within which cohesion has been researched. It then critically reviews an extensive body of literature to establish what substantive conclusions can be drawn and what might constitute productive foci for future research. Interest in cohesion as a correlate of development has been less intense than that seen for grammar, vocabulary, and formulaic language, and few consistent patterns have emerged. While there is some indication that a small number of measures are associated with development, evidence on these is too sparse for any confident conclusions to be drawn. Moreover, quantitative measures of cohesion appear to be highly contextually specific, depending crucially on the nature of the text and on the writer's estimation of their audience's topic knowledge.
This chapter reviews the quantitative corpus linguistic literature on formulaic language development in writing. It first provides a theoretical and methodological context by discussing the construct of formulaic language and the various ways in which it has been operationalised in studies of writing development. It then critically reviews the literature to establish what substantive conclusions can be drawn and what might constitute productive foci for future research. The review highlights a lack of interest in first-language studies. However, second-language studies have seen a rapid expansion of interest over the last decades, which has yielded a number of consistent patterns. In particular, writing quality is positively associated with the percentage of n-grams attested in a reference corpus, the mean strength of association between collocates (again as attested in a reference corpus), and the prevalence of sequences which analysts subjectively identify as formulaic. It is also negatively associated with use of formulas copied from source materials. Key areas in which further methodological development is needed include: understanding how analysts identify sequences as formulaic; increasing the size and rigour of studies looking at discourse functions of lexical bundles; understanding the impact of reference corpus on findings; developing corpora representative of learner input.
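One of the measures named above, the percentage of n-grams attested in a reference corpus, can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names `ngrams` and `attested_ngram_proportion` are hypothetical, the texts are assumed to be already tokenised, and no frequency threshold is applied, although the studies reviewed may well apply one.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def attested_ngram_proportion(text_tokens, reference_tokens, n=2):
    """Share of n-gram tokens in a text that also occur in a reference corpus."""
    reference = set(ngrams(reference_tokens, n))
    candidates = ngrams(text_tokens, n)
    if not candidates:
        # Text shorter than n tokens: no n-grams to evaluate.
        return 0.0
    attested = sum(1 for gram in candidates if gram in reference)
    return attested / len(candidates)


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    learner = "the cat sat on a mat".split()
    print(attested_ngram_proportion(learner, reference, n=2))
```

In practice the reference corpus would be a large general or academic corpus, and the choice of that corpus affects the result, which is one of the methodological concerns the chapter raises.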
This chapter establishes a theoretical and methodological foundation for the quantitative corpus linguistic study of writing development. First, it defines and discusses the central constructs of writing, writing proficiency, development, and quantitative corpus linguistics. Second, it sets out four assumptions on which, we argue, quantitative corpus approaches rest and discusses in detail both the strengths of these approaches and the methodological challenges they need to confront. Third, it gives a detailed discussion of specific methodological issues related to defining and measuring key variables of development and context and of establishing the status of particular measures of language use. Finally, the chapter reviews one particular quantitative corpus linguistic approach (multidimensional analysis) which raises important questions about quantitative corpus linguistic methodology as a whole.
This chapter brings together discussions and evidence from the preceding chapters to draw conclusions about first- and second-language writing development and about quantitative corpus linguistics as a methodology. It first summarises the key patterns of development in terms of grammar, vocabulary, formulaic language, and cohesion. It then discusses implications of these findings for the key constructs of time- and quality-related development and draws methodological conclusions with regard to how quantitative measures of development have been, and in the future could be, theorised and operationalised and the types of text samples on which studies have been, and could be, built. The chapter ends by setting out a number of key priorities for future research, grouped under the headings of theorisation, broadening attention to contexts, and integration with other methods.
This chapter reviews the quantitative corpus linguistic literature on syntactic development in first- and second-language writing. It first provides a theoretical and methodological context for such work by discussing the construct of syntactic proficiency. It then critically reviews an extensive body of literature to establish what substantive conclusions can be drawn and how future research could most productively develop. The strongest developmental patterns are found for generic measures of syntactic complexity, as operationalised through measures such as mean length of sentence/T-unit/clause and subordinate clause ratios. However, we argue that such measures are relatively uninformative with regard to a detailed understanding of development. Our review of more specific syntactic measures highlights a number of key features which have the potential to give useful insights into language development, while also underscoring the fragmentary nature of the measures studied to date. Methodologically, the review identifies a pervasive lack of conceptual clarity regarding what is measured and why. We find important unacknowledged differences in how key terms (e.g. clause, noun phrase) are defined and operationalised, which make it difficult to build a theoretically meaningful and cohesive developmental picture.
This chapter reviews the quantitative corpus linguistic literature on vocabulary development in first- and second-language writing. It first provides a theoretical and methodological context for such work by discussing the construct of vocabulary proficiency. It then critically reviews an extensive body of literature to establish what substantive conclusions can be drawn and what might constitute productive foci for future research. The strongest developmental patterns are found for measures of vocabulary diversity and use of academic vocabulary. There are also important, but complex, developmental patterns with regard to use of high- versus low-frequency words. However, the current range of measures attested in the literature has significant limitations: they are limited in scope, focusing almost exclusively on breadth, rather than depth, of vocabulary knowledge; the relationships between measures and knowledge constructs are often unclear; the relationships between measures themselves, which often overlap with each other in complex ways, are largely unexamined; measures are often too coarse-grained, and may consequently disguise important developmental patterns by conflating distinct constructs.
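As an illustration of what a vocabulary diversity measure looks like, the sketch below computes a plain type-token ratio. The abstract does not say which diversity measures the reviewed studies use, so the choice of this measure, the function name `type_token_ratio` and the simple regular-expression tokeniser are all assumptions made for the example.

```python
import re


def type_token_ratio(text):
    """Number of distinct word forms divided by the total number of word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    print(type_token_ratio("The cat and the dog"))
```

A raw type-token ratio falls as texts get longer, so comparisons across texts of different lengths generally need a length-corrected variant or fixed-size samples.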
This chapter sets out the aims of the book and delimits its scope by defining the field of quantitative corpus linguistics (QCL), describing its key strengths as a way of understanding written-language development, and setting out some of the methodological problems which researchers need to untangle. It then outlines the systematic literature reviews on which the book is based and provides a broad overview of the trajectories this literature has taken over time.
Chapter 8 returns to a focus on methods in sociophonetics, considering the ways that sociophonetics has been and can be integrated with corpus linguistics, computational linguistics and natural language processing. The chapter considers the turn towards “big data” across the social and hard sciences and the ways that technological improvements and software developments (both for data analysis and data acquisition) have fueled the development of sociophonetics and paved the way for rapid methodological advancements and substantive breakthroughs. In its treatment, the chapter surveys the present state-of-the-art (e.g. forced alignment, automatic formant extraction) and upcoming developments, and weighs the pros and cons of these new approaches. In doing this, it provides a thorough treatment of some less often discussed methodological concerns underlying the present and future of the field.
Quantitative corpus research on written language development has expanded rapidly in recent years, assisted by the ever-increasing power and accessibility of software capable of reliably analysing huge collections of learner writing. For this work to reach its full potential, it is important that researchers have a strong understanding of its methodological foundations and of the existing empirical evidence base on which it can build. This book provides the most comprehensive discussion to date of research in this area. Covering both first and second language learning contexts, it sets out a coherent theoretical framework and systematically reviews studies published over the last seventy years in order to establish what such research has taught us about written language development, what it hasn't taught us, and what we should do next. Timely and original, this is an essential reference work for academic researchers and students of first and second language writing.
Research on gender differences in language use previously focused mainly on affluent, especially Western societies. The present chapter extends this research to acrolectal Indian English, a postcolonial variety of English, investigating how the use of intensifiers (e.g. very, really) is affected not only by the speakers’ gender, but also their age, the gender of the other speakers in the conversation and the formality of the context. Results show some parallels with Western varieties of English, in particular a tendency for women to use more intensifiers than men in informal contexts. However, Indian women modify their usage of intensifiers with respect to the formality of the context more than British women and men, while Indian men do so less than British women and men. In mixed-sex conversations, Indian women also converge with Indian men in their intensifier usage, while neither British women nor men do so. The more flexible use of intensifiers by Indian women may be a response to societal expectations regarding their linguistic behaviour, in order to avoid censure by society. British women likewise continue to be affected by such constraints, but much less so, while the linguistic behaviour of Indian and British men is subject to less criticism.