2 - Method

Published online by Cambridge University Press:  16 December 2021

Charles Boberg
McGill University, Montréal


This chapter presents basic information for a wide readership on how accents differ and how those differences are analyzed, then lays out the sample of performances to be studied, the phonemes and word classes to be analyzed, and the methods of phonetic, quantitative, and statistical analysis to be followed.

Accent in North American Film and Television: A Sociophonetic Analysis, pp. 58–108
Publisher: Cambridge University Press
Print publication year: 2021

This chapter presents and explains the analytical methods used in this book: the construction of a sample of film and television speech; the acoustic phonetic analysis of the performances in that sample; and the quantitative and statistical analysis of the resulting phonetic data. To address a wider readership beyond sociophoneticians and dialectologists, however, the chapter will begin with a discussion of some analytical preliminaries, introducing the basic theoretical concepts necessary for understanding and interpreting the analysis of the following chapters. Some expert readers will likely be able to pass quickly over most or even all of the material in this chapter and proceed to the analysis in Chapter 3, whereas those with less subject-specific expertise, such as readers whose background is more in film and television or media studies than in linguistics, will likely benefit from reading it closely and perhaps even returning to consult it again if technical questions arise while reading later chapters.

2.1 Analytical Preliminaries: How Accents and Dialects Vary and Change

Most people have some sense of how the language spoken in their community varies and changes, but few without formal training in linguistics know how to categorize or accurately describe the types of variation and change they observe, which is the first step in understanding those observations in a deeper, more scientific way. This section therefore offers an introduction to the different ways accents can vary and change, including the theoretical concepts and terminology that will be necessary for understanding the analysis presented in the following sections and chapters.

As stated in its subtitle and in Chapter 1, this book presents a “sociophonetic analysis” of film and television speech. Sociophonetics is a branch of linguistics that examines linguistic variation and change at the phonetic level, involving differences or ongoing changes in the sounds of speech that are correlated with the social categories that speakers belong to, like age, sex, and social class, and with their regional identities as members of communities separated by geographic space. Variation and change are treated here as closely related phenomena. This reflects the fact, already noted in Subsection 1.3.2 of Chapter 1, that while not all variation involves change (some remains stable over many generations), all change must begin as variation because people vary in their adoption of new ways of speaking: as with all changes in social behavior, there are leaders and followers, promoters and resisters. Many cases of variation therefore reflect a competition between newer and older linguistic variants. This competition can result in the eventual triumph of the new form, the reassertion of the old form once the new form loses its appeal, or the development of stable variation between them. Of the following chapters, Chapter 3 will focus on patterns of change, while Chapters 5–7 will focus mostly on patterns of variation; Chapter 5 will examine New York City English from both perspectives.

Variation and change can occur at all levels of linguistic structure: speakers can differ in or change the words they use (their vocabulary or lexicon); the meaning of those words (lexical semantics); how they combine words into phrases and sentences (syntax); the meaning of sentences (semantics); the internal structure of words and how they fit into grammatical patterns, like how nouns are inflected and verbs conjugated (morphology); and the sound patterns of language, including how sounds contrast with each other, how they are pronounced, and how they are distributed across the vocabulary of a language, as well as patterns of word and phrasal stress and intonation (phonology and phonetics). All these types of variation help to form a speaker’s social and regional linguistic identity. This book, as discussed in Subsection 1.6.2 of Chapter 1, will focus almost entirely on the last type, variation and change in phonology and phonetics, or the sounds of language, because that is the level of language largely controlled by actors, whereas the words and sentences they produce on-screen are largely determined by a script or screenplay.

This distinction is closely aligned with the technical definitions of the terms dialect and accent, which are sometimes used interchangeably in a nontechnical sense. Technically, dialects differ at all levels of structure, like Standard British versus American English, involving distinct words and grammatical patterns as well as sound differences, whereas accents differ only at the level of sound. Two people could therefore speak the same dialect (say, Standard American English), so that a transcript of their speech would reveal no differences, but pronounce their words with different accents (say, Southern and Northern, or Canadian), a level of variation that is not reflected in writing. As its title asserts, this book deals with accent, not dialect, in North American film and television: it examines variation and change in the accents or sound patterns heard in the on-screen speech of film and television actors.

Accent (and dialect) differences can be both social and regional. Many are purely regional, without any sense that one pronunciation is more “correct” or “standard” than another. Others start out as regional differences but develop a social connotation, if a country’s standard accent (the one taught in schools and used in broadcast media, etc.) is based in a particular region, like that of the capital city, or if certain social groups are more likely to display a particular accent than others. For instance, if a traditional regional accent is receding over time, converging with pronunciation patterns outside the region, a stronger form of it would be expected from older people, or people in lower educational and occupational groups, who are more local in their social orientation, than from young people with more formal education and more exposure to nonlocal speech.

Dialect differences usually subsume accent differences: though people can speak the same dialect with different accents, people who speak different dialects, with different grammars and vocabularies, usually also differ in accent or pronunciation. While this book is concerned mainly with accent rather than dialect differences in the strict sense, the term dialect will nonetheless sometimes be used in this and other chapters in the more general sense of a regional variety of a language. For example, dialectologists usually speak of dialect regions rather than accent regions, even when they are referring to sound differences.

2.1.1 Variation and Change in Phonemic Incidence

Pronunciation, or the sounds that make up words, can differ in three ways, all of them potentially linked to social and regional identity and subject to change. First, the same word can contain different sounds for different people: as Fred Astaire and Ginger Rogers sing to each other in Shall We Dance (1937), “you like toMAYto / and I like toMAHto.” That song, “Let’s Call the Whole Thing Off,” written by George and Ira Gershwin, also features several other instances of the same kind of variation, such as pajamas pronounced as paJAMas versus paJAHmas, or either and neither pronounced as EYE-ther and NIGH-ther versus EE-ther and KNEE-ther; the two pronunciations of either also provide comic fodder for Groucho Marx in Duck Soup (1933). These are all differences in phonemic incidence: variation in which English sounds, or phonemes, occur in particular words. Some differences in phonemic incidence are linked to region: Canadians, following a British model, tend to pronounce shone, the past tense of shine, so that it rhymes with gone, whereas most Americans pronounce it to rhyme with bone (Avis 1956: 50; Chambers 1994: 45–46); Americans in the northern United States pronounce the preposition on so that it rhymes with don, whereas those in the Midland and southern United States rhyme it with dawn (Labov, Ash, and Boberg 2006: 189). Phonemic incidence is specified in dictionaries: the entries for some words include variant pronunciations, of which some are aligned with national dialect or region, like those of shone and on, while others apparently coexist in free variation, without any evident regional or social connotations.

Some variables of phonemic incidence involve dozens or even thousands of words that vary in parallel ways. My own previous research has examined the large set of originally foreign words that vary between the short-a or /æ/ or trap sound and the long /ah/ or palm sound in North American English, like the pajamas example in the Gershwin song (the system of transcribing vowel sounds used in this book is presented in Section 2.4; until that point, conventional orthography will be used wherever possible). Words like angst, Cezanne, chianti, Havana (another example from the song), Iraq, soprano, teriyaki, Vietnam, and Yamaha, as well as the state names Colorado and Nevada, to cite only a few of hundreds of possible examples, show the same alternation between /æ/ and /ah/, forming what I call the “foreign-a” word set (Boberg 2020b). Most phonemic incidence variables, however, involve only a few parallel examples and many are completely idiosyncratic, restricted to a single word, which is why they need to be specified in dictionaries. For example, some people say route so it rhymes with shoot, whereas others rhyme it with shout, but it is difficult to think of other words that share the same pattern of alternation.

Moreover, like the different words for carbonated beverages discussed in Subsection 1.3.3 of Chapter 1 (coke, pop, soda, etc.), while many variables of phonemic incidence are linked to social or regional identity, most of them do not occur frequently enough in conversation to be of much help to people in assessing or reacting to the social or regional identity of the people they talk to or listen to on TV and in films, or in projecting their own social or regional identities. For instance, knowing that the word roof pronounced with the vowel of rook is more likely to be heard from the Midwest to the Pacific Northwest and in central California, whereas roof rhyming with proof is more likely to occur elsewhere (Avis 1956: 50; Labov, Ash, and Boberg 2006: 292), is of no consequence if the word roof does not happen to arise in conversation. Yet people are usually able to get a good sense of someone’s regional background, including that of a film or TV actor, after hearing any short sample of speech, regardless of which particular words it contains, suggesting that this ability depends on other, more systematic kinds of variation that involve the sound system as a whole, rather than individual words. Therefore, following the sociophonetic research tradition discussed in Subsection 1.3.3 of Chapter 1, the analysis in this book will focus not on variables of phonemic incidence but on the two other types of pronunciation variable, which are more systematic, affecting large sets of words defined by the sounds they contain: variation and change in phonemic inventory and variation and change in the phonetic quality of phonemes, which together account for what is normally referred to as someone’s accent.

2.1.2 Variation and Change in Phonemic Inventory: Phonemes and Allophones; Phonemic Mergers and Splits

A phonemic inventory is the set of contrastive sounds in a language or dialect: those that differ in ways that can make a difference between two words. Such contrastive sounds are called phonemes and are written between forward slashes. In English, for instance, the difference between /p/ and /t/ is phonemic because it can distinguish minimal pairs like pin and tin, or cap and cat, which differ in only one sound. While all varieties of English maintain a phonemic contrast between /p/ and /t/, other phonemic contrasts, particularly those between vowel sounds, vary among regions and social groups. If pin and tin sound different for all English-speakers, pin and pen, or tin and ten, or him and hem, do not: in the American South, these words are often homophones, sounding identical (Labov, Ash, and Boberg 2006: 68). This is a conditioned phonemic merger or neutralization of contrast: like all English-speakers, Southerners distinguish the vowels /i/ and /e/ in other contexts, like pit and pet, or bid and bed, but this contrast is neutralized before the nasal consonants, /n/ and /m/; it is conditioned (caused) by the presence of a following nasal consonant and does not occur where that condition is absent. Conditioned mergers frequently neutralize vowel contrasts before /r/ in North American English: all English-speakers distinguish the short-a and short-e vowels of minimal pairs like mat and met, or bad and bed, but most North Americans west of the Atlantic coast have lost this distinction before intervocalic /r/, making homophones of marry and merry, or barrel and beryl, or Harold and herald.
Conditioned mergers like these have a big effect on perception of regional accents: when people in Montreal, New York City, or Philadelphia say words like carrot, charity, and Paris, they use the same vowel sound as they have in cat or chat or pat, whereas most other Canadians, Midwesterners, and Western Americans use the same vowel as they have in care, chair, or pear. This difference can be heard across dozens of words that feature short-a before /r/.
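A conditioned merger can be stated explicitly as a rewrite rule that applies only in a specified environment. The following is a minimal sketch for illustration, not part of this book’s method: words are written as tuples of hypothetical segment symbols, and the merged vowel is arbitrarily represented as /i/. Because the rule collapses /e/ into /i/ only before /n/ and /m/, pin and pen become homophones while pit and pet remain distinct.

```python
# Hypothetical segment notation: each word is a tuple of phoneme symbols.
NASALS = {"n", "m"}

def pin_pen_merger(word):
    """Conditioned merger: /e/ becomes /i/ only when a nasal follows.

    The choice of /i/ as the merged output is a simplifying assumption."""
    out = list(word)
    for idx, seg in enumerate(word[:-1]):
        if seg == "e" and word[idx + 1] in NASALS:
            out[idx] = "i"
    return tuple(out)

pairs = {
    ("pin", "pen"): (("p", "i", "n"), ("p", "e", "n")),
    ("him", "hem"): (("h", "i", "m"), ("h", "e", "m")),
    ("pit", "pet"): (("p", "i", "t"), ("p", "e", "t")),
    ("bid", "bed"): (("b", "i", "d"), ("b", "e", "d")),
}

for (w1, w2), (t1, t2) in pairs.items():
    same = pin_pen_merger(t1) == pin_pen_merger(t2)
    print(f"{w1}/{w2}: {'homophones' if same else 'distinct'}")
# pin/pen: homophones
# him/hem: homophones
# pit/pet: distinct
# bid/bed: distinct
```

The same pattern, with /r/ in place of the nasals, would describe the merger of marry and merry.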

Even more consequential for accent differences is an unconditioned merger, in which a phonemic distinction is lost in all contexts. Whereas conditioned mergers reduce the set of contexts in which two phonemes contrast with each other, unconditioned mergers eliminate the contrast altogether, causing a change in the phonemic inventory of the affected dialect. The most important example of an unconditioned merger in modern North American English is that between the short-o of words like cot, stock, don, and collar and the long-open-o of their minimal pairs, caught, stalk, dawn, and caller. As discussed in Subsection 1.3.3 of Chapter 1, these pairs are distinct in the Mid-Atlantic region, from New York City to Baltimore; in the Inland North, from Buffalo to Chicago; and in much of the South (as well as in most British dialects), but they are homophonous, indicating a merger, in most of the rest of the continent, including Canada and the western United States (Labov, Ash, and Boberg 2006: 60–61). Because both vowels are pronounced in the lower-back part of the mouth, this is known as the Low-Back Merger. When such a merger occurs, it not only alters the sound of one or both vowels involved in it but also creates extra room in the mouth for the remaining vowels, which can shift into the space freed up by two vowels becoming one. For instance, if stock sounds the same as stalk, then stack can be pronounced a little more like stock without getting confused with it. This can set off chains of responsive shifts in the pronunciation of several vowels, like those discussed in Subsection 1.3.3, leading to far-ranging effects on the accents of the affected regions.
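An unconditioned merger differs from a conditioned one in that its rule has no environment at all. The sketch below assumes hypothetical labels, o for short-o and oh for long-open-o (the book’s own transcription system is presented in Section 2.4), and arbitrarily lets /oh/ become /o/; the point is only that the effect shows up both as homophony and as a smaller vowel inventory.

```python
# Hypothetical labels: "o" = short-o (cot), "oh" = long-open-o (caught).
WORDS = {
    "cot": ("k", "o", "t"), "caught": ("k", "oh", "t"),
    "stock": ("s", "t", "o", "k"), "stalk": ("s", "t", "oh", "k"),
    "don": ("d", "o", "n"), "dawn": ("d", "oh", "n"),
}
VOWELS = {"o", "oh"}

def low_back_merger(word):
    """Unconditioned merger: every /oh/ becomes /o/, whatever its neighbors."""
    return tuple("o" if seg == "oh" else seg for seg in word)

def vowel_inventory(lexicon):
    """Collect the distinct vowel symbols used anywhere in the lexicon."""
    return {seg for word in lexicon.values() for seg in word if seg in VOWELS}

before = vowel_inventory(WORDS)
merged = {w: low_back_merger(t) for w, t in WORDS.items()}
after = vowel_inventory(merged)

print(sorted(before), "->", sorted(after))  # ['o', 'oh'] -> ['o']
print(merged["cot"] == merged["caught"])    # True
```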

Phonemic inventories can also be affected by the opposite of a merger: a phonemic split, in which one phoneme becomes two. This usually begins as a conditioned change, in which a phoneme develops a variant, or allophone, that has a distinct sound in a specific environment, brought about by the influence of adjacent phonemes. In English, for instance, the phoneme /k/ has a different quality before the “ee” sound in key, with the blade of the tongue pushed up against the ridge behind the upper-front teeth, than it has before the “oo” sound in cool, with the body of the tongue pulled toward the back of the mouth. In some languages the difference between these two “k”-like sounds is phonemic: they both occur in the same environments and, as separate phonemes, can distinguish minimal pairs, like /k/ and /t/ in English. They do not contrast in English, however: they are allophones of a single phoneme, /k/, in complementary distribution with each other, meaning that the alternation between them is governed by a conditioning environment, in this case the nature of the following vowel, so that where one sound occurs, the other does not. While the conditioning environment remains intact, the sounds remain in complementary distribution as allophones of the same phoneme, but if the conditioning environment is altered or removed, so that the distribution of the allophones is no longer predictable, speakers begin to analyze the two sounds as separate phonemes: the original phoneme is split into two. Like a merger, this has consequences for other phonemes, which must adjust to the extra space required to keep the two phonemes adequately distinct.
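Complementary distribution can be checked mechanically: two sounds qualify if the sets of environments in which they occur never overlap. The sketch below assumes hypothetical labels, k_front and k_back, for the two “k”-like sounds and reduces the environment to the following vowel. It tests distribution only; a fuller phonological analysis would also consider phonetic similarity.

```python
from collections import defaultdict

def environments(observations):
    """Map each sound to the set of following-vowel contexts it occurs in."""
    env = defaultdict(set)
    for sound, following in observations:
        env[sound].add(following)
    return env

def in_complementary_distribution(observations, a, b):
    """True if sounds a and b never occur in the same environment."""
    env = environments(observations)
    return env[a].isdisjoint(env[b])

# Hypothetical observations: (sound, following vowel).
english = [("k_front", "ee"), ("k_front", "i"), ("k_back", "oo"), ("k_back", "o")]
print(in_complementary_distribution(english, "k_front", "k_back"))  # True

# If the conditioning environment is lost, so that both sounds now occur
# before the same vowel, their distribution is no longer predictable.
after_change = english + [("k_back", "ee")]
print(in_complementary_distribution(after_change, "k_front", "k_back"))  # False
```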

Phonemic splits have happened several times in the history of English. The split with the most important consequences for North American dialects affected the short-a phoneme of Middle English, the vowel of words like bath and trap (Labov, Ash, and Boberg 2006: 173–184). This began as a lengthening of the vowel before voiceless fricatives, the /f/, /th/, and /s/ sounds, in words like staff, bath, class, and past, whereas the vowel remained short in other environments, in words like stack, bat, trap, and patch, establishing an allophonic alternation. In southern England, the lengthened allophone shifted toward the low-central part of the mouth to produce the modern /ah/-sound of Standard British English (“bahth, clahss,” etc.). In the American mid-Atlantic region, by contrast, it shifted forward and up, developing a quality similar to the long-a of bay or clay, but followed by a glide of the tongue toward an “uh” sound in the middle of the mouth, so bath and class came to sound like “bay-uth” and “clay-uss” in New York City and Philadelphia. In subsequent changes, however, the allophonic relationship between the two sounds became obscure. Lengthening and raising spread to new environments, such as before voiced stops in words like cab, bad, and tag and before nasal consonants in words like can (the noun) and band, but was conditioned by increasingly complex phonetic and nonphonetic rules, such as a constraint that blocked it from applying to function words like the article an and the modal verb can (as in I can). This produced a contrast between the short and long vowels in minimal pairs like Anne or tin can, with long, raised vowels, against an or I can, with short, low vowels.
What had started as an allophonic alternation therefore became phonemic and the original short-a phoneme of Middle English split into two, producing a Split Short-a System, which will be discussed in more detail, with references to the relevant previous research, in Chapters 3 and 5.
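The role of the nonphonetic constraint in this split can be illustrated with a toy classifier. The environments in TENSING_CONTEXTS and the word identifiers such as can_noun and can_modal are hypothetical simplifications drawn from the examples in this subsection; the actual New York City and Philadelphia systems involve many more conditions. Once a function-word exception is added, the same following consonant can yield either outcome, which is what makes the two vowels contrastive.

```python
# Hypothetical, simplified tensing environments taken from the examples above:
# voiceless fricatives, voiced stops, and nasals.
TENSING_CONTEXTS = {"f", "th", "s", "b", "d", "g", "n", "m"}
# Function words blocked from tensing (hypothetical word identifiers).
BLOCKED_WORDS = {"an", "can_modal"}

def short_a_class(word_id, following):
    """Classify a short-a token as 'tense' (long, raised) or 'lax' (short, low)."""
    if word_id in BLOCKED_WORDS:
        return "lax"
    return "tense" if following in TENSING_CONTEXTS else "lax"

# Each pair shares the following consonant but differs in outcome,
# so the phonetic environment alone no longer predicts the vowel.
minimal_pairs = [(("Anne", "n"), ("an", "n")), (("can_noun", "n"), ("can_modal", "n"))]
for (w1, c1), (w2, c2) in minimal_pairs:
    print(w1, short_a_class(w1, c1), "vs", w2, short_a_class(w2, c2))
# Anne tense vs an lax
# can_noun tense vs can_modal lax
```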

Splits can, of course, be undone, if the two phonemes subsequently remerge. During English-speaking settlement of the Inland North, the long-raised and short-low vowels merged to become a single long, raised phoneme. Words like stack, bat, trap, and patch, which have short, low vowels in the mid-Atlantic region, were raised and developed off-glides in Chicago and other northern cities: “stay-uhk, bay-uht, tray-uhp, pay-uhtch,” etc. In the Midland, the West, and Canada, by contrast, raising only happens when short-a occurs before nasal consonants, as in stamp, band, pan, etc., realized as “stay-uhmp, bay-uhnd,” etc. Whether this represents the remerger of an originally split short-a, the continuation of an original single short-a, or a combination of both is not always clear and varies in any case by region, but from the contemporary point of view the modern Western and Canadian system also entails a single phoneme because raising is a predictable alternation conditioned by the presence of a following nasal consonant. Most speakers of modern North American English now have only one short-a phoneme, with allophonic raising before nasals, as discussed in greater detail in Labov, Ash, and Boberg (2006: 173–184) and here in Chapters 3, 5, 6, and 7.
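The criterion used here, that raising is predictable from the following consonant, can be expressed as a simple test on observed tokens: if each following consonant is always paired with the same vowel quality, the variation is allophonic; if any consonant is paired with both qualities, it is not. The token lists below are hypothetical illustrations built from the example words in this subsection, not measured data, and looking only at the immediately following consonant is a simplifying assumption.

```python
def is_predictable(observations):
    """Return True if vowel quality is fully determined by the following consonant."""
    quality_by_context = {}
    for _word, following, quality in observations:
        # setdefault records the first quality seen in this context.
        if quality_by_context.setdefault(following, quality) != quality:
            return False
    return True

# Hypothetical Western/Canadian-style nasal system: raised only before nasals.
nasal_system = [
    ("stamp", "m", "raised"), ("band", "n", "raised"), ("pan", "n", "raised"),
    ("stack", "k", "low"), ("bat", "t", "low"), ("trap", "p", "low"),
]
# Hypothetical split system: noun "can" raised, modal "can" low, both before /n/.
split_system = [("can (noun)", "n", "raised"), ("can (modal)", "n", "low")]

print(is_predictable(nasal_system))  # True: one phoneme with allophonic raising
print(is_predictable(split_system))  # False: two contrasting phonemes
```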

2.1.3 Variation and Change in Phonetic Quality: The Vowel Space, Vowel Systems, and Vowel Shifts

The third way that dialects or accents can differ or change at the level of sounds is in the phonetic quality of their phonemes and allophones. While all phonemes can vary at the phonetic level, the most important accent differences in English, apart from vocalization of /r/, involve variation in the phonetic quality of vowel phonemes: vowels can be longer or shorter, or more or less diphthongal, and produced with different tongue positions, dramatically affecting their sound. At this point the level of analysis passes from phonology – the lexical distribution and contrastive relations of phonemes – to phonetics, the study of speech sounds. Phonetics is a complex and highly technical subject that cannot be fully addressed in this book; readers interested in a general discussion are encouraged to consult standard textbooks on the subject, such as Lieberman and Blumstein (1988), Reetz and Jongman (2009), or Ladefoged and Johnson (2015), the phonetics section of a general linguistics textbook, or online resources. The discussion in this section will be restricted to two phonetic concepts essential to understanding the analysis presented in the following chapters: (1) the organization of vowel phonemes and their allophones in a continuous articulatory space defined by two dimensions, height (high-mid-low) and advancement (front-central-back); and (2) the possibility of vowels shifting across that space, either singly or in sets of related shifts. The relationship of articulatory space to the acoustic phonetic parameters analyzed in the rest of the book will be discussed in the following section.

Phoneticians study several dimensions of vowel quality, but the most important for distinguishing English vowels from one another – and to a large extent for distinguishing English regional accents from one another – are height and advancement. These refer to the position of the tongue inside the mouth while a vowel is being produced. Vowel production essentially involves pushing air from the lungs through the larynx, where the vocal folds vibrate, creating waves of air pressure (sound waves) that pass into the mouth. The mouth acts as a resonator, like the body of a musical instrument, amplifying and modifying the sound waves before they pass out through the lips, to travel through the air to the hearer’s ear. Different vowel sounds, like “ee” and “ah” and “oo,” are made by changing the position of the tongue and lips, thereby modifying the shape of the resonating cavity and the physical characteristics of the sound wave it emits. High vowels, like those of see or pool, are made with the tongue near the top of the mouth, leaving a narrow space for air to flow out through narrowly separated lips. Low vowels, like those of cat or law, are made with the tongue at the bottom of the mouth and the jaw and lips in a more open posture. Front vowels, like those of see or cat, are made with the front of the tongue near the front teeth and lips, whereas back vowels, like those of pool and law, are made with the body of the tongue pulled back toward the throat.

Height and advancement are both continuous dimensions: languages divide them up into sectors occupied by each of their vowel phonemes, with perceptual boundaries between those sectors, but in a physical sense the tongue can occupy any position along each dimension; its range of possible positions is limited only by the outer perimeter of the vowel space, which is imposed by the shape of the mouth. This vowel space is, roughly speaking, a trapezoid, with its four corners formed by the four combinations of the two dimensions: high-front (as in see), low-front (cat), high-back (pool), and low-back (law). Other vowels occupy intermediate positions along the dimensions. Between high-front see and low-front cat is the mid-front vowel of day; between low-front cat and low-back law are the low-central vowels of tie and cow; and between low-back law and high-back pool is the mid-back vowel of cold. Readers can get a sense of these dimensions as continua by saying the word why very slowly, passing gradually from the high-back “oo” sound at the beginning, through the low-central “ah” sound in the middle, to the high-front “ee” sound at the end: each sound flows into the next without a distinct boundary between them. The articulatory space formed by the height and advancement dimensions, with the approximate locations of the vowels just mentioned, is shown in Figure 2.1, which presents a vowel chart. Such charts are conventionally oriented with the front vowels on the left and the high vowels at the top, as though one were looking at a left-facing sagittal section through the mouth, with the lips on the left and the throat on the right.

Figure 2.1 The vowel space: vowel chart showing articulatory space formed by dimensions of height and advancement

The vowel chart in Figure 2.1 is highly simplified. In fact, English has a relatively complex vowel system, compared to languages like Spanish, with around sixteen vowel phonemes (Spanish has five); the exact number varies by dialect, depending on the outcome of phonemic mergers and splits like those discussed previously. The full set of English vowel phonemes and their most important allophones will be presented in Section 2.4. The purpose of Figure 2.1 is to emphasize the status of vowel phonemes as points in a two-dimensional space. The variation and change in accents that will be analyzed in this book involves changes in the locations of these points: the vowel of day or cat, for example, can shift to the right, a more “retracted” position (away from the front of the mouth), or shift up, a more “raised” position (toward the top of the mouth); the vowel of law can shift up, into a higher position, or to the left, into a more central position; and so on. The most salient differences between the accents of places like London, New York, Toronto, Chicago, Dallas, Los Angeles, and Sydney, Australia, are produced by exactly these kinds of shifts. In particular, the following chapters will refer to “raising” (shifting higher), “lowering” (shifting lower), “retraction” or “backing” (shifting farther back), and “fronting” or “centralization” (shifting away from the back of the mouth toward a front or central position). When people from different dialect areas say a word like house, for example, their different pronunciations can be characterized in these terms: for a Southerner, the first part of the vowel of house is fronted toward the quality of cat; for an Inland Northerner, it is backed toward the quality of law; for a Canadian, it is raised toward the mid-central quality of the vowel in hut. 
Readers can perceive and understand this level of variation by dividing the vowel of house into two parts, the first, ah-part and the second, oo-part (“ah-oo”), then substituting the vowel sounds of cat, law, and hut for the first part while keeping the second part the same and listening to the result.
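The directional terms just introduced can be made concrete by treating each vowel as a point with two coordinates. The sketch below assumes an arbitrary 0–1 scale for height (low to high) and advancement (back to front) and a hypothetical threshold below which a difference is ignored; the coordinates for the first part of the vowel of house are invented for illustration and are not measurements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VowelPosition:
    height: float       # 0.0 = low, 1.0 = high (arbitrary units)
    advancement: float  # 0.0 = back, 1.0 = front (arbitrary units)

def describe_shift(before, after, threshold=0.1):
    """Label the direction of a shift; differences below threshold are ignored."""
    labels = []
    d_height = after.height - before.height
    d_adv = after.advancement - before.advancement
    if d_height >= threshold:
        labels.append("raising")
    elif d_height <= -threshold:
        labels.append("lowering")
    if d_adv >= threshold:
        labels.append("fronting")
    elif d_adv <= -threshold:
        labels.append("retraction")
    return labels

# Hypothetical positions for the first ("ah") part of the vowel of house.
baseline = VowelPosition(height=0.10, advancement=0.50)
southern = VowelPosition(height=0.15, advancement=0.80)      # toward cat
inland_north = VowelPosition(height=0.15, advancement=0.20)  # toward law
canadian = VowelPosition(height=0.45, advancement=0.50)      # toward hut

for name, pos in [("Southern", southern), ("Inland North", inland_north), ("Canadian", canadian)]:
    print(name, describe_shift(baseline, pos))
# Southern ['fronting']
# Inland North ['retraction']
# Canadian ['raising']
```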

Figure 2.1 also serves to emphasize that vowel phonemes are interrelated components of a vowel system. Each vowel has a position in this system, called its field of dispersion, occupied by individual exemplars or tokens of the vowel and its allophones (words that contain it: see Figures 5.2 and 5.5 in Section 5.3 of Chapter 5 for an example). Concrete instances of the vowel vary in their exact position, showing the influence of assimilation to adjacent sounds (allophonic effects), the degree of stress or emphasis placed on the syllable, the style of speech, and other factors. The phoneme is therefore an abstract category that comprises these concrete instances, which could be characterized as a two-dimensional range in articulatory space or as a mean and standard deviation of their positions in each dimension. Between the fields of dispersion of neighboring vowels are margins of security: buffer zones that prevent tokens of one phoneme from crossing a perceptual threshold and being misunderstood as tokens of one of its neighbors; this is an important constraint if language is to function efficiently as a signaling system. For instance, tokens of the word day can only shift so far down and inward before they stray into the territory of tie, so that day sounds like die; cat can only move so far back before it sounds like cot. Yet shifts like this do happen: in parts of southern England, the southern United States, and Australia, day does sound to people from other regions quite a lot like die; in the American Inland North, cot sounds to many people from outside the region like cat. For speakers of the affected dialects, confusion is avoided by familiarity with the local accent and by other changes to surrounding vowels, moving them out of the way. 
To Londoners and Australians, day does not sound like die because die has retracted, to sound a bit like “doy”; in the American South, die has become “dah”; in the Inland North, cat has become “kay-at.” These changes are parts of the vocalic chain shifts (specifically the Southern Shift and Northern Cities Shift) that are discussed in Subsections 1.3.3 and 1.3.4 of Chapter 1 and in Chapter 6, as well as in Labov (1991, 1994) and Labov, Ash, and Boberg (2006).
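The description of a phoneme as a mean and standard deviation in each dimension, and of a margin of security between neighboring fields, can be sketched numerically. The token coordinates below are invented on an arbitrary height and advancement scale, and assigning a token to the nearest phoneme mean is a deliberately crude stand-in for perception, assumed here only to show how a token of day that shifts too far down and inward lands in the territory of tie.

```python
import math
from statistics import fmean, stdev

def field_of_dispersion(tokens):
    """Summarize (height, advancement) tokens as a per-dimension mean and SD."""
    heights = [h for h, _ in tokens]
    advancements = [a for _, a in tokens]
    return {
        "mean": (fmean(heights), fmean(advancements)),
        "sd": (stdev(heights), stdev(advancements)),
    }

def nearest_phoneme(token, fields):
    """Assign a token to the phoneme whose mean is closest (Euclidean distance)."""
    return min(fields, key=lambda p: math.dist(token, fields[p]["mean"]))

# Hypothetical tokens on an arbitrary 0-1 scale: (height, advancement).
day_tokens = [(0.60, 0.80), (0.55, 0.85), (0.65, 0.75), (0.58, 0.82)]
tie_tokens = [(0.10, 0.50), (0.15, 0.45), (0.05, 0.55), (0.12, 0.48)]
fields = {"day": field_of_dispersion(day_tokens), "tie": field_of_dispersion(tie_tokens)}

# Distance between the two means: a rough index of the margin of security.
gap = math.dist(fields["day"]["mean"], fields["tie"]["mean"])
print(f"distance between means: {gap:.2f}")

print(nearest_phoneme((0.57, 0.80), fields))  # day
print(nearest_phoneme((0.30, 0.60), fields))  # tie: shifted too far down and inward
```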

Chain shifts happen either when one vowel gets too close to one of its neighbors, forcing the second vowel to move out of the way to maintain its margin of security; or when one vowel shifts away from one of its neighbors, so that the second vowel moves into the territory vacated by the first vowel to increase its margin of security relative to its other neighbors. In some cases, however, compensatory shifts do not occur and the fields of dispersion of two neighboring vowels coalesce in a phonemic merger, presumably because the “cost” of the resulting homophony to the vowel system (words that used to have different vowels but are now indistinguishable) is less than the cost of the compensatory shifts that would be necessary to avoid it. As previously noted, phonemic mergers and splits both have structural consequences for the vowel system as a whole. Vowels are like people in an elevator, who tend to distribute themselves symmetrically around the periphery of the available space: if an elevator passenger moves, leaves, or enters, the other passengers adjust their position in accordance with the new situation, to maintain a maximal space between themselves and other passengers. Similarly, a merger (like an elevator passenger leaving) creates new articulatory space and draws neighboring vowels into it (a pull shift), whereas a split (like another passenger entering the elevator) takes up additional space and forces surrounding vowels to adjust accordingly (a push shift). In this way, variation and change at the level of phonemic inventory, the second type of sound difference discussed in the previous subsection, is systemically related to variation and change in the phonetic quality of phonemes, the third type discussed in this subsection: the underlying inventory and structure of the vowel system produce the superficial phonetic features that are heard as regional accents. The latter cannot be properly understood without the former.
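The elevator analogy can be reduced to a toy calculation. Assume, purely for illustration, that vowels sit along a single continuum from 0 to 1 with both ends occupied, as on the periphery of the vowel space, and that they spread out as evenly as possible, since even spacing maximizes the smallest gap between neighbors. This one-dimensional model is a simplification introduced here, not one proposed in the research cited above, but it shows why a merger leaves more room for the remaining vowels (a pull shift) and a split leaves less (a push shift).

```python
def even_positions(n):
    """Place n >= 2 vowels on a 0-1 continuum, ends occupied, evenly spaced.

    Even spacing maximizes the smallest gap between neighbors."""
    if n < 2:
        raise ValueError("need at least two vowels")
    return [i / (n - 1) for i in range(n)]

def min_gap(positions):
    """Smallest distance between adjacent vowels."""
    ordered = sorted(positions)
    return min(b - a for a, b in zip(ordered, ordered[1:]))

original = min_gap(even_positions(5))      # five vowels
after_merger = min_gap(even_positions(4))  # two vowels become one
after_split = min_gap(even_positions(6))   # one vowel becomes two

print(f"{original:.3f} {after_merger:.3f} {after_split:.3f}")  # 0.250 0.333 0.200
```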

This relationship can also work the other way, with changes in pronunciation at the phonetic level having structural, phonological consequences. This is one of the mechanisms underlying phonemic merger: through shifts in pronunciation, two vowels approximate each other in phonetic space until they can no longer be reliably distinguished as separate phonemes at the phonological level. A less obvious example is the vocalization of /r/, the only consonantal variable to be studied in this book. As mentioned in Chapter 1, this is sometimes called “r-dropping,” but is known technically and more accurately as vocalization, which means turning a consonant into a vowel. The /r/ of modern English is actually very similar to a vowel: its distinctive sound is made by a constriction of the tongue, either by bunching it up or curling the tip backward. If that constriction is removed, which is a change at the phonetic level, the /r/ becomes a vowel: either a continuation of the vowel before it, making a long monophthong (a vowel with a steady pronunciation), as when start is pronounced “staht”; or a murmur-like vowel called a schwa, like the hesitation word uh, making a diphthong (a two-part vowel), as when force is pronounced “foe-uhce,” or hair becomes “hey-uh.” In some cases, this leads to change at the phonological level, when minimal pairs that are distinguished by the presence of postvocalic /r/ merge: farther sounds like father and source like sauce. In accents where vocalization does not occur, the constriction of /r/ can also have phonological consequences, causing conditioned mergers of vowels before intervocalic /r/, as previously mentioned, because the constriction limits the number of vowel contrasts that can occur before it. Long vowels like those of face and goat can occur before a word-final /r/ in words like care and bore, but short vowels like those of cat and odd cannot.
In dialects with vocalized /r/, the vowel of cat can be retained more or less unaltered in carry and that of odd is retained in orange, whereas in dialects with constricted /r/, the cat vowel merges with the face vowel before /r/, producing “care-y” for carry, and the odd vowel merges with the goat vowel, producing “ore-ange” for orange. Dialects that vocalize /r/ are also called r-less or nonrhotic, while those that do not are called r-ful or rhotic. Vocalization is now the norm in Standard Southern British English, but the opposite change is underway in North American dialects, with many speakers in formerly r-less regions of the eastern and southern United States reinserting /r/ in many words, at least in formal styles of speech (as will be seen in Chapters 3 and 5).

2.2 Analytical Preliminaries: Basic Concepts of Acoustic Phonetic Analysis

The foregoing discussion has examined variation and change in vowel production from the point of view of articulatory phonetics: the tongue positions used to produce vowel sounds. These sounds can be studied through auditory-impressionistic analysis, in which the phonetician listens carefully to the quality of a sound and transcribes it using the set of symbols in the International Phonetic Alphabet (IPA), each of which has a specific phonetic value. Much can be achieved in this way, particularly in the analysis of consonant sounds or variables that involve the simple presence or absence of a sound; this approach will therefore be taken to the analysis of /r/ vocalization in this book. Phonetic variation and change in vowel production, however, involve scalar rather than binary variables, with the position of a token along a range of possible values (the continua of height and advancement discussed previously) requiring precise quantitative measurement. To meet this need, most sociophonetic studies of vowel quality today use acoustic rather than impressionistic phonetic analysis. Acoustic analysis examines not tongue gestures and positions themselves but the reflection of those gestures and positions in the acoustic properties of the sound waves they produce: the sound waves that the hearer’s ears and brain receive, decode, and turn back into language. In general, acoustic analysis produces more objective, precise, and quantifiable data on vowel quality than auditory-impressionistic analysis, as long as it is undertaken with sufficient care. Since the analysis in the remainder of the book will be based mainly on acoustic phonetic data, this section will present a brief explanation of how these data are obtained and what they represent.
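The difference between a binary and a scalar variable can be made concrete with a short sketch. It assumes, purely for illustration, that each postvocalic /r/ token in a performance has been coded impressionistically as constricted or vocalized, and that F1 has been measured acoustically for a handful of tokens of one vowel; all values and the function names vocalization_rate and mean_position are hypothetical. The binary variable is summarized as a rate of occurrence, whereas the scalar variable is summarized by a mean position along a continuum.

```python
from statistics import mean

# Invented impressionistic codes for postvocalic /r/ in one performance:
# True = constriction heard (r-ful), False = vocalized (r-less).
r_codes = [True, False, False, True, False, True, False, False]

# Invented acoustic F1 measurements (Hz) for tokens of one vowel.
f1_values = [710.0, 745.0, 690.0, 760.0]

def vocalization_rate(codes):
    """Binary variable: proportion of /r/ tokens coded as vocalized."""
    if not codes:
        raise ValueError("no coded tokens")
    return sum(1 for constricted in codes if not constricted) / len(codes)

def mean_position(values):
    """Scalar variable: mean of the measured values along a continuum."""
    return mean(values)

print(f"vocalized /r/: {vocalization_rate(r_codes) * 100:.1f}% of tokens")
print(f"mean F1: {mean_position(f1_values):.1f} Hz")
```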

In the previous section, vowel production was said to involve the generation of a pressure wave in air, which is transmitted to the listener. The sound wave associated with a vowel sound is a complex periodic waveform consisting of a fundamental frequency of vibration, heard as its pitch, and a spectrum of higher frequencies above the fundamental; these are selectively amplified by changes in the shape of the mouth, which determine the timbre or quality of the sound. Frequency (the number of oscillations or cycles of vibration per second) is measured in Hertz (Hz), while amplitude (the magnitude of the vibration, producing its relative acoustic energy or loudness) is measured in decibels (dB). The component frequencies of the spectrum of a complex wave can be identified through Fourier analysis, a mathematical method developed in the nineteenth century. A device for visually displaying the results of Fourier analysis, the sound spectrograph, was invented by R. K. Potter and his associates at Bell Laboratories in the early 1940s. This device produced a spectrogram: a two-dimensional display of frequency (vertical) over time (horizontal), with the third dimension, amplitude, indicated by the darkness of the image. Concentrations of acoustic energy, or higher amplitude, appear on the spectrogram as dark horizontal bands at vertical positions indicating their frequency; these are called formants. The formants are numbered upward from the fundamental frequency, abbreviated F0, which lies near the bottom of the image: the first formant above the fundamental is F1, the second F2, the third F3, and so on. The original spectrograph was a specialized machine the size of a desk that printed spectrograms on paper, with each image taking several minutes to produce, a laborious process that limited the scope of the first acoustic phonetic studies.
Today, spectrograms can be generated instantly on any personal computer, a technical advance that has greatly expanded the quantity of sociophonetic research. The most popular acoustic analysis program today, and the one used in this book’s analysis, is Praat (Boersma and Weenink 2012).
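The principle of Fourier analysis can be illustrated with a minimal Python sketch. It builds an invented complex periodic wave from a 100 Hz fundamental and two weaker components at 200 Hz and 300 Hz, then recovers those components with a naive discrete Fourier transform. This is not how analysis software computes spectrograms (practical tools use faster, windowed transforms over short stretches of signal), and it shows only the decomposition of a wave into its component frequencies: formants, the regions in which the vocal tract amplifies some of those components, are not modeled. The sample rate, duration, amplitudes, and the function name dft_magnitudes are assumptions, chosen so that each component falls exactly on a frequency bin.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed for the illustration)
N = 800              # 0.1 s of signal, so each DFT bin spans 10 Hz

# Invented complex periodic wave: a 100 Hz fundamental plus components
# at 200 Hz and 300 Hz with smaller amplitudes.
components = [(100, 1.0), (200, 0.5), (300, 0.25)]
signal = [
    sum(a * math.sin(2 * math.pi * f * n / SAMPLE_RATE) for f, a in components)
    for n in range(N)
]

def dft_magnitudes(x, max_bin):
    """Naive discrete Fourier transform: amplitude of each bin up to max_bin."""
    n_samples = len(x)
    mags = []
    for k in range(max_bin + 1):
        s = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n in range(n_samples))
        # Scaling by 2/N recovers a sine's amplitude (valid for bins other than 0 and N/2).
        mags.append(2 * abs(s) / n_samples)
    return mags

mags = dft_magnitudes(signal, max_bin=50)  # bins 0..50 cover 0..500 Hz
bin_hz = SAMPLE_RATE / N
peaks = [(round(k * bin_hz), round(m, 3)) for k, m in enumerate(mags) if m > 0.1]
print(peaks)

# Level of each component in dB relative to the strongest one: 20 * log10(ratio).
for freq, amp in peaks:
    print(freq, "Hz:", round(20 * math.log10(amp / peaks[0][1]), 1), "dB")
```

Because halving an amplitude lowers its level by about 6 dB, the 200 Hz and 300 Hz components come out roughly 6 dB and 12 dB below the fundamental.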

The basic relationship between formant frequencies and vowel quality was demonstrated by Peterson and Barney (1952), leading to many more studies exploring the acoustic characteristics of different classes of speech sounds (Denes and Pinson 1963; Fry 1979); a more recent demonstration appears in Hillenbrand et al. (1995). Peterson and Barney found that vowel height is reflected in the frequency of the first formant, whereas vowel advancement is reflected in the frequency of the second. More specifically, F1 is inversely correlated with vowel height (the lower the vowel, the higher the formant value), whereas F2 is directly correlated with vowel advancement (the more advanced or farther front the vowel, the higher the formant). Given these relationships, the position of a vowel in the articulatory space illustrated in Figure 2.1 can be precisely and objectively recorded by measuring the frequency of F1 and F2 in the spectrum of its waveform. The third formant is correlated with lip-rounding (higher for unrounded vowels, lower for rounded), which is predictable rather than phonemic in English, but also with the difference between the “liquid” consonants: /l/ (high F3) and /r/ (low F3). Formants higher than F3 appear to play no important role in vowel discrimination.

To demonstrate the articulatory-acoustic relationship between vowel quality and the first two formants, I recorded myself pronouncing the eight words of Figure 2.1 and carried out an acoustic analysis of the recording, the result of which is reproduced here in Figure 2.2. The upper window shows the waveform: a measure of amplitude over time, in which the vowel sounds appear as clusters of black striations, indicating their acoustic prominence. The lower window, which is synchronized with the upper, shows the spectrogram, in which the formants appear as horizontal black bands through each vowel. The vertical scale is frequency in Hertz; the horizontal is time in milliseconds. Overlaid on top of the formants are linear series of dots (red in the original) that represent a more precise analysis of the exact frequency of each formant at each point in time. The table under the spectrogram shows the words from Figure 2.1 with the values of F1, F2, and F3 at a representative point in each vowel (the maximal value of F1, indicating the articulatory target achieved at the point of greatest mouth opening).

Figure 2.2 Spectrogram example: spectrographic analysis of the words from Figure 2.1, spoken by the author
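The measurement rule used for the table in Figure 2.2, taking formant values at the point of maximal F1, can be sketched as follows. The sketch assumes that a formant track has already been extracted by a formant tracker (such as the one in Praat) and stored as frames of (time in ms, F1 in Hz, F2 in Hz); the track values and the function name measure_at_f1_max are invented for illustration.

```python
# Each frame: (time in ms, F1 in Hz, F2 in Hz). Invented values for one vowel.
track = [
    (0, 480, 1750),
    (20, 610, 1690),
    (40, 720, 1620),
    (60, 705, 1580),
    (80, 560, 1700),
]

def measure_at_f1_max(frames):
    """Return the frame with the highest F1, taken as the vowel's articulatory
    target (the point of greatest mouth opening)."""
    if not frames:
        raise ValueError("empty formant track")
    return max(frames, key=lambda frame: frame[1])

time_ms, f1, f2 = measure_at_f1_max(track)
print(f"measured at {time_ms} ms: F1 = {f1} Hz, F2 = {f2} Hz")
```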

Note in Figure 2.2 how F1 rises as the vowels get lower from see to tie and then falls as the vowels get higher again toward pool, whereas F2 falls steadily from the front vowel of see to the back vowel of pool, demonstrating the relationships first established by Peterson and Barney (1952): high vowels have low F1 values; low vowels have high F1 values; front vowels have high F2 values; and back vowels have low F2 values. Like the articulatory descriptions in Figure 2.1, to which they are systematically related, the acoustic measures in Figure 2.2 can be displayed in a vowel chart, as shown in Figure 2.3. For the two charts to correspond in their orientation, the axial origin of the acoustic chart has to be in the upper right corner, with F1 increasing from top to bottom (high vowels down to low vowels) and F2 increasing from right to left (back vowels to front vowels).

Figure 2.3 Relation of formant values to vowel quality: vowel chart displaying acoustic data from Figure 2.2, corresponding to articulatory vowel chart in Figure 2.1
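The orientation of an acoustic vowel chart can be expressed as a simple coordinate mapping. The sketch below assumes display coordinates in which x grows to the right and y grows downward, arbitrary display ranges of 200 to 1000 Hz for F1 and 600 to 2800 Hz for F2, and a hypothetical function name, chart_position. It places high F2 values on the left and low F1 values at the top, so that the corner where F1 and F2 are both lowest falls in the upper right, as in Figure 2.3.

```python
def chart_position(f1, f2, f1_range=(200, 1000), f2_range=(600, 2800),
                   width=100.0, height=100.0):
    """Map (F1, F2) in Hz to (x, y) display coordinates, with x growing to the
    right and y growing downward, oriented like an articulatory vowel chart."""
    f1_min, f1_max = f1_range
    f2_min, f2_max = f2_range
    # F2 increases from right to left: the highest F2 (front vowels) gives x = 0.
    x = (f2_max - f2) / (f2_max - f2_min) * width
    # F1 increases from top to bottom: the lowest F1 (high vowels) gives y = 0.
    y = (f1 - f1_min) / (f1_max - f1_min) * height
    return x, y

# Invented formant values, roughly like a high front and a high back vowel.
for word, f1, f2 in [("see", 300, 2300), ("pool", 320, 900)]:
    x, y = chart_position(f1, f2)
    print(f"{word}: x = {x:.1f}, y = {y:.1f}")
```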

The relative positions of the vowels of Figure 2.1 are faithfully reproduced in Figure 2.3, but their absolute positions are now pinpointed with much greater accuracy, revealing a few discrepancies between abstract articulatory descriptions and concrete acoustic measurements. My pronunciation of law, for instance, is more mid-back than low-back, while my production of cold is closer to pool than the articulatory labels would suggest. This is exactly the kind of variation in the phonetic values of phonemes that will be studied in this book. The articulatory and acoustic charts do, however, correspond well in the positions of cow and tie, which are closely adjacent. These are phonemic diphthongs, or long vowels with two parts: each has a nucleus (the first, “ah” part) and a glide (the second part, which is “oo” for cow and “ee” for tie, so “cah-oo” and “tah-ee”). Because their nuclei nearly overlap, it is the direction of the glide (to a high-back position for cow, but to a high-front position for tie) that distinguishes these phonemes from one another, enabling the nuclei to occupy the same space without causing the vowels to merge. The glide trajectories can be seen in the spectrogram: the F2 of tie rises toward the value it has in see, whereas that of cow falls toward the value it has in pool.

I have measured only the nuclear quality of these diphthongs, which is what varies most among North American accents, for example in the patterns of Canadian Raising and back vowel fronting that will be discussed in subsequent chapters, though recent research shows that glide quality can also vary by region (Thomas 2000; Fox and Jacewicz 2009; Jacewicz and Fox 2013). This could be studied by taking a second measurement of each diphthongal vowel, indicating its glide target, or a series of measurements, indicating the entire trajectory of the glide. Spectrograms allow other aspects of vowel production to be measured as well, like duration (the length of the vowel in milliseconds), which can also vary among regional or social groups (Jacewicz, Fox, and Salmons 2007) and play a role in phonemic contrast (Fridland, Kendall, and Farrington 2014). Nevertheless, as discussed in Section 2.5, the approach of this book, like that of Labov, Ash, and Boberg (2006) in the Atlas of North American English (ANAE), will be to focus on measurement of F1 and F2 at a single point in the nucleus of each vowel, representing its primary articulatory target: the point the tongue aims at before it changes direction to begin its transition into a following glide or toward the next phoneme.
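A second measurement of each diphthong, at its glide target, would capture the contrast between cow and tie described earlier in this section: the F2 of tie rises after the nucleus, whereas that of cow falls. The sketch below assumes, as a simplification, that the nucleus is the frame of maximal F1 and the glide target is the last frame of the vowel. The formant tracks, the function name glide_direction, and the labels it returns are invented for illustration; they are not measurements from Figure 2.2.

```python
def glide_direction(frames):
    """Label a diphthong's glide by the net F2 change from nucleus to glide target.

    frames: list of (time_ms, F1_Hz, F2_Hz). Simplifying assumptions: the
    nucleus is the frame with maximal F1; the glide target is the last frame.
    """
    nucleus = max(frames, key=lambda frame: frame[1])
    target = frames[-1]
    delta_f2 = target[2] - nucleus[2]
    if delta_f2 > 0:
        label = "fronting glide"   # F2 rises, as toward the vowel of see
    elif delta_f2 < 0:
        label = "backing glide"    # F2 falls, as toward the vowel of pool
    else:
        label = "no F2 change"
    return label, delta_f2

# Invented tracks with similar nuclei, loosely shaped like tie and cow.
tie = [(0, 700, 1350), (50, 760, 1400), (100, 600, 1700), (150, 450, 2000)]
cow = [(0, 700, 1350), (50, 770, 1320), (100, 600, 1100), (150, 450, 950)]
for word, track in [("tie", tie), ("cow", cow)]:
    print(word, glide_direction(track))
```

Although the two invented nuclei are almost identical, the sign of the F2 change separates the vowels, which mirrors the author's point that glide direction, not nuclear position, keeps these phonemes apart.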

Figure 2.3 shows that acoustic analysis of vowel quality produces exact measurements that can be subjected to quantitative and statistical analysis, allowing phonetic variation and change to be studied with great precision. This has been amply demonstrated by many previous sociophonetic studies, both of ongoing sound change (Labov 1963, 1980, 1991; Clarke, Elms, and Youssef 1995) and of regional and social phonetic variation (Labov, Yaeger, and Steiner 1972; Labov 1990; Hagiwara 1997; Thomas 2001; Clopper and Pisoni 2004; Clopper, Pisoni, and De Jong 2005; Labov, Ash, and Boberg 2006; Boberg 2008, 2014; i.a.). Beyond measuring the quality of the vowels of individual words like those in Figure 2.3, the mean position of a vowel phoneme or its allophones for a given speaker can be calculated from all of the individual tokens of that vowel produced across a dataset. This mean can then be compared with the means of other vowels in the same speaker’s system or with the means of the same vowel in other speakers’ systems. For instance, the presence of a phonemic merger can be assessed by carrying out a t-test of the difference between the mean formant values of two adjacent vowels, to see whether they are significantly different, indicating separate distributions of vowel tokens and therefore separate phonemes, or not, indicating that they are a single phoneme. On a larger scale, a mean of the individual mean positions of a given vowel can be calculated across a whole group of speakers and compared with the mean of another group, to study how the groups differ at the phonetic level. Both of these methods underlie much of the analysis in the following chapters.
The quantitative and statistical methods used in this book are discussed further in Section 2.6.
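The two kinds of comparison described in the preceding paragraph can be sketched in Python using only the standard library. The sketch uses Welch's version of the t-test, which does not assume equal variances in the two vowels; the text does not say which variant is used, so this choice is an assumption. Because the standard library has no t distribution, the sketch returns the t statistic and its approximate degrees of freedom rather than a p-value; where SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=False) computes the same test with a p-value. All formant values, speaker labels, and function names are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def mean_of_speaker_means(speakers):
    """Group mean of each speaker's own mean, so every speaker counts once
    however many tokens they contributed."""
    return mean(mean(tokens) for tokens in speakers.values())

# Invented F2 values (Hz) for one speaker's tokens of two adjacent vowels.
cot_f2 = [1180, 1220, 1150, 1205, 1190]
caught_f2 = [1010, 1060, 980, 1040, 1025]
t, df = welch_t(cot_f2, caught_f2)
print(f"t = {t:.2f}, df = {df:.1f}")

# Invented per-speaker F2 tokens for one vowel in one group of speakers.
group = {"speaker A": [1500, 1520, 1480, 1500], "speaker B": [1700, 1700]}
print(mean_of_speaker_means(group))  # each speaker weighted equally
```

A large absolute t relative to the critical value for the given degrees of freedom points to two separate distributions of tokens. Averaging speaker means, rather than pooling all tokens, keeps a speaker with many tokens from dominating the group value: here the pooled token mean would be about 1567 Hz, against 1600 Hz for the mean of speaker means.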

2.3 Assembling a Sample of Performances (Actors and Roles)

One of the most important methodological issues in any sociolinguistic or dialectological study is the sample of speakers it is based on, which can influence the results of its analysis and therefore its conclusions: which communities or groups of speakers are to be studied and which members of those groups are to be included in (or excluded from) the dataset. This is no less true in a study of accents in film and television speech. This section will therefore set out the criteria used for selecting a sample of actors and roles for analysis and present the sample that resulted from that selection.

2.3.1 Selection Criteria

As in most studies, the time and other resources available for carrying out the research presented in this book were not unlimited, which prevented its survey of film and television speech from being as exhaustive as one might wish. Given the time required for acoustic analysis in particular, analyzing every famous actor and film or TV show in history, not to speak of the less famous ones, or expanding the analysis to include other genres of media speech, like newscasts and talk shows, would have required a large staff working for many years, without any guarantee that its main conclusions would be any different. It was therefore decided to restrict the study to traditional comedies and dramas in film and on prime-time television and to select within those genres a satisfactory but limited sample of performances for analysis. This unfortunately meant excluding many other performances that might have been equally worthy of study.

In this respect it is worth restating here what was said in Section 1.2 of Chapter 1: that the selection of performances for analysis was guided primarily by linguistic criteria. It does not reflect an assessment of the skills of the actors who were either included or excluded, or of the artistic or cultural merit of the films or television series in which they appear. Perhaps it is worth adding, without citing any specific examples, that my own subjective opinion of the individual performances studied here varies widely. A phonetic study of my own favorite films would need to examine the pronunciation of British English, Danish, French, Italian, and Swedish as well as North American English: my vote for the greatest film of all time is not for the often-cited Citizen Kane (1941) but for Babette’s Feast (1987), the Danish film by Gabriel Axel, whose dialogue combines Danish, French, and Swedish. By that standard, I would judge the artistic value of some of the films and television shows analyzed in this book to be comparatively slight, at best, while others do rank among my personal favorites. Readers should therefore dismiss any notion that I see all of the performances studied here as uniformly meritorious in any artistic or cultural sense, or that comments about sociolinguistic issues like accent authenticity should be taken to apply to other aspects of the quality of a performance. This book is concerned with linguistic analysis, not with dramatic criticism: regardless of their cultural or artistic merit, all the performances studied here have something to tell us about variation and change in the pronunciation of North American English, which is the reason for their selection.

Unlike in political polling or sociological surveys, in which only a random sample of the population under study allows the researchers to extrapolate the patterns they observe to the whole population, sociolinguists and dialectologists, who are interested more in differences between groups than in describing entire populations, have traditionally relied on judgment samples: participants are recruited based on their age, sex, social class, regional background, and other attributes, to obtain a subsample of each social or regional group that is satisfactory for quantitative and statistical analysis (Chambers 1995: 38–41). A similar approach will be taken here, but with the added factor of celebrity, which distinguishes this study from traditional sociolinguistic research. Given the possibility, discussed in Section 1.5 of Chapter 1, that film and television speech may serve as a model for style shifting and change from above in the general population and as an index of historical shifts in both overt and covert sociolinguistic prestige, it was important to include potentially influential performances in the sample: those that, according to a general knowledge of film and television history, involve the most famous actors of each decade in their most famous roles. Because most of the actors in this category are European Americans from the New York City, Midwestern, and Western regions, however, the sample was expanded to include some less prominent actors representing the other regional and ethnic backgrounds discussed in Subsection 1.3.4 of Chapter 1. Actors with a clear childhood association with a single dialect region, extending from early childhood through graduation from high school, were generally preferred over those with mixed or unknown regional backgrounds, but actors with mixed backgrounds were not excluded, since they represent an important part of the population as well, given modern levels of migration and mobility.

Because the main focus of the book’s analysis is on change in real time, it was also important to achieve a satisfactory sample of films or television shows in each decade. The time period covered by the analysis begins in the late 1920s, with the advent of talking pictures or “talkies”: films with recorded dialogue synchronized with the images. Prior to that point, going back to the invention of moving pictures around 1895, films had been silent, though as the medium evolved, intertitles with short stretches of printed dialogue were added between scenes, increasing the film’s narrative potential, and accompanying music was supplied by live performers in the theater, often on a theater organ that could produce a wide range of sound qualities and effects. The first sound film, the musical The Jazz Singer, appeared in 1927, but includes only short stretches of dialogue. The first film with recorded dialogue throughout was Lights of New York, which appeared in 1928; the first British sound film, Alfred Hitchcock’s Blackmail, came a year later in 1929. As with most new technologies, early sound recording was primitive and plagued by technical problems, not to mention the challenges faced by silent film actors who now had to speak their lines, a transition satirized to great comic effect two decades later in the musical Singin’ in the Rain (1952). The transformation was rapid, however: by 1930, most American cinemas had been refitted for sound and Hollywood stopped making silent films (Dibbets 1996: 212). The first decade of the sound era in film was therefore the 1930s. Early television technology also appeared at this time, but regular network television broadcasts to large numbers of households began in the late 1940s in the United States and the early 1950s in Canada.
The advent of television, together with the greater availability and popularity of more recent films and the exclusion of some regional and social groups of actors from earlier films, means the sample of performances studied here is weighted more heavily toward the later decades of the twentieth century than the earlier period, which allows contemporary regional and social variation to be studied as well as change, but an effort was made to ensure at least a minimally satisfactory subsample from each decade.

Beyond the general issues of celebrity status, distribution across the decades, and regional and ethnic representation, the selection of individual actors and characters was constrained by several other factors. Most importantly, given the focus of the book, all the actors had to be native speakers of North American English. This restriction excluded both nonnative speakers of English, like Ingrid Bergman, Penélope Cruz, Julie Delpy, Marlene Dietrich, Greta Garbo, and Hedy Lamarr, and native speakers of non–North American varieties of English, like Cate Blanchett, Richard Burton, Joan Collins, Ralph Fiennes, Errol Flynn, Cary Grant, Hugh Grant, Audrey Hepburn, Anthony Hopkins, Nicole Kidman, Hugh Laurie, Vivien Leigh, Liam Neeson, Maureen O’Hara, Elizabeth Taylor, and Kate Winslet, among many others. While it would be interesting to study how these non–North American actors adapt their speech to roles on the North American screen and how the results of that adaptation differ from native North American screen accents, that question is beyond the scope of this book.

Another important question was which performance to analyze for each actor, given that many of the actors studied here have appeared in scores of films or hundreds of television episodes spanning several decades, of which several could be judged equivalently notable or informative. In order to take a consistent approach to establishing a series of data points for the analysis of change over time, where two or more performances were of approximately equal stature and analytical value, preference was given wherever possible to the one nearest the beginning of an actor’s career, marking the start of that actor’s potential influence on the general public and role in establishing sociolinguistic norms. As older actors tend to represent the speech standards of their youth, including too many late-career performances might distort the analysis of change over time. For television shows, the first five episodes of the first season were chosen for analysis wherever possible, for the same reason. Analyzing five episodes was intended to provide approximately the same amount of dialogue as could be obtained from a typical ninety-minute or two-hour film (though in most cases it turned out to be more, as television shows focus more exclusively on dialogue than films, which include more nonverbal content).

As stated in Section 1.6 of Chapter 1, performances were selected to represent the usual on-screen accent of each actor, rather than departures from that norm to suit particular roles. This was usually not difficult, as the on-screen speech of most major stars tends to be highly consistent across most of the roles they play, a part of their on-screen persona. Where regional or ethnic accent performances were selected, these were restricted to authentic performances by actors with a native familiarity with the accent in question, thereby excluding nonnative accent imitations. “Straight” performances of regional or social accents were also preferred over broad comic parodies or caricatures, such as Sean Penn’s perpetually stoned Southern California surfer Jeff Spicoli in Fast Times at Ridgemont High (1982); Rick Moranis and Dave Thomas’s McKenzie Brothers, the Canadian “hosers” in Strange Brew (1983); Fran Drescher’s New York Jewish nanny, Fran Fine, in The Nanny (1993–1999); or David Lawrence and Paul Spence’s Calgary metalheads Terry and Dean in FUBAR (2002). These are brilliant comic performances, based on native knowledge of the accents in question, and are also interesting reflections of dialect ideology, but for the reasons discussed in Chapter 1 they are not appropriate examples of regional speech for inclusion in this study. Exceptions to this rule were made only where necessary for coverage of particular regional or social accents: Cheech Marin’s Los Angeles Chicano stoner in Up in Smoke (1978) and his Nova Scotia parallel, Robb Wells’s Ricky in Trailer Park Boys (2001–), could also be considered caricatures, but were included as well-known examples of Latino and Maritime Canadian English, since both performances are based on native knowledge of the accents in question. Such exceptions were not necessary in representing other accents, like those of Southern California, the Midwest, New York City, or central and Western Canada.

Another important constraint on performance selection was quantity of speech: the role had to provide enough speech to support a reliable phonetic analysis of the vowel system. This typically requires a minimum of around 300 fully stressed tokens, distributed across all or most of the word classes listed in Section 2.4. In smaller datasets, some important word classes are often underrepresented or not represented at all, which can distort the means and lead to inaccurate analyses. To produce an adequate dataset, a performance requires substantial stretches of dialogue in several scenes of the film or TV show. This unfortunately excluded many valuable performances by actors in supporting roles, like those of Hattie McDaniel in Gone with the Wind (1939) or Shirley Temple in Fort Apache (1948), as well as many top-billed performances in primetime soap operas, for example that of Linda Evans in Dynasty (1981–1989), because the soap opera format divides its episodes among several simultaneous story lines involving different characters in large ensemble casts. It even excluded some leading characters in conventional films, like Jane Russell in The Outlaw (1943), who has a limited speaking role despite her star billing. Especially problematic in this respect, however, are men in action, adventure, and war films, like Steve McQueen in Bullitt (1968), Clint Eastwood in Dirty Harry (1971), Mark Hamill in Star Wars (1977), Bruce Willis in Die Hard (1988), or Tom Hanks in Saving Private Ryan (1998), all of which had to be set aside. Films like these feature relatively small amounts of dialogue and a lot of shouting and background noise, both of which create problems for acoustic analysis.
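The quantity criterion can be sketched as a simple check on a coded dataset. The 300-token minimum comes from the text; the per-class floor of five tokens, the function name check_adequacy, and the word-class labels are hypothetical. The labels are borrowed from the keywords face, goat, cat, odd, and cow used earlier in this chapter, not from the word classes of Section 2.4.

```python
from collections import Counter

MIN_TOTAL = 300      # approximate minimum of fully stressed tokens (from the text)
MIN_PER_CLASS = 5    # hypothetical floor for each word class

def check_adequacy(tokens, expected_classes):
    """Return (adequate, total, thin_classes) for a list of word-class labels,
    one label per fully stressed vowel token."""
    counts = Counter(tokens)
    # Counter returns 0 for classes that never occur, so missing classes are caught.
    thin = sorted(c for c in expected_classes if counts[c] < MIN_PER_CLASS)
    adequate = len(tokens) >= MIN_TOTAL and not thin
    return adequate, len(tokens), thin

# Invented token counts for one performance.
tokens = ["face"] * 120 + ["goat"] * 100 + ["cat"] * 90 + ["odd"] * 3
print(check_adequacy(tokens, ["face", "goat", "cat", "odd", "cow"]))
```

In this invented case the total clears the 300-token threshold, but the thin odd class and the missing cow class would still leave some vowel means unreliable or absent, which is the problem the text describes.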

Many of the greatest male stars, in fact, tend to play traditionally masculine “strong, silent types”: men of action but few words. The most notable example of this type is surely John Wayne: his iconic career-launching role as the Ringo Kid in Stagecoach (1939) provided too little speech for analysis, partly because of the film’s interest in examining the stories of half a dozen passengers brought together on the coach, of which Ringo is only one. Wayne’s taciturnity remained a challenge even in the performance that was selected: his later role as Ethan Edwards in The Searchers (1956). Another classic example of a man of few words is Gary Cooper’s Marshal Will Kane in High Noon (1952): finding enough tokens of his speech also proved difficult. The sample is therefore somewhat biased toward what are stereotypically considered women’s films, colloquially referred to as “chick flicks”: romantic comedies and dramas that focus mainly on dialogue, providing hundreds of good tokens for analysis. Examples analyzed here include The Women (1939), The Philadelphia Story (1940), All About Eve (1950), Father of the Bride (1950), Imitation of Life (1959), Pillow Talk (1959), Sixteen Candles (1984), When Harry Met Sally (1989), Pretty Woman (1990), As Good as It Gets (1997), Titanic (1997), There’s Something About Mary (1998), Legally Blonde (2001), Mean Girls (2004), and Easy A (2010). Because Saving Private Ryan had too little analyzable dialogue, for instance, Tom Hanks had to be analyzed in Sleepless in Seattle (1993). Mid-century film noir also tends to focus on dialogue, with long stretches of conversation involving both female and male characters; examples analyzed here include Casablanca (1942), Double Indemnity (1944), Laura (1944), The Big Sleep (1946), and Gilda (1946).

With female characters, the problem tends to be not so much the quantity as the quality of their speech, particularly relating to pitch and resonance: some actresses speak with generally high pitch throughout a performance; some produce very high pitch or wide and rapid changes in pitch during the articulation of a single vowel, usually in connection with emotions like anger or excitement; and some lapse into a “breathy” or half-whispered voice, either in particular phrases or scenes that involve intimate conversations, or, in the case of actresses like Farrah Fawcett, Grace Kelly, Marilyn Monroe, and Gene Tierney, as a general feature of their on-screen speech and persona. All these traits create challenges for the accuracy of acoustic analysis and led in some cases to problematic tokens being excluded from the dataset, but it was fortunately not necessary to exclude any actress from analysis altogether on that basis.

In summary, then, the criteria used for selecting performances were:
  • Famous actors in famous roles from each decade;

  • Adequate representation of regional and ethnic accents;

  • Actor’s childhood spent in a single dialect area;

  • Native speakers of North American English;

  • Early-career rather than late-career performances;

  • Authentic rather than imitated regional accents;

  • Straight performances of regional or social accents rather than caricatures; and

  • Roles with enough speech for reliable phonetic analysis.

2.3.2 The Sample

The foregoing criteria produced a sample of 120 performances in 83 films and 60 performances in 37 television shows, a total of 180 performances for analysis. These are presented in Table 2.1, which lists the actors, their birth years, and dialect regions or ethnic groups; the films or television shows they appear in; the characters they play; and the years of the performances. The entries are sorted here by performance year, but lists sorted by actor and by film or television show are provided in Appendices A and B. The regions and ethnic groups are those discussed in Subsection 1.3.4 of Chapter 1. The last column of Table 2.1 indicates the analysis group to which each actor belongs: those in the R/E group have distinctly regional or ethnic speech patterns for their period and are included only in the regional analyses in Chapters 5 and 6 and the ethnic analyses in Chapter 7; all others are in the G group, on which the analysis of General North American English is based, as discussed in Section 2.7.

Table 2.1 Actors and performances analyzed, with birth year (BY), native dialect region, performance year, and analysis group, sorted by performance year

Actor | BY | Region or ethnic group | Film/show | Character | Year | Group
Robinson, Edward G. | 1893 | NYC | Little Caesar | “Rico” Bandello / “Little Caesar” | 1931 | G
Harlow, Jean | 1911 | Mixed | Platinum Blonde | Ann Schuyler | 1931 | G
Marx, Groucho | 1890 | NYC | Duck Soup | Rufus T. Firefly | 1933 | G
West, Mae | 1893 | NYC | I’m No Angel | Tira | 1933 | G
Armstrong, Robert | 1890 | Mixed | King Kong | Carl Denham | 1933 | G
Colbert, Claudette | 1903 | NYC | It Happened One Night | Ellie Andrews | 1934 | G
Gable, Clark | 1901 | Midland | It Happened One Night | Peter Warne | 1934 | G
Barrymore, John | 1882 | Mid-Atl. | Twentieth Century | Oscar “O.J.” Jaffe | 1934 | G
Lombard, Carole | 1908 | West | Twentieth Century | Lily Garland/Mildred Plotka | 1934 |
Astaire, Fred | 1899 | Mixed | Top Hat | Jerry Travers | 1935 | G
Rogers, Ginger | 1911 | Mixed | Top Hat | Dale Tremont | 1935 | G
Garland, Judy | 1922 | West | The Wizard of Oz | Dorothy Gale | 1939 | G
Crawford, Joan | 1904 | Mixed | The Women | Crystal Allen | 1939 | G
Russell, Rosalind | 1907 | Northeast | The Women | Sylvia Fowler | 1939 | G
Shearer, Norma | 1902 | Canada | The Women | Mary Haines | 1939 | G
Fields, W. C. | 1880 | Mid-Atl. | The Bank Dick | Egbert Sousé | 1940 | G
Hepburn, Katharine | 1907 | Northeast | The Philadelphia Story | Tracy Samantha Lord | 1940 | G
Welles, Orson | 1915 | Inland N. | Citizen Kane | Charles Foster Kane | 1941 | G
Lake, Veronica | 1922 | NYC | Sullivan’s Travels | The Girl | 1941 | G
McCrea, Joel | 1905 | West | Sullivan’s Travels | John L. Sullivan | 1941 | G
Bogart, Humphrey | 1899 | NYC | Casablanca | Rick Blaine | 1942 | G
Crosby, Bing | 1903 | West | Holiday Inn | Jim Hardy | 1942 | G
Cagney, James | 1899 | NYC | Yankee Doodle Dandy | George M. Cohan | 1942 | G
MacMurray, Fred | 1908 | Inland N. | Double Indemnity | Walter Neff | 1944 | G
Stanwyck, Barbara | 1907 | NYC | Double Indemnity | Phyllis Dietrichson | 1944 | G
Tierney, Gene | 1920 | NYC | Laura | Laura Hunt | 1944 | G
Bacall, Lauren | 1924 | NYC | The Big Sleep | Vivian Rutledge | 1946 | G
Hayworth, Rita | 1918 | Mixed | Gilda | Gilda Mundson Farrell | 1946 | G
Barrymore, Lionel | 1878 | Mid-Atl. | It’s a Wonderful Life | Mr. Henry F. Potter | 1946 | G
Reed, Donna | 1921 | Midland | It’s a Wonderful Life | Mary Hatch Bailey | 1946 | G
Stewart, James | 1908 | Midland | It’s a Wonderful Life | George Bailey | 1946 | G
Fonda, Henry | 1905 | Midland | My Darling Clementine | Wyatt Earp | 1946 | G
Davis, Bette | 1908 | Northeast | All About Eve | Margo Channing | 1950 | G
Tracy, Spencer | 1900 | Inland N. | Father of the Bride | Stanley T. Banks | 1950 | G
Holden, William | 1918 | West | Sunset Boulevard | Joe Gillis | 1950 | G
Swanson, Gloria | 1899 | Mixed | Sunset Boulevard | Norma Desmond | 1950 | G
Ball, Lucille | 1911 | Mixed | I Love Lucy | Lucy Ricardo | 1951 | G
Cooper, Gary | 1901 | West | High Noon | Will Kane | 1952 | G
Kelly, Gene | 1912 | Midland | Singin’ in the Rain | Don Lockwood | 1952 | G
Clift, Montgomery | 1920 | Mixed | From Here to Eternity | Pte. Robert E. Lee “Prew” Prewitt | 1953 | G
Grable, Betty | 1916 | Mixed | How to Marry a Millionaire | Loco Dempsey | 1953 | G
Wyatt, Jane | 1910 | NYC | Father Knows Best | Margaret Anderson | 1954 | G
Young, Robert | 1907 | Mixed | Father Knows Best | Jim Anderson, Sr. | 1954 | G
Brando, Marlon | 1924 | Mixed | On the Waterfront | Terry Malloy | 1954 | G
Kelly, Grace | 1929 | Mid-Atl. | Rear Window | Lisa Carol Fremont | 1954 | G
Ritter, Thelma | 1902 | NYC | Rear Window | Stella | 1954 | G
Gleason, Jackie | 1916 | NYC | The Honeymooners | Ralph Kramden | 1955 | G
Dean, James | 1931 | Mixed | Rebel Without a Cause | Jim Stark | 1955 | G
Monroe, Marilyn | 1926 | West | The Seven Year Itch | The Girl | 1955 | G
Peck, Gregory | 1916 | West | The Man in the Gray Flannel Suit | Tom Rath | 1956 | G
Wayne, John | 1907 | Mixed | The Searchers | Ethan Edwards | 1956 | G
Billingsley, Barbara | 1915 | West | Leave It to Beaver | June Cleaver | 1957 | G
Curtis, Tony | 1925 | NYC | Sweet Smell of Success | Sidney Falco | 1957 | G
Lancaster, Burt | 1913 | NYC | Sweet Smell of Success | J. J. Hunsecker | 1957 | G
Moore, Juanita | 1914 | Afr. Am. | Imitation of Life | Annie Johnson | 1959 | R/E
Turner, Lana | 1921 | West | Imitation of Life | Lora Meredith | 1959 | G
Day, Doris | 1922 | Midland | Pillow Talk | Jan Morrow | 1959 | G
Hudson, Rock | 1925 | Inland N. | Pillow Talk | Brad Allen | 1959 | G
Griffith, Andy | 1926 | South | The Andy Griffith Show | Sheriff Andy Taylor | 1960 | R/E
Lemmon, Jack | 1925 | Northeast | The Apartment | C. C. “Bud” Baxter | 1960 | G
MacLaine, Shirley | 1934 | South | The Apartment | Fran Kubelik | 1960 | G
Van Dyke, Dick | 1925 | Midland | The Dick Van Dyke Show | Rob Petrie | 1961 | G
Newman, Paul | 1925 | Inland N. | The Hustler | Eddie Felson | 1961 | G
Montgomery, Elizabeth | 1933 | Mixed | Bewitched | Samantha Stephens | 1964 | G
Eden, Barbara | 1931 | West | I Dream of Jeannie | Jeannie | 1965 | G
Shatner, William | 1931 | Canada | Star Trek | Capt. James T. Kirk | 1966 | G
Fonda, Jane | 1937 | NYC | Barefoot in the Park | Corie Bratter | 1967 | G
Redford, Robert | 1936 | West | Barefoot in the Park | Paul Bratter | 1967 | G
Bancroft, Anne | 1931 | NYC | The Graduate | Mrs. Robinson | 1967 | R/E
Hoffman, Dustin | 1937 | West | The Graduate | Benjamin Braddock | 1967 | G
Asner, Ed | 1929 | Midland | The Mary Tyler Moore Show | Lou Grant | 1970 | G
Moore, Mary Tyler | 1936 | Mixed | The Mary Tyler Moore Show | Mary Richards | 1970 | G
O’Connor, Carroll | 1924 | NYC | All in the Family | Archie Bunker | 1971 | R/E
Reiner, Rob | 1947 | NYC | All in the Family | Michael Stivic | 1971 | R/E
Stapleton, Jean | 1923 | NYC | All in the Family | Edith Bunker | 1971 | R/E
Roundtree, Richard | 1942 | Afr. Am. | Shaft | John Shaft | 1971 | R/E
Newhart, Bob | 1929 | Inland N. | The Bob Newhart Show | Dr. Robert (“Bob”) Hartley | 1972 | G
Alda, Alan | 1936 | NYC | M*A*S*H | Capt. “Hawkeye” Pierce | 1972 | G
Pacino, Al | 1940 | NYC | The Godfather Part II | Michael Corleone | 1974 | R/E
Amos, John | 1939 | Afr. Am. | Good Times | James Evans | 1974 | R/E
Rolle, Esther | 1920 | Afr. Am. | Good Times | Florida Evans | 1974 | R/E
Walker, Jimmie | 1947 | Afr. Am. | Good Times | J. J. Evans | 1974 | R/E
Linden, Hal | 1931 | NYC | Barney Miller | Capt. Barney Miller | 1975 | R/E
Hemsley, Sherman | 1938 | Afr. Am. | The Jeffersons | George Jefferson | 1975 | R/E
Sanford, Isabel | 1917 | Afr. Am. | The Jeffersons | Louise Jefferson | 1975 | R/E
Kaplan, Gabe | 1945 | NYC | Welcome Back, Kotter | Gabe Kotter | 1975 | R/E
Fawcett, Farrah | 1947 | South | Charlie’s Angels | Jill Munroe | 1976 | G
Stallone, Sylvester | 1946 | Mixed | Rocky | Rocky Balboa | 1976 | G
Allen, Woody | 1935 | NYC | Annie Hall | Alvy “Max” Singer | 1977 | R/E
Keaton, Diane | 1946 | West | Annie Hall | Annie Hall | 1977 | G
Travolta, John | 1954 | NYC | Saturday Night Fever | Tony Manero | 1977 | R/E
Field, Sally | 1946 | West | Smokey and the Bandit | Carrie (“Frog”) | 1977 | G
Reynolds, Burt | 1936 | Mixed | Smokey and the Bandit | Bo Darville (“Bandit”) | 1977 | G
Ritter, John | 1948 | West | Three’s Company | Jack Tripper | 1977 | G
Chase, Chevy | 1943 | Northeast | Foul Play | Lt. Tony Carlson | 1978 | G
Hawn, Goldie | 1945 | Mid-Atl. | Foul Play | Gloria Mundy | 1978 | G
Hirsch, Judd | 1935 | NYC | Taxi | Alex Reiger | 1978 | R/E
Marin, Cheech | 1946 | Latino | Up in Smoke | Pedro De Pacas | 1978 | R/E
Streep, Meryl | 1949 | Mid-Atl. | Kramer vs. Kramer | Joanna (Stern) Kramer | 1979 | G
Jones, Tommy Lee | 1946 | South | Coal Miner’s Daughter | Doolittle Lynn | 1980 | R/E
Spacek, Sissy | 1949 | South | Coal Miner’s Daughter | Loretta Lynn | 1980 | R/E
Manetti, Larry | 1947 | Inland N. | Magnum P.I. | Rick Wright | 1980 | R/E
Selleck, Tom | 1945 | West | Magnum P.I. | Thomas Magnum | 1980 | G
De Niro, Robert | 1943 | NYC | Raging Bull | Jake LaMotta | 1980 | R/E
Pesci, Joe | 1943 | NYC | Raging Bull | Joey LaMotta | 1980 | R/E
Ford, Harrison | 1942 | Inland N. | Indiana Jones and the Raiders of the Lost Ark | Indiana Jones | 1981 | G
Nabors, Jim | 1930 | South | The Best Little Whorehouse in Texas | Deputy Fred | 1982 | R/E
Parton, Dolly | 1946 | South | The Best Little Whorehouse in Texas | Mona Stangley | 1982 | R/E
Hagman, Larry | 1931 | South | Dallas | J. R. Ewing | 1982 | R/E
Howard, Susan | 1944 | South | Dallas | Donna Culver Krebbs | 1982 | R/E
Murphy, Eddie | 1961 | Afr. Am. | Beverly Hills Cop | Axel Foley | 1984 | R/E
Cosby, Bill | 1937 | Afr. Am. | The Cosby Show | Dr. Cliff Huxtable | 1984 | R/E
Rashad, Phylicia | 1948 | Afr. Am. | The Cosby Show | Clair Hanks Huxtable | 1984 | R/E
Ringwald, Molly | 1968 | West | Sixteen Candles | Samantha “Sam” Baker | 1984 | G
Broderick, Matthew | 1962 | NYC | Ferris Bueller’s Day Off | Ferris Bueller | 1986 | G
Williams, Robin | 1951 | Mixed | Dead Poets Society | John Keating | 1989 | G
Aiello, Danny | 1933 | NYC | Do the Right Thing | Sal | 1989 | R/E
Jackson, Samuel L. | 1948 | Afr. Am. | Do the Right Thing | Mister Señor Love Daddy | 1989 | R/E
Lee, Spike | 1957 | Afr. Am. | Do the Right Thing | Mookie | 1989 | R/E
Perez, Rosie | 1964 | Latina | Do the Right Thing | Tina | 1989 | R/E
Turturro, John | 1957 | NYC | Do the Right Thing | Pino | 1989 | R/E
Alexander, Jason | 1959 | NYC | Seinfeld | George Costanza | 1989 | R/E
Seinfeld, Jerry | 1954 | NYC | Seinfeld | Jerry Seinfeld | 1989 | G
Crystal, Billy | 1948 | NYC | When Harry Met Sally | Harry Burns | 1989 | R/E
Ryan, Meg | 1961 | Northeast | When Harry Met Sally | Sally Albright | 1989 | G
Gere, Richard | 1949 | Mixed | Pretty Woman | Edward Lewis | 1990 | G
Roberts, Julia | 1967 | South | Pretty Woman | Vivian Ward | 1990 | G
Fishburne, Laurence | 1961 | Afr. Am. | Boyz n the Hood | Jason “Furious” Styles, Jr. | 1991 | R/E
Allen, Tim | 1953 | Mixed | Home Improvement | Tim Taylor | 1991 | G
Richardson, Patricia | 1951 | South | Home Improvement | Jill Taylor | 1991 | G
Hanks, Tom | 1956 | West | Sleepless in Seattle | Sam Baldwin | 1993 | G
Aniston, Jennifer | 1969 | NYC | Friends | Rachel Green | 1994 | G
Cox, Courteney | 1964 | South | Friends | Monica Geller | 1994 | G
Freeman, Morgan | 1937 | Afr. Am. | The Shawshank Redemption | Ellis Boyd “Red” Redding | 1994 | R/E
Hawke, Ethan | 1970 | Mixed | Before Sunrise | Jesse | 1995 | G
Silverstone, Alicia | 1976 | West | Clueless | Cher Horowitz | 1995 | G
Heaton, Patricia | 1958 | Inland N. | Everybody Loves Raymond | Debra Barone | 1996 | G
Romano, Ray | 1957 | NYC | Everybody Loves Raymond | Raymond “Ray” Barone | 1996 | R/E
Hart, Melissa Joan | 1976 | NYC | Sabrina the Teenage Witch | Sabrina Spellman | 1996 | G
Hunt, Helen | 1963 | Mixed | As Good as It Gets | Carol Connelly | 1997 | G
Nicholson, Jack | 1937 | Mid-Atl. | As Good as It Gets | Melvin Udall | 1997 | G
Gellar, Sarah Michelle | 1977 | NYC | Buffy the Vampire Slayer | Buffy Summers | 1997 | G
DiCaprio, Leonardo | 1974 | West | Titanic | Jack Dawson | 1997 | G
Bridges, Jeff | 1949 | West | The Big Lebowski | Jeffrey “The Dude” Lebowski | 1998 | G
Goodman, John | 1952 | Midland | The Big Lebowski | Walter Sobchak | 1998 | G
Parker, Sarah Jessica | 1965 | Mixed | Sex and the City | Carrie Bradshaw | 1998 | G
Adams, Evan | 1966 | Indig. | Smoke Signals | Thomas Builds-the-Fire | 1998 | R/E
Beach, Adam | 1972 | Indig. | Smoke Signals | Victor Joseph | 1998 | R/E
Bedard, Irene | 1967 | Indig. | Smoke Signals | Suzy Song | 1998 | R/E
Diaz, Cameron | 1972 | West | There’s Something About Mary | Mary Jensen/Matthews | 1998 |
Stiller, Ben | 1965 | NYC | There’s Something About Mary | Ted Stroehmann | 1998 | G
Hayes, Sean | 1970 | Inland N. | Will & Grace | Jack McFarland | 1998 | G
McCormack, Eric | 1963 | Canada | Will & Grace | Will Truman | 1998 | G
Messing, Debra | 1968 | Northeast | Will & Grace | Grace Adler | 1998 | G
Borchardt, Mark | 1966 | Inland N. | American Movie | Mark Borchardt | 1999 | R/E
Witherspoon, Reese | 1976 | South | Legally Blonde | Elle Woods | 2001 | G
Tremblay, John Paul | 1968 | Canada | Trailer Park Boys | Julian | 2001 | G
Wells, Robb | 1971 | Canada | Trailer Park Boys | Ricky | 2001 | G
Damon, Matt | 1970 | Northeast | The Bourne Identity | David Webb/Jason Bourne | 2002 | G
Johansson, Scarlett | 1984 | NYC | Lost in Translation | Charlotte | 2003 | G
Murray, Bill | 1950 | Inland N. | Lost in Translation | Bob Harris | 2003 | G
Butt, Brent | 1966 | Canada | Corner Gas | Brent Leroy | 2004 | G
Miller, Gabrielle | 1973 | Canada | Corner Gas | Lacey Burrows | 2004 | G
Peterson, Eric | 1946 | Canada | Corner Gas | Oscar Leroy | 2004 | G
Cho, John | 1972 | Asian N. Am. | Harold & Kumar Go to White Castle | Harold Lee | 2004 | R/E
Penn, Kal | 1977 | Asian N. Am. | Harold & Kumar Go to White Castle | Kumar Patel | 2004 | R/E
Lohan, Lindsay | 1986 | NYC | Mean Girls | Cady Heron | 2004 | G
Cuoco, Kaley | 1985 | West | The Big Bang Theory | Penny | 2007 | G
Parsons, Jim | 1973 | South | The Big Bang Theory | Sheldon Cooper | 2007 | G
Clooney, George | 1961 | Midland | Up in the Air | Ryan Bingham | 2009 | G
Kendrick, Anna | 1985 | Northeast | Up in the Air | Natalie Keener | 2009 | G
Stone, Emma | 1988 | West | Easy A | Olive Penderghast | 2010 | G
Hawco, Allan | 1977 | Canada | Republic of Doyle | Jake Doyle | 2010 | R/E
Pellerin, Krystin | 1983 | Canada | Republic of Doyle | Leslie Bennett | 2010 | R/E
Dee, Gerry | 1968 | Canada | Mr. D | Gerry Duncan | 2012 | G
Hammersley, Lauren | 1971 | Canada | Mr. D | Lisa Mason | 2012 | G
Garner, Jennifer | 1972 | South | Dallas Buyers Club | Dr. Eve Saks | 2013 | G
McConaughey, Matthew | 1969 | South | Dallas Buyers Club | Ron Woodroof | 2013 | R/E
Levy, Dan | 1983 | Canada | Schitt’s Creek | David Rose | 2015 | G
Murphy, Annie | 1986 | Canada | Schitt’s Creek | Alexis Rose | 2015 | G

The performances in the sample cover the period from 1931 to 2015, a span of eighty-four years; the earliest television performance is from 1951 (I Love Lucy). Eleven of the earliest actors were born in the last decades of the nineteenth century, with the very earliest being the Barrymore brothers Lionel and John, born in 1878 and 1882 respectively, and W. C. Fields, born in 1880; all Philadelphians, as it happens. Groucho Marx (born 1890) and Mae West (1893) give the earliest evidence of New York City English, while Robert Armstrong (1890) is the earliest Westerner (he is classified as mixed because he spent his early childhood in Michigan but moved to Seattle at the beginning of his teens). The most recent birth year is that of fellow Westerner Emma Stone, born in Arizona in 1988, almost 100 years later. The age of the actors in their performances is easily calculated, give or take a year, by subtracting their birth years from their performance years. This reveals the average age to be thirty-five, with a standard deviation of 9.4, so most were between approximately twenty-five and forty-five years old when they appeared in the performances studied here. The youngest are Molly Ringwald, who was, appropriately, sixteen in Sixteen Candles, and Judy Garland, who was seventeen when she made The Wizard of Oz. The oldest is Lionel Barrymore, who was sixty-eight in It’s a Wonderful Life, followed by W. C. Fields and Jack Nicholson, both sixty in their performances here.
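The age calculation described above is simple subtraction. The minimal Python sketch below applies it to the five performances named in this paragraph, with values copied from Table 2.1; the helper names are hypothetical. The summary function uses the sample standard deviation (`statistics.stdev`); whether the reported figure of 9.4 is a sample or a population standard deviation is an assumption not settled here.

```python
from statistics import mean, stdev

# (actor, birth year, performance year), copied from Table 2.1
PERFORMANCES = [
    ("Ringwald, Molly", 1968, 1984),
    ("Garland, Judy", 1922, 1939),
    ("Barrymore, Lionel", 1878, 1946),
    ("Fields, W. C.", 1880, 1940),
    ("Nicholson, Jack", 1937, 1997),
]

def approx_age(birth_year: int, performance_year: int) -> int:
    """Age at performance, accurate only to within a year (birthdays are ignored)."""
    return performance_year - birth_year

def age_summary(rows):
    """Mean and sample standard deviation of ages for (actor, BY, year) rows."""
    ages = [approx_age(by, year) for _, by, year in rows]
    return mean(ages), stdev(ages)

ages = {actor: approx_age(by, year) for actor, by, year in PERFORMANCES}
youngest = min(ages, key=ages.get)
oldest = max(ages, key=ages.get)
print(ages)
print(youngest, oldest)
```

Applied to all 180 rows of Table 2.1, `age_summary` is the kind of calculation behind the mean and standard deviation reported above; on these five hand-picked rows it naturally does not reproduce them.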

2.3.3 Division into Early and Later Periods

A summary of the sample presented in Table 2.1 is provided in Table 2.2, which shows the distribution of performances by region or ethnic group, period, and sex. The sample is divided into two periods, which will be the basis of real-time comparisons in some of the analyses of change in the following chapters: an early period from the 1930s to the mid-1960s and a later period from the mid-1960s to the present. The year 1965 does divide the historical period covered by this study approximately in half but was chosen as the basis for division for several more meaningful reasons. A preliminary analysis of change in the film data studied here, published in Boberg (2018), found that the mid-1960s was the point when the new, western-based standard of North American film and television speech began to consolidate its ascendancy, replacing the old New York–based standard. In popular speech, this transition is reflected in the changes in progress that Labov (1966) reported in New York City, which were connected with the declining prestige of local speech, in particular the reinsertion of /r/ and a retreat from the raising of /æh/ and /oh/ (bath and thought). This shift in dialect ideology was also shown to coincide with the point when the population of California surpassed that of New York State, marking a dramatic westward shift of population, economic development, and cultural influence following World War II (Boberg 2018: 177).

Table 2.2 Distribution of performances studied, by actor’s region or ethnic group, performance period, and actor’s sex

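Region or ethnic group | 1930s–1964 (F, M, Total) | 1965–present (F, M, Total) | Total (F, M, Total)
Inland No. | 0, 5, 5 | 1, 6, 7 | 1, 11, 12
Afr. Am. | 1, 0, 1 | 3, 10, 13 | 4, 10, 14
Asian N. Am. | 0, 0, 0 | 0, 2, 2 | 0, 2, 2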
Asian N. Am.000022022

Beyond their sociolinguistic significance, the mid-1960s are seen as a watershed in many other aspects of North American history and culture. In demographic terms, according to the U.S. Census Bureau (Colby and Ortman 2014: 2), they mark the division between the birth years of the post–World War II Baby Boom generation (1946–1964) and the generation that followed it, which has received various labels in popular culture, including Generation X (Coupland 1991) and the 13th Generation (Howe and Strauss 1993). Margolis (1999) calls 1964 “the last innocent year,” arguing that it marked a turning point between the prosperity, optimism, national self-confidence, and broad social and cultural consensus of postwar America and the various cultural, political, and social changes that characterized the late twentieth century. A similar view of recent Canadian history is advanced by Berton (1997), who calls 1967, the year Montreal hosted the World’s Fair in honor of Canada’s centennial, “the last good year,” before the country underwent societal changes parallel to those occurring south of the border.

The mid-1960s cultural divide is also reflected in the films and television shows studied here. Representative of the period prior to 1965, for example, might be the idealized family life and traditional gender roles of Father Knows Best (1954–1960) and Leave It to Beaver (1957–1963), the grace and charm of Fred Astaire, Ginger Rogers, Bing Crosby, and Gene Kelly, the patriotism of Yankee Doodle Dandy (1942), or the self-assured heroism of Humphrey Bogart and John Wayne. After 1965 these images give way to the cultural alienation and generational conflict of The Graduate (1967), the urban decay of Barney Miller (1975–1982), the narcissistic anxiety of Woody Allen and Diane Keaton’s characters in Annie Hall (1977), the sexual promiscuity of Three’s Company (1977–1984), the illegal drug culture of Up in Smoke (1978), the acrimonious divorce and child custody battle of Kramer vs. Kramer (1979), and the race riot of Do the Right Thing (1989).

Like all attempts to reduce complex and multifaceted historical trends to stark contrasts and generalizations, choosing any year as a historical dividing point is a simplification, and the broader issues entailed in these cultural changes are well beyond the scope of this book. People disagree, moreover, on whether the period that followed the mid-1960s represents a decline of North American civilization, as implied by some of the historical analyses and cinematic themes just cited, or progress toward the “Great Society” envisioned by President Lyndon B. Johnson in 1964, in sympathy with many civil rights leaders, peace activists, feminists, and other critics of pre-1960s American culture. Most people acknowledge, however, that the mid-1960s saw a major cultural shift in North American society, for better or worse. The general cultural and historical importance of the mid-1960s, together with the sociolinguistic developments observed by Labov (1966) and Boberg (2018), will therefore serve as the justification for using 1965 as the division between the two main periods in the book’s analysis of change in real time in North American film and television speech. The early, pre-1965 period comprises the first three decades of sound film, in which many actors from outside the New York City region adopted eastern pronunciation features, particularly the vocalization of /r/. The later period comprises the late twentieth and early twenty-first centuries, during which this pattern of convergence was reversed, with actors from the New York City region suppressing local features in favor of a western-based standard.

As previously discussed, the overall size of the sample was constrained by the time required for phonetic analysis, and the most important criterion in selecting performances for analysis was their historical importance and popular renown. As a result, the distribution of performances among regions and ethnic groups, periods, and sexes shown in Table 2.2 is not as balanced as it might be with a different approach to sampling, but the sample does include at least some actors of each sex from the most important regions in each period. The table indicates that 36 percent of the performances studied here are from the early period and 64 percent from the late. The overall proportion of sexes is 41 percent female and 59 percent male, though the female proportion rises to 47 percent of the “G” or General North American English group, on which the analysis of change over time in Chapter 3 and of sex differences in Chapter 4 will be based. Of the regional groups, those with the largest representation in the sample are New York City, at 23 percent of the actors, the West (14 percent), Mixed (13 percent), and the South (9 percent). The largest ethnic minority group is African American (8 percent, rising to 11 percent of the late-period sample).
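The percentages in this paragraph follow directly from raw counts. As a check, the minimal sketch below assigns a performance to a period with the 1965 cutoff and expresses counts as rounded percentages. The counts used (64 early and 116 later performances) come from tallying the performance years in Table 2.1; the African American counts (14 in total, 13 in the later period) come from Table 2.2. The function names are illustrative only.

```python
def period(performance_year: int) -> str:
    """Assign a performance to the early (pre-1965) or later (1965-present) period."""
    return "early" if performance_year < 1965 else "later"

def pct(part: int, whole: int) -> int:
    """Share of `whole` made up by `part`, as a whole-number percentage."""
    return round(100 * part / whole)

TOTAL, EARLY, LATER = 180, 64, 116       # tallied from Table 2.1
AFR_AM_TOTAL, AFR_AM_LATER = 14, 13      # from Table 2.2

print(period(1964), period(1965))                          # early later
print(pct(EARLY, TOTAL), pct(LATER, TOTAL))                # 36 64
print(pct(AFR_AM_TOTAL, TOTAL), pct(AFR_AM_LATER, LATER))  # 8 11
```

Note that the African American share of the later period is computed against the 116 later performances, not the full 180, which is why it rises to 11 percent.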

2.4 Phonemes and Word Classes to Be Analyzed and Their Notation

Any study of linguistic variation and change must begin by establishing the set of linguistic variables to be analyzed. At the most basic level, the linguistic variables in this study are the phonetic qualities of all the vowel phonemes of English, some of which are involved in the vocalic chain shifts discussed in Subsections 1.3.3 and 2.1.3, plus the vocalization versus constriction of /r/. In addition to these basic variables, the following chapters will analyze several more complex variables, including variables of phonemic distribution and inventory (conditioned and unconditioned mergers and splits, discussed in Subsection 2.1.2); “derived measures” that reflect quantitative relationships between vowels, such as the phonetic distance in hertz between two vowels or allophones; and “indices” that capture overall historical and regional patterns by aggregating several individual patterns that co-occur, thereby producing more powerful diagnostics of participation in ongoing change or regional variation. The basic variables are set out in this section; the more complex variables will be explained as they are introduced in the subsequent analysis.
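The derived measures mentioned here are defined later in the book. Purely as an illustration, one common way to express the phonetic distance between two vowels in hertz is the Euclidean distance between their mean (F1, F2) points. The sketch below assumes that definition, which may differ from the one used in later chapters; the function name and the formant values are invented.

```python
from math import hypot

def formant_distance(v1: tuple[float, float], v2: tuple[float, float]) -> float:
    """Euclidean distance in Hz between two vowels given as (F1, F2) means."""
    return hypot(v1[0] - v2[0], v1[1] - v2[1])

# Invented mean formant values (Hz) for two vowel classes
lot = (800.0, 1300.0)
thought = (560.0, 980.0)
print(formant_distance(lot, thought))  # 400.0
```

A distance near zero would be consistent with a merger, but as the following pages explain, mergers are assessed statistically on the token distributions rather than from a single distance between means.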

There have been many approaches to the phonemic transcription of English vowels. A common approach today, adopted in many standard phonetics textbooks like Ladefoged and Johnson (2015), is to use symbols from the IPA, placed between phonemic slashes, that indicate the approximate phonetic quality of the vowels in some reference dialect, typically General American English or Standard Southern British English (Received Pronunciation). There are four disadvantages to this approach, however: it privileges one accent of English over others whose vowels often have very different qualities from those implied by the chosen symbols; it relies on a large number of non-Roman characters that are unfamiliar to nonspecialist readers and sometimes difficult to manipulate in editing or printing a document; it takes an ahistorical view that ignores the development of English vowels from earlier sources; and, most importantly, it confuses phonemic or broad transcription with phonetic or narrow transcription, using phonetic symbols that have concrete articulatory values to designate abstract phonemic categories. This book will therefore follow the American structuralist notational tradition initiated by Trager and Bloch (1941) and Trager and Smith (1951), which was later adopted by Labov in all his work and in the ANAE (Labov, Ash, and Boberg 2006), the research tradition with which this book is most closely associated.

The Trager/Labov system employs binary notation. Short vowels (those that cannot appear without a following consonant under primary stress) have a single-character symbol, which indicates their approximate nuclear quality and reflects their monomoraic (single weight unit) value in syllable structure. Long vowels (those that can appear without a following consonant under primary stress) have two-character symbols, reflecting their bimoraic weight. In English phonology, short vowels are often called lax and long vowels tense, terms that are difficult to define precisely in phonetic terms but will be used occasionally in the following chapters, as they are in Labov (1972: 73–74; 1991: 4–5) and Labov, Ash, and Boberg (2006: 173). The first character of a long vowel symbol indicates its nuclear quality, which is associated with one of the short vowels, while the second character indicates the direction of its postnuclear glide. Glide directions, which divide the long vowels into subsets, are notated with three characters that have purely phonemic, not phonetic, value, derived from English orthographic tradition: y indicates a front up-glide toward the high-front region of the vowel space, as in tie or day; w indicates a back up-glide toward the high-back region of the vowel space, as in cow or goal; and h indicates a long vowel that is either monophthongal, with no glide (as in the interjection ah!), or has an in-glide toward the mid-central region of the vowel space (as in idea), both of which are possible pronunciations of a word like law, depending on the dialect.

For example, the short-o of top, derived from the historical short /o/ of Middle English, is notated as /o/, even though its phonetic value is low and unrounded in many modern dialects, which leads many phoneticians who speak those dialects to notate it instead with IPA symbols based on the letter <a>. In the binary system, /o/ is also the nucleus of three long vowels: /ow/, the back-up-gliding long-o of toe; /oy/, the front-up-gliding vowel of toy; and /oh/, the monophthongal or in-gliding vowel of taught. As this example illustrates, the binary system uses glide symbols both for phonemic diphthongs like the vowels of tie, toy, and cow, in which the glide helps to distinguish the vowel from other phonemes, and for phonetic diphthongs like the vowels of see, day, goal, and pool, in which the glide does not serve a contrastive role (in fact these vowels are monophthongal in some dialects). This practice makes explicit the internal structure of the English vowel system, in which vowels tend to behave as subsets, with the members of each subset showing parallel developments in chain shifting and regional variation. For example, the Great Vowel Shift of Early Modern English and the more recent Southern Vowel Shift affect the long front-up-gliding vowels, not as individual phonemes but as a unified set; the same is true of the fronting of back-up-gliding vowels in the Southern Shift.
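Because the binary notation encodes vowel structure in the symbols themselves, it can be read mechanically: a single character marks a short vowel, and a second character names the glide class of a long vowel. The Python sketch below, with hypothetical function names, decodes the basic symbols of Table 2.3 on that basis and groups long vowels into the glide subsets described above. It handles only those basic symbols, not the prerhotic or prelateral notations introduced later in this section.

```python
GLIDES = {
    "y": "front up-gliding",
    "w": "back up-gliding",
    "h": "monophthongal or in-gliding",
}

def parse_vowel(symbol: str) -> dict:
    """Decode a basic Trager/Labov vowel symbol into nucleus, length, and glide class."""
    if len(symbol) == 1:
        return {"nucleus": symbol, "length": "short", "glide": None}
    if len(symbol) == 2 and symbol[1] in GLIDES:
        return {"nucleus": symbol[0], "length": "long", "glide": GLIDES[symbol[1]]}
    raise ValueError(f"not a basic Trager/Labov vowel symbol: {symbol!r}")

def glide_subsets(symbols):
    """Group long vowels by glide class, mirroring the subsets that shift together."""
    subsets = {}
    for s in symbols:
        info = parse_vowel(s)
        if info["length"] == "long":
            subsets.setdefault(info["glide"], []).append(s)
    return subsets

# The vowels with an /o/ nucleus discussed above
print([parse_vowel(s)["glide"] for s in ["o", "ow", "oy", "oh"]])
print(glide_subsets(["iy", "ey", "ay", "oy", "uw", "ow", "aw", "oh", "ah", "i"]))
```

Grouping by the second character reproduces the front-up-gliding set /iy, ey, ay, oy/ that the text describes as shifting together in the Great Vowel Shift and the Southern Shift.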

A third approach to notation, which is now standard in British sociodialectology and has recently gained broad acceptance in North American research as well, is the use of keywords instead of phonemic symbols to represent word classes: sets of words that historically share a common vowel sound, which often goes back to Middle English or earlier, like the example of short-o or /o/ just cited. Sound changes that are phonetically conditioned tend to operate on phonemes, rather than on individual words one at a time, and therefore affect entire classes of words containing those phonemes in similar ways (Labov 1981): if the /o/ of top is unrounded or centralized to a quality like IPA [a] (the /a/ sound of Italian or Spanish), as in the Northern Cities Shift, other words in the short-o class, like lot, stock, god, job, don, doll, and sorry, would be expected to show a similar development. Any one of these words could be used to represent the whole class, but the standard set of keywords, conventionally written in small capitals, was devised by Wells (1982). The keyword system does not make subsystemic structure explicit, but it does share the other advantages of the Trager/Labov system over the phonetic approach of Ladefoged: it uses Roman letters that are legible to nonexperts and keeps broad, phonemic notation free of any commitment to particular phonetic qualities, allowing each keyword to stand for the complete range of phonetic realizations of the phoneme it represents. The keywords of Wells (1982) will therefore be used in parallel with the Trager/Labov system throughout this book, both to address readers familiar with either system and to make the sounds under discussion as clear as possible to nonexpert readers familiar with neither.

The version of the Trager/Labov notation system used here is basically the same as that set out in Labov (1991: 7) or Labov, Ash, and Boberg (2006: 12), with a few minor differences. It is shown, with equivalent keywords from Wells (1982), in Table 2.3. Where keywords are lacking in Wells (1982), I have supplied my own, in italics. Table 2.4 shows the subsystem of prerhotic vowels, or vowels before /r/, which are subject to a great deal of dialect variation related to /r/ vocalization. Prerhotic lax vowels, which only occur before intervocalic /r/, are analyzed in Wells (1982) as prerhotic allophones of their corresponding nonprerhotic vowels and therefore do not have separate keywords; the keywords in italics in Table 2.4 are, again, my own.

Table 2.3 Phonemes and word classes (in small caps) to be analyzed and their notation

Height | Short, front | Short, back | Long, front-up-gliding | Long, back-up-gliding | Long, monophth. or in-gliding
High | i – kit | u – foot | iy – fleece | iw – few; uw – goose | ih – idea
Mid | e – dress | ^ – strut | ey – face; oy – choice | ow – goat | eh – yeah; oh – thought
Low | æ – trap | o – lot | ay – price | aw – mouth | æh – bath; ah – palm

Table 2.4 Subsystem of phonemes and word classes before /r/

Height | Front | Central | Back
High tense | iyr – near | | uwr – cure
High lax | ir – spirit | | ur – furry
Upper-mid tense | eyr – square | r – nurse | owr – force
Lower-mid lax | er – very | ^r – hurry | ohr – north
Low | ær – carry | ahr – start | or – sorry

The set of vowel phonemes in Table 2.3, including its prerhotic allophones in Table 2.4, will serve as what Labov, Ash, and Boberg (2006: 12) call an “initial position”: a maximal set of phonemic contrasts, most of which are still retained in conservative dialects like traditional New York City English, against which the progress of changes in phonemic distribution and inventory – conditioned and unconditioned phonemic mergers – can be measured. Only by adopting this full set of phonemic and allophonic contrasts in the broad transcription of data from all performances can a difference be observed between actors who maintain a given contrast and those who have lost it: loss of contrast is reflected in the absence of a statistically significant difference between the mean F1 and F2 values of the two word classes.
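The criterion stated here, that a merger shows up as the absence of a statistically significant difference in mean F1 and F2, can be sketched as a pair of two-sample tests. The version below is an assumption for illustration, not necessarily the test used in this book: it computes Welch's t statistic for F1 and F2 separately and, to stay within the Python standard library, approximates the two-sided p-value with the standard normal distribution. That approximation is reasonable only when both classes have many tokens (the toy data use five tokens per class for brevity); with few tokens a t distribution, for example from SciPy, would be appropriate. No correction is made for testing two formants, and the function names and token values are invented.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference between the means of samples a and b.

    Raises ZeroDivisionError if both samples have zero variance.
    """
    se2 = variance(a) / len(a) + variance(b) / len(b)
    return (mean(a) - mean(b)) / sqrt(se2)

def approx_p(t):
    """Two-sided p-value from a standard normal approximation to the t distribution."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

def contrast_retained(class_a, class_b, alpha=0.05):
    """class_a, class_b: lists of (F1, F2) tokens.

    The contrast counts as retained if either F1 or F2 differs significantly;
    otherwise the two classes are treated as merged.
    """
    for dim in (0, 1):
        t = welch_t([tok[dim] for tok in class_a], [tok[dim] for tok in class_b])
        if approx_p(t) < alpha:
            return True
    return False

lot = [(800, 1300), (810, 1310), (790, 1290), (805, 1305), (795, 1295)]
thought = [(700, 1000), (710, 1010), (690, 990), (705, 1005), (695, 995)]
print(contrast_retained(lot, thought))    # True
print(contrast_retained(lot, list(lot)))  # False
```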

The number of filled cells in Tables 2.3 and 2.4 might suggest that English has as many as thirty-two vowel phonemes, which is misleading. To begin with, the prerhotic word classes in Table 2.4 are allophones of the phonemes in Table 2.3, though the details of these relationships are complex and vary by dialect, so they will be discussed as necessary in subsequent chapters. Their distinct phonetic character and involvement in regionally specific conditioned mergers do, however, justify their treatment as separate analytical entities in the phonetic analysis. In Table 2.3, /ih/ and /eh/ are composed mainly of prerhotic tokens of /iy/ and /ey/, respectively, in r-less dialects (in which dear sounds like idea), so for most speakers of North American English they are either not separate phonemes at all or have only marginal status.

Dialects also vary with respect to the larger set of phonemic contrasts in Table 2.3. As discussed in Subsection 2.1.2, those that display the Low-Back Merger, the most important variable of phonemic inventory examined in the following chapters, have no contrast between /o/ (lot) and /oh/ (thought), and most of these, outside parts of New England, have a further merger between /o/ (lot) and /ah/ (palm), producing a double merger in low-back position: /ah-o-oh/ (palm-lot-thought). The /oh/ or thought word class of Table 2.3 also contains many words that historically had /o/ and retain that vowel in modern British dialects. Wells (1982) uses another keyword, cloth, to designate the words in this set with short-o before voiceless fricatives, such as coffee, off, soft, boss, and cost. These, along with many instances of short-o before voiced velar consonants (dog, frog, log, long, song, wrong, etc.), split off from lot to join thought in North American English and are analyzed here accordingly, so /oh/ and thought in the following chapters should be understood to mean thought-cloth. The split of Middle English short-a into short lax /æ/ (trap) and long tense /æh/ (bath) indicated in Table 2.3 is another important phonemic inventory variable that will be examined in Chapters 3 and 4: outside New York City and the Mid-Atlantic region, most speakers of North American English do not display this split and instead have a single short-a phoneme, which will be notated as either /æ-æh/ (trap-bath) or simply /æ/ (trap), depending on the context.
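The paired notation used in this book can be generated mechanically from Table 2.3. In the minimal sketch below, with hypothetical helper names, a dictionary maps each Trager/Labov symbol to its keyword, a merged class such as the double low-back merger is written by joining symbols and keywords with hyphens, and the cloth words cited in this paragraph are reassigned from /o/ to /oh/. The cloth set contains only the examples named in the text, not a complete word list.

```python
# Trager/Labov symbols and keywords as given in Table 2.3
KEYWORDS = {
    "i": "kit", "u": "foot", "iy": "fleece", "iw": "few", "uw": "goose", "ih": "idea",
    "e": "dress", "^": "strut", "ey": "face", "oy": "choice", "ow": "goat",
    "eh": "yeah", "oh": "thought",
    "æ": "trap", "o": "lot", "ay": "price", "aw": "mouth", "æh": "bath", "ah": "palm",
}

# Cloth words cited in the text: historically short-o, analyzed here as /oh/
CLOTH_WORDS = {"coffee", "off", "soft", "boss", "cost",
               "dog", "frog", "log", "long", "song", "wrong"}

def label(*symbols: str) -> str:
    """Format one class, or a merger of several, as '/sym-sym/ (key-key)'."""
    syms = "-".join(symbols)
    keys = "-".join(KEYWORDS[s] for s in symbols)
    return f"/{syms}/ ({keys})"

def analysis_class(word: str, historical_class: str) -> str:
    """Return the class a word is analyzed under; cloth words move from /o/ to /oh/."""
    if historical_class == "o" and word in CLOTH_WORDS:
        return "oh"
    return historical_class

print(label("o"))                      # /o/ (lot)
print(label("ah", "o", "oh"))          # /ah-o-oh/ (palm-lot-thought)
print(label("æ", "æh"))                # /æ-æh/ (trap-bath)
print(analysis_class("coffee", "o"))   # oh
```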

Table 2.3 also indicates a phonemic opposition between /iw/ (few) and /uw/ (goose), which have merged after the coronal consonants /t, d, n, s, z, l, r/ for most North Americans, following loss of the palatal glide that distinguishes /iw/ from /uw/ in words like new, student, Tuesday, and duty. Even in nonpostcoronal environments where the glide is retained, as in few, beauty, music, cube, or excuse, the vowel could instead be analyzed as a two-phoneme sequence: /j/, the same palatal glide that occurs syllable-initially in yes, yard, or yawn, followed by /uw/ (goose). This is the approach taken by Wells (1982), who gives no separate keyword for this word class. Nonetheless, because the prevocalic glides in /iw/ words have a different historical origin from those in words like yard and behave differently in today’s dialects (glide loss is unique to /iw/ words), and because minimal pairs like do and dew/due or coot and cute indicate a potential phonemic opposition between /uw/ and /iw/, the glides will be treated here as part of the /iw/ phoneme. This also prevents tokens of /iw/, which tend to have strongly fronted nuclei, from skewing the mean F2 value of the /uw/ class toward higher values.

The prerhotic allophones in Table 2.4 are not the only phonetically conditioned variants that require separate treatment in phonetic analysis. In research based on the reading of word lists, the words produced by each participant are uniform across the sample, but in a sample of film and television speech the number of tokens of each word class in each actor’s data varies. If not controlled for, this can skew the actor’s mean formant values and distort the historical and regional differences under study. One way of limiting such distortion is to remove strongly divergent allophones from the calculation of mean formant values for each phoneme; this also allows change and variation in the status and character of these allophones to be studied. One allophonic effect that is consistent across all vowels is retraction or backing before a following /l/, especially when there is no intervening glide, as is usually the case with short vowels. Prelateral allophones of all of the short vowels as well as /oh/ and the long back-up-gliding vowels /uw/ (goose) and /ow/ (goat) will therefore be analyzed as separate categories, notated /il/ (kill), /el/ (tell), /ohl/ (call), /uwl/ (pool), /owl/ (cold), and so forth. (Broad transcription of allophonic environments is another advantage of the Trager/Labov notation system: /uwl/ is a more concise notation than “goose before /l/.”)

Following nasal consonants also create distinct allophones of some vowels. Most notably in North American English, /æ/ (trap) tends to be fronted and raised, and comparatively resistant to retraction, before the front nasals /m/ and /n/, as in ham, damp, pan, and candy. In Southern States English, as discussed in Subsection 2.1.2, prenasal allophones of /i/ (kit) and /e/ (dress) tend to merge in pairs like him and hem, or pin and pen. To indicate the joint category of labial and coronal nasals, /m/ and /n/, a capital “N” will be used to notate prenasal allophones, including tokens that would be either lax (trap) or tense (bath) in the split short-a system indicated in Table 2.3: /æN/ (hammer, panic), /æhN/ (ham, pan), /iN/ (him, pin), and /eN/ (hem, pen). A following /n/ can also have a fronting and raising effect on /aw/ (mouth); since, among the nasals, /aw/ occurs only before /n/, words like down, gown, and round will be notated /awn/.

Another important potential consonantal influence on /æ/ (trap) is that of following voiced velar stops and velar nasals, in words like bag, tag, gang, and thanks. In the northwestern quarter of the continent, anticipation of the tongue position required for the following velar consonant produces a front up-glide and pulls the vowel nucleus up and forward, so that bag sounds like bake with a /g/ instead of a /k/. To assess the advance of prevelar /æ/-raising, a capital “G” will be used to indicate the joint category of voiced velar stop and velar nasal, as with “N” for following nasals, so bag and thanks will be notated /æG/. This word class unfortunately creates a problem for the analysis of New York City English, in which words like bag belong with the tense /æh/ class (along with bad, cab, etc.), whereas words like thanks belong with the lax /æN/ class (along with hammer, panic, etc.). To maintain a consistent set of word classes across the dataset, however, the /æG/ category was applied to all actors, so prevelar tokens were excluded from the assessment of the Mid-Atlantic short-a split for actors from the New York City region.

The final set of allophones to be distinguished in the phonetic analysis is those connected with the pattern known as Canadian Raising, to be discussed in Chapters 3 (Section 3.7) and 6 (Section 6.3), whereby the diphthongs /aw/ (mouth) and /ay/ (price) have higher nuclei before voiceless obstruents (south, out, tight, ice, etc.) than elsewhere (cow, loud, tie, eyes, etc.). Wells’s keywords are especially problematic in this respect because they both feature final voiceless fricatives, part of the environment that conditions raising. A capital “T” will therefore be adopted to represent the class of voiceless obstruents (stops, fricatives, and affricates) in broad transcription: /awT/ and /ayT/ will indicate /aw/ and /ay/, respectively, in Canadian Raising environments.
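The conditioning rules in the last few paragraphs amount to a small lookup from a vowel phoneme and its following segment to an analysis class. The sketch below is a hypothetical illustration, not the procedure used in this book: phoneme labels follow the Trager/Labov notation used here, the segment symbols are IPA-like strings chosen for convenience, the function name is invented, and prerhotic classes are deliberately left out. The sets of affected vowels are read off Table 2.5.

```python
from typing import Optional

# Vowels with separate prelateral classes in Table 2.5 (/il/, /el/, /æl/, ...).
PRELATERAL = {"i", "e", "æ", "o", "^", "oh", "uw", "ow"}
# Vowels with separate prenasal "N" classes (/iN/, /eN/, /æN/, /æhN/).
PRENASAL = {"i", "e", "æ", "æh"}
# "G": voiced velar stop and velar nasal; "T": voiceless obstruents.
VELAR_G = {"g", "ŋ"}
VOICELESS_T = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ"}


def word_class(phoneme: str, following: Optional[str]) -> str:
    """Map a vowel phoneme plus its following segment to an analysis class.

    Only lateral, nasal, velar, and voiceless-obstruent conditioning is
    handled; prerhotic classes are outside the scope of this sketch.
    """
    if following == "l" and phoneme in PRELATERAL:
        return phoneme + "l"   # e.g. /uw/ + l -> /uwl/ (pool)
    if phoneme in {"æ", "æh"} and following in VELAR_G:
        return "æG"            # bag, thanks: one class applied to all actors
    if phoneme in PRENASAL and following in {"m", "n"}:
        return phoneme + "N"   # e.g. /æh/ + n -> /æhN/ (pan)
    if phoneme == "aw" and following == "n":
        return "awn"           # down, gown, round
    if phoneme in {"aw", "ay"} and following in VOICELESS_T:
        return phoneme + "T"   # Canadian Raising environments
    return phoneme
```

For example, `word_class("æ", "ŋ")` returns `"æG"` for thanks, and `word_class("ay", "t")` returns `"ayT"` for tight.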

The full set of forty-five word classes used in this book is shown in Table 2.5, which also indicates the average number of tokens of each word class analyzed for each actor.

Table 2.5 Mean number of tokens by word class, with standard deviations, for entire sample

Phoneme/allophone | Wells keyword | Additional examples | Mean N | Std. dev.
/i/ | kit | bid, sick, kiss | 35.5 | 19.2
/il/ | kit | pillow, still, kill | 5.7 | 5.1
/iN/ | kit | pin, dinner, him | 12.5 | 8.8
/ir/ | kit | mirror, irritate, spirit | 0.4 | 0.7
/e/ | dress | best, dead, get | 37.7 | 20.1
/el/ | dress | belt, tell, help | 11.0 | 6.3
/eN/ | dress | pen, send, hem | 14.7 | 9.1
/er/ | dress | very, error, terrible | 3.6 | 3.1
/æ/ | trap | bat, sack, cap | 30.3 | 16.4
/æh/ | bath | bad, staff, cast | 14.6 | 9.2
/æG/ | trap | bag, tank, gang | 5.5 | 4.4
/æl/ | trap | pal, tally, shallow | 3.1 | 3.8
/æN/ | trap | panic, hammer, Canada | 4.8 | 3.5
/æhN/ | bath | pan, ham, candy | 18.6 | 11.3
/ær/ | trap | carry, arrow, charity | 4.8 | 4.1
/o/ | lot | body, job, got | 33.1 | 18.3
/ol/ | lot | follow, solid, collar | 4.3 | 4.1
/or/ | lot | borrow, forest, sorry | 4.2 | 3.3
/^/ | strut | bus, duck, cup | 42.1 | 21.2
/^l/ | strut | pulse, dull, color | 1.0 | 1.3
/^r/ | strut | worry, burrow, hurry | 1.1 | 1.2
/u/ | foot | pull, took, good | 14.1 | 8.7
/iy/ | fleece | bee, seat, key | 40.0 | 19.9
/iyr/ | near | beer, here, years | 7.3 | 5.6
/ey/ | face | bay, take, gate | 47.8 | 23.3
/eyr/ | square | bare, stair, care | 7.9 | 5.2
/ay/ | price | pie, side, guy | 38.0 | 19.6
/ayT/ | price | spice, knife, sight | 19.4 | 10.4
/oy/ | choice | boy, noise, coil | 6.0 | 5.1
/aw/ | mouth | vow, loud, cow | 8.9 | 5.6
/awn/ | mouth | pound, town, gown | 7.1 | 4.8
/awT/ | mouth | doubt, south, house | 10.5 | 7.0
/ow/ | goat | boat, toe, code | 36.2 | 17.0
/owl/ | goat | pole, told, coal | 6.2 | 4.2
/owr/ | force | four, door, hoarse | 9.7 | 6.2
/uw/ | goose | food, do, two | 22.8 | 12.6
/uwl/ | goose | fool, stool, cool | 2.7 | 3.7
/uwr/ | cure | boor, poor, tour | 2.2 | 1.8
/iw/ | goose | beauty, new, cube | 12.8 | 7.8
/ah/ | palm | father, spa, calm | 3.7 | 3.6
/ahr/ | start | bar, dark, card | 17.3 | 9.8
/oh/ | thought/cloth | paw, sawed, cost | 22.5 | 12.8
/ohl/ | thought/cloth | bald, tall, caller | 9.0 | 5.7
/ohr/ | north | born, sort, horse | 9.6 | 6.7
/r/ | nurse | person, turn, girl | 18.2 | 10.1

As can be seen in Table 2.5, the average set of data on an individual actor comprises almost 700 tokens, with a standard deviation of about 320, indicating that most analyses are based on between about 400 and 1,000 tokens. The entire dataset comprises 120,288 tokens. Compared with the geographically similar acoustic analysis dataset of the ANAE, the present set is therefore less broadly representative of regional and social divisions but more individually reliable: it comprises 180 individual speakers against the 439 of the ANAE (Labov, Ash, and Boberg 2006: 36), but the average number of tokens from each speaker is more than twice the approximately 300 in the ANAE (2006: 37), which analyzed a similar grand total of around 130,000 vowels (2006: 36). This means that the present study’s conclusions about variation and change in the phonetic quality and phonemic relations of vowels at the level of individual speakers can be drawn with much greater confidence than in the ANAE: the problem of underrepresented or missing word classes in the data on individual speakers still exists but is comparatively rare.

The smallest individual datasets studied here are 204 tokens from Irene Bedard, 191 from Scarlett Johansson, and 172 from Rosie Perez, performances that were considered important to include in the analysis despite their smaller quantity of speech. By contrast, twenty-nine datasets contain more than 1,000 tokens; the largest are 1,559 tokens from Carroll O’Connor, 1,581 from Jackie Gleason, and 1,763 from Mark Borchardt. Dataset sizes for all actors are given in Appendix A. Though a sample of around 500 tokens is normally sufficient to establish a reliable view of the most important aspects of an English vowel system, word classes that occur less frequently in natural speech can pose problems for quantitative and statistical analysis, particularly in smaller datasets but even in large ones. As already pointed out, unlike in research based on the reading of a word list or prepared passage, word classes are far from equally represented in the uncontrolled speech sample of a film script. Table 2.5 shows that data on each word class for a typical actor range from abundant (the main distributions of the trap, lot, kit, goat, dress, price, fleece, strut, and face classes) to sparse and in some cases even nonexistent (palm words and allophones of dress, goose, strut, and kit before /r/ and of trap, goose, and strut before /l/). Because there is no guarantee that particular words will occur in a given script, there may not always be enough data to produce for every actor a reliable view of variables like the phonemic contrast between /ah/ and /o/, Canadian Raising of /awT/, the F2 difference between the main distribution of goose and its allophone before /l/, or the contrast between /æN/ and /æhN/ (which is why Labov [1972: 84] used a word list to establish the lexical distribution of /æ/ and /æh/).
The analyses of subsequent chapters will therefore focus on the most frequently occurring word classes and, when necessary, actors with too little data on a particular word class will be omitted from the relevant analysis.
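Omitting actors with too little data on a word class is, computationally, a simple filter applied before each analysis. The sketch below assumes a hypothetical nested dictionary of token counts per actor and word class, and a minimum of five tokens; that cutoff is purely illustrative and is not the threshold used in the following chapters.

```python
from typing import Dict, List


def actors_with_enough_data(token_counts: Dict[str, Dict[str, int]],
                            word_class: str,
                            min_tokens: int = 5) -> List[str]:
    """Return actors with at least `min_tokens` tokens of `word_class`.

    `token_counts` maps actor -> {word class -> number of tokens}.
    The default of 5 is a hypothetical cutoff for illustration only.
    """
    return sorted(actor for actor, counts in token_counts.items()
                  if counts.get(word_class, 0) >= min_tokens)
```

An actor with no tokens at all of a class (a missing key) is simply treated as having zero.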

2.5 Auditory-Impressionistic and Acoustic Analysis

Once the list of performances to be analyzed had been established, each performance was viewed on DVD to develop a general knowledge of the role and accent to be studied (some performances had already been seen several times before the project began). It was then necessary to obtain a .wav file of each audio track, to be uploaded to Praat (Boersma and Weenink 2012) for impressionistic and acoustic phonetic analysis. For films or TV series with digital files readily available online, the full file was downloaded from YouTube; where digital files were not available online, they were ripped from DVDs using HandBrake software. VLC Media Player was used to strip the audio content from each file. The audio files were then exported as monophonic .wav files with a sample rate of 44.1 kHz and 16 bits per sample. With QuickTime 7, each .wav file was split into 25–30-minute segments, saved as separate files, to aid in processing. In Praat, TextGrid files matching each sound file were created, containing point tiers for each actor being analyzed. Phonetic analysis then began by opening each pair of sound and TextGrid files, listening to the dialogue, selecting words for analysis, and inserting measurement points, labeled with orthographic transcriptions of the words, on the appropriate actor’s point tier. An example of a sound file with its matching TextGrid file is provided in Figure 2.4. It shows the spectrogram of Humphrey Bogart saying, “We’ll always have Paris” in Casablanca (1942). The nucleus of the first vowel in Paris is selected for measurement on Bogart’s point tier.

Figure 2.4 Example of sound file and associated TextGrid in Praat

As in the Bogart example in Figure 2.4, vowels to be measured were normally restricted to those in syllables bearing primary word stress in words bearing primary phrasal stress, so as to avoid distortion from vowel tokens affected by centralization in nonprimary-stress contexts. Exceptions to these criteria were made only where tokens of particular analytical categories were rare and where careful listening determined that a token of secondary phrasal stress was sufficiently representative of its articulatory target. Following the procedure adopted by Labov, Ash, and Boberg (2006: 37–38), a single measurement point was selected for each vowel, representing its articulatory target. This was the maximal value of F1 (or the middle of an F1 steady state) for vowels whose central tendency is downward followed by upward movement of the tongue, or an appropriate point of inflection in F2 for vowels whose central tendency is displacement of the tongue toward the front or rear periphery of the vowel space, possibly followed by a centering in-glide.
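The F1 criterion above (the F1 maximum, or the middle of an F1 steady state) can be stated precisely for a sampled formant track. In this book the points were placed by hand in Praat; the sketch below only illustrates the rule, assuming a hypothetical list of (time, F1) samples and a hypothetical 10 Hz tolerance for deciding which frames belong to the steady state. The F2-inflection criterion involves more judgment and is not sketched.

```python
from typing import List, Tuple


def f1_measurement_time(track: List[Tuple[float, float]],
                        tolerance_hz: float = 10.0) -> float:
    """Time of maximal F1, or the middle of the F1 plateau around it.

    `track` holds (time_s, f1_hz) pairs in time order. Frames within
    `tolerance_hz` of the peak and contiguous with it count as the steady
    state; the tolerance is an illustrative assumption.
    """
    if not track:
        raise ValueError("empty formant track")
    i_max = max(range(len(track)), key=lambda i: track[i][1])
    peak = track[i_max][1]
    lo = i_max
    while lo > 0 and peak - track[lo - 1][1] <= tolerance_hz:
        lo -= 1
    hi = i_max
    while hi < len(track) - 1 and peak - track[hi + 1][1] <= tolerance_hz:
        hi += 1
    # Midpoint of the steady state (equal to the peak time if there is none).
    return (track[lo][0] + track[hi][0]) / 2
```

With a sharp F1 peak the function returns the peak time; with a plateau it returns the plateau's temporal midpoint.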

Whereas vowel measurement was performed by acoustic analysis (as explained in Section 2.2), /r/ constriction was coded impressionistically as present or absent (a notation of r-1 or r-0 in the word label on the TextGrid), following the standard established by Labov (1972: 73) and Elliott (2000a, 2000b). In some cases, binary judgment of tokens that involved weak or partial rhotic constriction required careful and repeated listening, supplemented by inspection of the third formant trajectory in the spectrogram. This could not resolve the uncertainty in every instance, but such uncertainties represent a very small proportion of the data; in the large majority of cases impressionistic analysis was straightforward. The /r/-constriction codes were later counted with a formula in Excel, so that the constriction frequency for each actor was reported as the percentage of constricted /r/ out of the total number of tokens containing an /r/ that is a candidate for vocalization. This total averaged 109 tokens per actor, with a standard deviation of 56, so most of the /r/ analyses are based on between approximately 50 and 165 tokens. By comparison, Elliott, who analyzed the /r/ pronunciation of 202 actors in 109 films (2000b: 105), attempted to measure sixty tokens per actor, which was not always possible (107); Labov, Ash, and Boberg relied on only ten to twenty tokens per speaker (2006: 47). The tokens measured in the present study include both words with stressed prerhotic vowels that also serve as vowel tokens, like car, north, and girl, and /r/ in unstressed syllables of words like center and forget, which were analyzed for their nonprerhotic vowels.
Analysis of /r/ was not performed for seven recent performances by actors from rhotic regions because an initial auditory assessment made it clear that the frequency of constriction in these performances was 100 percent and did not need to be measured.
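The percentage described above (constricted /r/ out of all candidate /r/ tokens) was computed in Excel. As a language-neutral restatement, the sketch below counts r-1 and r-0 codes in word labels; it assumes, hypothetically, that each code appears as a separate space-delimited item in the label (e.g. "car r-1"), which is an illustration of the idea rather than the exact label format used.

```python
import re
from typing import Iterable, Optional

# One r-1 (constricted) or r-0 (vocalized) code per candidate /r/.
R_CODE = re.compile(r"\br-([01])\b")


def r_constriction_percent(labels: Iterable[str]) -> Optional[float]:
    """Percentage of r-1 codes among all r-1/r-0 codes, or None if none."""
    constricted = total = 0
    for label in labels:
        for code in R_CODE.findall(label):
            total += 1
            if code == "1":
                constricted += 1
    return None if total == 0 else 100.0 * constricted / total
```

Labels without any /r/ code (such as vowel-only tokens) do not enter the denominator.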

Auditory-impressionistic analysis was also used to code several other consonantal variables as part of the word entry in the TextGrid file. These included the presence or absence of glides in /iw/ words (news as “nyooze” or “nooze,” etc.); flapping of intervocalic /t/ (city as “sitty” or “siddy,” etc.); presence or absence of preaspiration in “wh”-words (wheel as “weel” or “hweel,” etc.) and words like human (the first syllable like “hue” or “you”); and alternation between velar and apical nasals in “–ing” words (talking or talkin’, etc.). These variables generally showed less variation than anticipated, even in films from the earlier period, or displayed patterns that are well known from other studies. Many actors, for example, frequently used –in’ forms in informal speech, matching the results of previous studies of popular speech (Fischer 1958; Labov 1972a: 239; Trudgill 1974; Houston 1985; De Wolf 1992: 73–83; Wagner 2012), and /t/ was usually flapped (pronounced like /d/) where flapping would be expected in North American English (Zue and Laferriere 1979; De Wolf 1992: 57–73; De Jong 1998; Eddington and Elzinga 2008). The quantity of data on the remaining consonantal variables, (iw), (hw), and (hj), was not sufficient to support meaningful correlational analyses. Consequently, to focus more effectively on variation in vowels, consonantal variables other than /r/ will not be analyzed in this book.

A final variable that was coded impressionistically, but only for Southern and African American actors, is monophthongization of /ay/ (price), also known as glide deletion. Following the procedure of Labov, Ash, and Boberg (2006: 38), monophthongization was noted, when audibly present, with the symbol “{m}” in the word label on the TextGrid. In quantitative analysis, the number of {m} notations in each environment, /ay/ and /ayT/, was divided by the total number of tokens in that environment to calculate a percentage of monophthongization for each.
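The monophthongization percentages are separate ratios for /ay/ and /ayT/. A minimal sketch, assuming each token is available as a hypothetical (word class, label) pair and that the "{m}" symbol appears somewhere in the label:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def monophthongization_rates(
        tokens: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Percentage of {m}-marked tokens for /ay/ and /ayT/ separately.

    `tokens` yields (word_class, label) pairs, e.g. ("ay", "side {m}").
    Tokens of other word classes are ignored.
    """
    totals: Counter = Counter()
    monos: Counter = Counter()
    for word_class, label in tokens:
        if word_class not in ("ay", "ayT"):
            continue
        totals[word_class] += 1
        if "{m}" in label:
            monos[word_class] += 1
    return {wc: 100.0 * monos[wc] / totals[wc] for wc in totals}
```

Keeping the two environments apart matters because, as the percentages are defined, a high rate before voiced consonants need not imply a high rate in /ayT/.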

Following completion of the TextGrid file for each actor, a first pass at acoustic analysis was performed in Praat using an automated script (Clayards 2007, as modified by Thomas Kettig), in which the analysis parameters were usually set to five formants and a maximum of 5000 Hz for men and 5500 Hz for women. In a few cases, the maximum was set to 4500 Hz for men with particularly deep voices and to 6000 Hz for women with particularly high voices. The script produced a text file in which each line contained a word from the TextGrid, the time point of the word’s measurement within the .wav file, and the values in Hz of F1, F2, and F3 at that point.
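The exact layout of the script's text output is not specified above, so the parser below assumes, hypothetically, one tab-separated line per token: word label, time in seconds, then F1, F2, and F3 in Hz. It treats Praat's "--undefined--" marker, which Praat writes when a value cannot be computed, as a missing value so that such tokens can be flagged rather than crash the import.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Measurement:
    word: str
    time_s: float
    f1: Optional[float]
    f2: Optional[float]
    f3: Optional[float]


def _hz(field: str) -> Optional[float]:
    # Praat prints "--undefined--" when a formant could not be found.
    return None if field.strip() == "--undefined--" else float(field)


def parse_measurements(text: str) -> List[Measurement]:
    """Parse assumed tab-separated lines: word, time, F1, F2, F3."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, time_s, f1, f2, f3 = line.split("\t")
        rows.append(Measurement(word, float(time_s), _hz(f1), _hz(f2), _hz(f3)))
    return rows
```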

The text files were then imported to Excel for quantitative analysis, where each word entry was assigned a code indicating the word class it belonged to, based on the standard lexical membership of each of the full set of word classes in Table 2.2. As discussed in Section 2.4, for instance, words like cot were assigned to /o/ (lot) and words like caught to /oh/ (thought), regardless of whether the actor being analyzed retained this phonemic distinction. Some care had to be taken in cases of legitimate regional or individual variation in phonemic incidence of the sort discussed above in Subsection 2.1.1: for example, the word on would normally be assigned to /o/ (lot) for northern actors but to /oh/ (thought) for Midland and Southern actors, as failure to respect such differences might obscure the degree to which the two phonemes are distinct. Short-a tokens were assigned to /æ/ (trap) and /æh/ (bath) based on the traditional New York City or Philadelphia short-a patterns described in previous research (see Chapter 5), as appropriate to the actor’s regional origin, unless a combination of listening and visual inspection clearly established that a particular token deviated from the traditional pattern, in which case this was also considered legitimate variation in phonemic incidence and tokens were assigned based on their pronunciation. Any doubtful cases of assignment that could not be resolved by careful listening were referred to the authority of Kenyon and Knott (1953), the standard guide to the pronunciation of North American English (and the basis for the pronunciation entries of the Merriam-Webster’s American English dictionaries).
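The assignment step combines a default lexical membership with regional overrides of phonemic incidence, as in the example of on. The sketch below is a hypothetical miniature: the four-word lexicon and the region labels are invented for illustration, whereas the real assignments rely on the full word-class memberships and, for doubtful cases, Kenyon and Knott (1953).

```python
from typing import Dict, Optional

# Hypothetical mini-lexicon: default word-class membership.
LEXICON: Dict[str, str] = {"cot": "o", "caught": "oh", "on": "o", "bag": "æG"}

# Hypothetical regional differences in phonemic incidence.
REGIONAL_INCIDENCE: Dict[str, Dict[str, str]] = {
    "Midland": {"on": "oh"},
    "South": {"on": "oh"},
}


def assign_class(word: str, region: str) -> Optional[str]:
    """Word class for `word`, applying a regional override when one exists."""
    word = word.lower()
    override = REGIONAL_INCIDENCE.get(region, {}).get(word)
    return override if override is not None else LEXICON.get(word)
```

Returning `None` for unknown words mirrors the manual workflow, in which unlisted or doubtful items were resolved by listening rather than assigned automatically.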

Using Excel’s sorting functions, the data for each actor were then sorted first by word class and secondarily by each formant value to check for measurement errors: both gross errors in formant identification, for instance when F1 was missing so F2 was misread by Praat as F1, and less obvious inaccuracies in formant values produced by problems connected with the sensitivity of the analysis. All formant values that appeared to fall outside the main distribution of values for each formant for each word class, without an obvious explanation in allophonic context, were flagged for verification. This was done “by hand,” that is, by going back into Praat and verifying, using a combination of visual inspection and careful listening, whether the initial measurements of each flagged word were correct or required modification.
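The flagging step relied on visual inspection of sorted values, not on a fixed numerical rule. For readers who want an automated first pass, one common robust choice is to flag values lying more than k median absolute deviations (MAD) from the class median. This is a swapped-in technique, not the book's criterion, and k = 3 is an illustrative assumption; flagged tokens would still need checking in Praat.

```python
from statistics import median
from typing import Dict, List, Tuple


def flag_outliers(values_by_class: Dict[str, List[float]],
                  k: float = 3.0) -> List[Tuple[str, int, float]]:
    """Return (word_class, index, value) for values far from the class median.

    "Far" means more than k median absolute deviations away (k = 3 is an
    illustrative assumption). Classes with fewer than three values, or with
    zero spread, are skipped.
    """
    flagged = []
    for word_class, values in values_by_class.items():
        if len(values) < 3:
            continue  # too few tokens to define a main distribution
        m = median(values)
        mad = median(abs(v - m) for v in values)
        if mad == 0:
            continue
        for i, v in enumerate(values):
            if abs(v - m) > k * mad:
                flagged.append((word_class, i, v))
    return flagged
```

For instance, among /iy/ F2 values of 2300, 2250, 2350, 2280, and 1500 Hz, only the 1500 Hz token is flagged, the kind of spurious value discussed in the next paragraph.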

If modification was called for, corrected measurements were obtained by changing the parameters of the analysis, usually the maximum frequency of Praat’s linear predictive coding analysis, which plots formants on the spectrogram as linear series of dots. This threshold often had to be set lower (to 5000, 4500, or even 4000 Hz) to obtain the necessary resolution to see two formants at a low frequency for high- and mid-back vowels like /uw/, /owr/, and /oh/, and higher (usually to 5500 for men or 6000 Hz for women) to avoid spurious formants appearing between F1 and F2 for mid- and high-front vowels, most frequently /iy/ (e.g., when the initial analysis recorded an F2 of 1500 Hz for a high-front vowel). Prenasal vowels were often especially problematic because anticipatory nasalization can impose formants associated with the opening of the nasal cavity over those associated with tongue position, thereby producing inaccurate values for F1 in particular.
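The re-measurement heuristics above can be summarized as an ordered list of maximum-formant settings to try. The sketch below encodes only the values mentioned in the text (defaults of 5000 Hz for men and 5500 Hz for women, lower ceilings down to 4000 Hz for high- and mid-back vowels, higher ceilings of 5500 or 6000 Hz for front vowels); the grouping of vowels and the function name are simplifying assumptions, and in practice each choice was confirmed by listening and inspection.

```python
from typing import List

# Vowel groups are an illustrative simplification of the text above.
BACK_VOWELS = {"uw", "owr", "oh"}
FRONT_VOWELS = {"iy", "ey", "i", "e"}


def ceilings_to_try(sex: str, phoneme: str) -> List[int]:
    """Maximum-formant settings (Hz) to try, in order, for a flagged token.

    `sex` is "M" or "F". The first entry is the default setting.
    """
    default = 5000 if sex == "M" else 5500
    if phoneme in BACK_VOWELS:
        # Lower ceilings help separate F1 and F2 when both are low.
        return [default] + [c for c in (5000, 4500, 4000) if c < default]
    if phoneme in FRONT_VOWELS:
        # Higher ceilings help avoid a spurious formant between F1 and F2.
        return [default, 5500 if sex == "M" else 6000]
    return [default]
```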

The frequency of needed corrections depended on the sound quality of the original source and the .wav file made from it and on the characteristics of the actor’s voice (as previously mentioned, the high, breathy voices of some actresses were especially problematic), but typical rates ranged from 10 to 25 percent of the tokens analyzed for each actor. In no case could the initial, automated analysis, performed with constant parameters across all tokens, be considered accurate: careful error-checking and manual correction of at least some tokens was required for every actor. Computerized acoustic analysis is tremendously useful in sociophonetic research and recent advances in computational methods have greatly increased the speed with which such analyses can be performed, but the experience of carrying out the 180 acoustic analyses in this project clearly indicates that the accuracy of acoustic analysis depends on three factors that have nothing to do with computational methods: careful listening, familiarity with the accent being analyzed, and a solid understanding of both acoustic and articulatory phonetics.

Once each actor’s acoustic data had been error-checked, mean formant values for each word class were calculated, the percentage of constricted /r/ was determined, and t-tests were performed to assess phonemic contrasts, as discussed in the next section. These summary data were then copied to a master spreadsheet in Excel, containing the same data from the whole sample, for quantitative and statistical analysis. Before undertaking this analysis, however, each actor’s formant values were normalized with respect to those of the whole sample to eliminate differences in formant values arising from physiological differences among the actors’ voices: because formant frequencies depend on the length of the vocal tract, which is generally longer in speakers with lower-pitched voices, a given vowel quality produced by an actor with a low voice will have lower formant values than the same quality produced by an actress with a high voice. Failure to normalize these differences would cause women’s vowels to appear lower and further front than men’s, even though they sound the same, or make vowels that do not sound the same appear to have the same formant values.

There are several methods of normalization, each with advantages and disadvantages. Following the approach of Labov, Ash, and Boberg (2006: 39–40), the analysis in this book uses the additive point system of Nearey (1977), in which the raw formant values of each speaker in a group are adjusted (up for lower voices and down for higher voices) by a scale factor derived from the difference between the natural log means of the speaker’s and the group’s formant values. One important modification was made to the procedure used in the ANAE, however: to ensure strict comparability across vowel systems and avoid skewing of individual means due to unusual quantities of certain word classes, rather than using each actor’s complete dataset to calculate the F1/F2 mean, it was calculated as the mean of the means of a standard set of word classes. To maximize reliability and coverage of the entire vowel space, the set of word classes used for this purpose was essentially all those with an average of more than twelve tokens in Table 2.5, with combinations of some smaller classes that are not distinct in the speech of many or most actors (the combined sets are an average of the means of their constituent word classes). Specifically, with parentheses indicating the combined sets, eighteen word classes were used to calculate F1/F2 means: (/aw/ + /awn/ + /awT/), /ow/, (/owr/ + /ohr/), /uw/, /iw/, /iy/, /ey/, (/ay/ + /ayT/), /ahr/, /oh/, /r/, /i/, /e/, (/æ/ + /æh/), /ænd/, /o/, /^/, and /u/.
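The normalization just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the spreadsheet procedure used for the book: the speaker's F1/F2 mean is taken as the plain average (in Hz) of F1 and F2 over the class means of the standard set, with combined sets averaged over their constituent classes first; its natural log S is compared with a grand mean G taken here as the average of all speakers' S values. Each speaker's factor is F = exp(G − S), and normalized formants are F times the raw values. Function names are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple, Union

# A standard-set entry is one word class or a combined set of classes.
ClassKey = Union[str, Tuple[str, ...]]


def speaker_log_mean(class_means: Dict[str, Tuple[float, float]],
                     standard_set: List[ClassKey]) -> float:
    """ln of the mean of F1 and F2 class means over a standard set of classes.

    `class_means` maps word class -> (mean F1, mean F2) in Hz. A tuple entry
    in `standard_set` is a combined set, averaged over its members first.
    """
    values = []
    for key in standard_set:
        members = key if isinstance(key, tuple) else (key,)
        values.append(mean(class_means[c][0] for c in members))  # F1
        values.append(mean(class_means[c][1] for c in members))  # F2
    return math.log(mean(values))


def scale_factors(speaker_logs: Dict[str, float]) -> Dict[str, float]:
    """Nearey-style scale factor F = exp(G - S) for each speaker."""
    grand = mean(list(speaker_logs.values()))
    return {spk: math.exp(grand - s) for spk, s in speaker_logs.items()}
```

Speakers with lower log means (deeper voices) receive factors above 1 and those with higher log means receive factors below 1, matching the direction of adjustment described above.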

Normalization was performed using the entire sample of 180 actors, which as reported previously is 59 percent male. This produced a combined F1/F2 mean of 1113 Hz (with a standard deviation of 111.6), of which the natural log is 7.01 (std dev. = 0.1). This is similar to the grand mean log of 6.90 produced by the sample of 345 participants in the ANAE acoustic analysis (Labov, Ash, and Boberg 2006: 40), which was 63 percent female (2006: 28). Given the higher proportion of males in the present sample, one would expect the natural log mean to be lower than the ANAE value rather than higher; the discrepancy may be due to the procedural difference in mean calculation just described, or to differences between the CSL software used in the ANAE analysis (Kay Elemetrics’ Computerized Speech Lab program) and the Praat software used here. In any case, using this mean value, the normalization formula produced scaling factors for the formant values of each actor, ranging from 0.83 for the actress with the highest voice (Sally Field) to 1.21 for the actor with the lowest voice (Sylvester Stallone). The average scaling factor is 0.90 for women and 1.08 for men, but there is a limited overlap between the sexes in the middle of the range: the actress with the deepest voice is Veronica Lake, with a scaling factor of 1.05, comparable to some higher-pitched men, while the man with the highest voice is Joe Pesci, with a factor of 0.98, within the lower end of the female pitch range. All the analyses presented in subsequent chapters are based on the normalized formant values produced by these scaling factors.
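As a rough consistency check on the figures just reported, a scaling factor can be inverted to recover the speaker F1/F2 mean it implies. This assumes that the grand mean log G equals ln(1113) (the text gives the rounded value 7.01) and that each factor has the form F = e^(G − S), where S is the speaker's log mean; the implied speaker means below are derived values, not figures reported in this chapter.

```latex
\begin{aligned}
F &= e^{\,G - S} \quad\Longrightarrow\quad e^{S} = \frac{e^{G}}{F},
  \qquad e^{G} \approx 1113\ \text{Hz}\ (G \approx 7.01),\\
\text{Stallone } (F = 1.21):\quad e^{S} &\approx 1113 / 1.21 \approx 920\ \text{Hz},\\
\text{Field } (F = 0.83):\quad e^{S} &\approx 1113 / 0.83 \approx 1341\ \text{Hz}.
\end{aligned}
```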

2.6 Quantitative and Statistical Analysis

Beyond measuring the mean F1 and F2 value of each word class for each actor, several secondary or derived phonetic measures were calculated for each actor, as mentioned in Section 2.4. These included both statistical tests and distance measures. Following the method of Labov, Ash, and Boberg (2006: 40), t-tests were used to assess the presence or absence of a phonemic distinction between /o/ and /oh/ (lot and thought), between /æ/ and /æh/ (trap and bath), and between /æ-æh/ and /æN-æhN/. The last pair focuses on the contrast between prenasal and nonprenasal short-a, with the Mid-Atlantic tense and lax categories combined, revealing the difference between the Mid-Atlantic and “nasal” short-a patterns examined by Labov, Ash, and Boberg (2006: 173–184). The statistical significance of Canadian Raising of /awT/ and /ayT/ was also assessed with t-tests. Distance measures, expressed in Hertz, were used to examine variables like the centralization or fronting of /uw/ and /ow/ (goose and goat) in comparison to their allophones before /l/, or the raising of /aw/ and /ay/ (mouth and price) before voiceless obstruents versus other environments. For phonemes or allophones separated diagonally in a two-dimensional F1/F2 space, such as /æ/, /æh/, /æN/, and /æG/, the Cartesian (or Euclidean) distance was calculated using both F1 and F2 distance. Other derived measures are explained in subsequent chapters as they arise in discussion.
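The two kinds of derived measure can be illustrated briefly. The chapter does not say which t-test variant was used, so the sketch below assumes Welch's unequal-variance test applied to one formant at a time (e.g. the F2 values of lot and thought tokens); it returns the t statistic and Welch–Satterthwaite degrees of freedom, and the p-value would then come from the t distribution (for instance via `scipy.stats.t.sf`). The Euclidean distance between two (F1, F2) means is sqrt(ΔF1² + ΔF2²), in Hz.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Both samples need at least two values (sample variance is used).
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def euclidean_distance(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Distance in Hz between two (F1, F2) means."""
    return math.hypot(p[0] - q[0], p[1] - q[1])
```

For example, means of (700, 1800) and (400, 2200) Hz lie 500 Hz apart, although neither formant alone differs by that much.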

Some readers may object that the widespread use of multiple t-tests in this book’s analysis makes it subject to Type I errors (false positives), meaning that out of the hundreds of t-tests conducted, a few might suggest a significant phonemic or allophonic distinction that is not in fact supported by the data. While that is a valid objection, t-tests are the standard method of assessing the statistical significance of a phonemic contrast, as in the ANAE, where the Plotnik vowel analysis program developed for the quantitative analysis of acoustic data “calculates a t-test of the statistical significance of the difference between any two vowel means” (Labov, Ash, and Boberg 2006: 40). It will be seen in the following chapters that the results of the t-tests used here, at a general level, conform very closely to expected patterns of phonemic contrast that are well known from previous research, suggesting that any effect of Type I errors is negligible.

To identify the diachronic patterns in real time that will be discussed in Chapter 3, that is, to test for correlations between dependent phonetic measures and performance year, the Pearson Product-Moment Correlation was calculated for every phonetic measure analyzed; details of these tests are given in Chapter 3. The dependent variables were then ranked in terms of the strength of their correlations with performance year, to identify the most important trends. A potential concern in these analyses is that the year of the performance chosen for a particular actor could affect the diachronic patterns observed, whereas the actor’s birth year is a less arbitrary time scale. When compared, however, the two sets of analyses produced very similar results, indicating that any skewing resulting from performance selection was negligible. For example, the correlation coefficients of the two timescales with the frequency of /r/ constriction for the whole sample are virtually identical: r = 0.481 for birth year and r = 0.480 for performance year. Overall, across 133 phonetic measures for the GNAE sample, a Pearson correlation test of the correlation values produced by the two timescales returned a value of r = 0.996, indicating almost perfect conformity. The following analyses will therefore use performance year because that, rather than birth year, is the more important date in terms of the public impact of an actor’s speech pattern, and this is a study of film and television speech, not of the off-screen speech of private individuals.
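The ranking step described above can be sketched as follows: compute Pearson's r between performance year and each phonetic measure, report r² as the effect size, and sort by the absolute strength of the correlation. The function names are hypothetical, and the sketch assumes each measure has one value per actor in the same order as the years, with actors lacking data already removed.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_by_year_correlation(
        years: Sequence[float],
        measures: Dict[str, Sequence[float]]) -> List[Tuple[str, float, float]]:
    """(measure, r, r squared), strongest absolute correlation first."""
    rows = []
    for name, values in measures.items():
        r = pearson_r(years, values)
        rows.append((name, r, r * r))
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)
```

The same `pearson_r` function could be applied to the two lists of correlation coefficients (by birth year and by performance year) to compare the timescales, as described above.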

Chapters 4–7 examine sex and regional and ethnic differences more than change over time, which requires different statistical methods. For these analyses, mean formant values for each word class and derived measure were calculated for each sex and regional or ethnic group and compared to those of the other sex or the main group representing General North American English, discussed in the next section. The differences were then sorted by size and assessed with t-tests to determine which were most important in each comparison. This identification of the most distinctive phonetic patterns of each group, together with reference to previous research on the same variables, allowed indices to be constructed, aggregating several of the most diagnostic attributes of each group into a single value that could be used to rank its members according to how well they exemplify it. The design of each index will be explained where it is used in the following chapters.
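Since the design of each index is given only in the relevant chapter, the sketch below shows just one generic way of aggregating several diagnostic measures into a single score: standardize each measure across actors (z-scores), orient it so that higher means more typical of the group, and average. This is a hypothetical illustration of the general idea, not any index actually used in this book; the measure names, signs, and equal weighting are all assumptions.

```python
from statistics import mean, pstdev
from typing import Dict


def build_index(measures: Dict[str, Dict[str, float]],
                signs: Dict[str, int]) -> Dict[str, float]:
    """Average signed z-score per actor across diagnostic measures.

    `measures` maps measure name -> {actor: value}; `signs` gives +1 if a
    higher value is more typical of the group, -1 if a lower value is.
    Only actors with values for every measure are scored; measures with
    no spread across actors contribute nothing.
    """
    actors = set.intersection(*(set(v) for v in measures.values()))
    index = {a: 0.0 for a in actors}
    for name, values in measures.items():
        vals = [values[a] for a in actors]
        mu, sd = mean(vals), pstdev(vals)
        if sd == 0:
            continue
        for a in actors:
            index[a] += signs[name] * (values[a] - mu) / sd / len(measures)
    return index
```

Sorting actors by the resulting score would then rank them by how strongly they exemplify the group's pattern, in the spirit of the indices described above.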

Some readers may object to the absence of a multivariate statistical analysis assessing the independent effects of performance year, sex, and regional or ethnic group, as well as the interactions among them. This issue has instead been addressed by examining interactions individually, such as between period and region in Chapter 3 and between sex and performance year in Chapters 4 and 5, where such analyses seemed most necessary and were supported by adequate data. In the Pearson correlation analyses, moreover, effect sizes are estimated as r² values, which express the proportion of variance in a measure accounted for by performance year and thereby acknowledge that the remainder reflects the influence of other factors.
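To make the r² interpretation concrete, the performance-year coefficient for /r/ constriction reported earlier in this section (r = 0.480) can be squared. This worked arithmetic is only an illustration of the standard reading of r² as the proportion of variance linearly associated with the predictor, not an additional result:

```latex
r = 0.480 \;\Rightarrow\; r^{2} = 0.480^{2} = 0.2304 \approx 23\%,
\qquad 1 - r^{2} = 0.7696 \approx 77\%
```

That is, performance year by itself accounts for roughly 23 percent of the variance in /r/ constriction across the whole sample, leaving roughly 77 percent attributable to other factors and to individual variation.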

As a general point for readers less familiar with quantitative and statistical analysis, it should always be kept in mind that while the calculation, presentation, and comparison of mean formant values are essential for the sociophonetic analysis this book undertakes and for a clear view of the relative positions of vowel phonemes in a vowel chart, such means are abstract statistical summaries of vowel production, not concrete instances of pronunciation. As in any analysis that compares two sets of data in terms of their mean values, there is often considerable overlap of the ranges of the individual values on which the means are based. In phonetic terms, this means that some individual tokens of one vowel phoneme may occur within the range of a neighboring phoneme; in social terms, it means that the values produced by some individual members of one group (Northerners, women, younger people, etc.) may overlap with those of some members of the group with which they are being compared (Southerners, men, older people, etc.). Overlapping ranges of values are normal in quantitative analysis; statistical analyses like the t-tests used in the following chapters are designed to determine whether the two sets of values being compared differ at the group level, despite some overlap of individual values in the range between them. To illustrate with an obvious example, the fact that men are, on average, taller than women does not deny the possibility that some women are taller than some men, nor does the existence of unusually tall women or unusually short men invalidate the more general height difference at the group level.
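The height example can be made concrete with a few invented numbers. The sketch below uses hypothetical heights (not data from this book) and a hypothetical helper, `overlap_summary`, to show two groups whose means clearly differ while several individual values fall inside the other group's range.

```python
from statistics import mean


def overlap_summary(group_a, group_b):
    """Return each group's mean and the values of each group lying inside the other's range."""
    low_a, high_a = min(group_a), max(group_a)
    low_b, high_b = min(group_b), max(group_b)
    a_inside_b = [v for v in group_a if low_b <= v <= high_b]
    b_inside_a = [v for v in group_b if low_a <= v <= high_a]
    return mean(group_a), mean(group_b), a_inside_b, b_inside_a


# Hypothetical heights in cm, echoing the example in the text.
men = [170, 175, 180, 185, 190]
women = [155, 160, 165, 172, 178]
mean_men, mean_women, men_in_women_range, women_in_men_range = overlap_summary(men, women)
print(mean_men, mean_women)   # 180 166
print(men_in_women_range)     # [170, 175]
print(women_in_men_range)     # [172, 178]
```

The group means differ by 14 cm, yet two of the five men are shorter than the tallest woman; the same logic applies to formant values of neighboring vowels or of compared social groups.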

2.7 Subsamples for Analysis: Defining General North American English

The book’s phonetic analysis of change and variation in film and television speech will begin in Chapter 3, which examines the emergence over eight decades of the regionally neutral or transregional variety of North American English most commonly heard in the mass media today. This type of speech has attracted several labels in both popular and academic discourse, of which the most common are General American and Standard American English. Neither term is ideal: General American excludes Canada, home to a variety of English that is much closer than many eastern American varieties to the regionally neutral continental standard that is based on Western American speech; and Standard American implies that other varieties are nonstandard, which is only partly true. This book will therefore adopt the term General North American English (GNAE) to refer to the variety that is indeed “standard” from the dialect point of view, involving no marked divergence from widely held norms of “correct grammar” or from the common vocabulary used across the continent; and “general” or regionally neutral from the accent point of view, involving phonological and phonetic patterns heard at least at higher social levels across the majority of the continent. This definition of GNAE excludes dialects that use nonstandard grammar and vocabulary, like African American Vernacular English and the working-class vernaculars of many other ethnic groups, and accents that most North Americans would identify as belonging to particular regions. The latter most obviously include strong Eastern New England, New York City, and Southern accents for Americans and Newfoundland accents for Canadians, but here they will also include two Inland Northern actors with particularly strong local accents.

Dividing the actors and their performances into regional and ethnic subsamples is a crucial step in the analysis to be undertaken in this book: including all the performances in a single analysis would allow regional and ethnic variation to obscure the historical patterns of sound change that will be studied in Chapter 3, as well as the sex differences to be explored in Chapter 4. For instance, recent performances by African American and local-sounding New York City actors retain a frequency of /r/ vocalization that is not representative of the contemporary development of GNAE, so including these groups in the GNAE sample would make the general trend toward consistent /r/ constriction look weaker than it really is. It is also important to have a “mainstream” comparison group that allows the most distinctive attributes of the regional and ethnic groups examined in Chapters 5–7 to be identified. The last column of Table 2.1, headed “Group,” indicates this division. Those actors marked “G” are in the GNAE group on which the analyses of Chapters 3 and 4 will be based; they number 130, or 72 percent of the sample. The remaining fifty (28 percent), marked “R/E,” are in the regional and ethnic groups on which the analyses of Chapters 5–7 will be based. The regional analyses will also include actors from the G group who share the relevant regional background but whose speech in the performances studied here is not strongly regionally distinctive.
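The subsample logic described above can be expressed as simple filters over the rows of Table 2.1. The following is a sketch under assumptions: the `Performance` record, its field names, the region labels, and the example actors are hypothetical stand-ins for the table's columns. Because the regional analyses take both R/E actors from a region and G actors who share that regional background, the regional filter selects on region alone.

```python
from dataclasses import dataclass


@dataclass
class Performance:
    actor: str
    region: str           # native dialect region or ethnic group (hypothetical labels)
    performance_year: int
    group: str            # "G" (GNAE) or "R/E" (regional/ethnic), as in Table 2.1


def gnae_subsample(performances):
    """Performances used for the GNAE analyses (Chapters 3 and 4)."""
    return [p for p in performances if p.group == "G"]


def regional_subsample(performances, region):
    """R/E performances from a region plus G performances by actors from that region."""
    return [p for p in performances if p.region == region]


def group_share(performances, group):
    """Percentage of performances assigned to a group."""
    return 100 * sum(p.group == group for p in performances) / len(performances)


# Hypothetical rows.
performances = [
    Performance("Actor A", "Inland North", 1995, "G"),
    Performance("Actor B", "Inland North", 1998, "R/E"),
    Performance("Actor C", "West", 2005, "G"),
]
print(len(gnae_subsample(performances)))                      # 2
print(len(regional_subsample(performances, "Inland North")))  # 2
```

With 130 G and 50 R/E rows, `group_share` returns about 72.2 and 27.8, matching the proportions stated above.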

Assignment of actors to the groups was done partly on the basis of their regional or ethnic origin and partly on the basis of the sound of their speech in the performances analyzed here. Two issues complicate these assignments. First, as will be shown in Chapter 3, the standard of GNAE shifted over the twentieth century, from one that embraced the middle-class variety of New York City and Boston English to one that rejected it as nonstandard or at least regional. This transition was studied by Labov (1972a) while it was still in progress in New York City in the mid-1960s; a retrospective analysis of it appears in an interim report on the film data analyzed in this book (Boberg 2018). A second interim report on the same data found, for instance, that in the early twentieth century many actresses from Midwestern and Western backgrounds vocalized at least some of their r’s, in deference to the New York–based standard of the time, whereas in the late twentieth century this pattern was reversed, with New York–raised actresses abandoning the r-less speech of their native region in deference to a new, r-ful, California-based standard (Boberg 2020a). This shift in prestige means that actors who speak New York City English in performances before the mid-1960s (the temporal division discussed in Subsection 2.3.3) should be placed in the GNAE group, whereas those who continue to speak distinctively local New York City English in performances after the mid-1960s, who will be shown in Chapter 5 to be only a subset of the more recent actors from the region, should be assigned to the New York City regional group.

The second complication relates to the same division between distinctively local-sounding and regionally neutral-sounding performances by actors from dialect regions other than New York, particularly from the South but also the Inland North. Those who sound distinctively local are included only in the regional samples analyzed in Chapter 6; those who do not are included in both the GNAE and regional analyses. Chapter 6 also includes a section on Canada, but all the Canadian actors with the exception of the two Newfoundlanders will be included in both the regional and GNAE analyses because, as pointed out in Chapter 1, Canadian actors have long been an important presence in Hollywood and, phonologically speaking, Canadian English is much more similar than either New York City or Southern States English to the transregional American standard. This similarity was previously observed in the ANAE, in which a principal components scattergram of the mean values of 21 vowel measures from the 439 acoustic analyses places Canadians with Midlanders and Westerners in the middle of the chart, as a regionally neutral group between the extremes represented by more distinctive regions (Labov, Ash, and Boberg 2006: 146–147).
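The assignment rules set out in the last two paragraphs can be summarized as a small decision function. This is a hedged sketch, not the author's procedure: the region labels are hypothetical, `sounds_local` stands in for the auditory judgment of whether a performance is distinctively local, ethnic-group assignment is not modeled, and because the text gives the New York City division only as "the mid-1960s", the cutoff year is left as a parameter; the 1965 used below is an assumed placeholder.

```python
def assign_group(region, performance_year, sounds_local, nyc_cutoff_year):
    """Return "G" (GNAE) or "R/E" (regional/ethnic) for one performance.

    region: actor's native dialect region (hypothetical labels)
    sounds_local: whether the performance is judged distinctively local-sounding
    nyc_cutoff_year: the mid-1960s division of Subsection 2.3.3, supplied by the caller
    """
    if region == "Newfoundland":
        return "R/E"   # Newfoundland accents are excluded from GNAE
    if region == "Canada":
        return "G"     # all other Canadian actors enter the GNAE analyses
    if region == "New York City" and performance_year < nyc_cutoff_year:
        return "G"     # NYC speech still fell within the older standard
    # NYC after the cutoff, the South, the Inland North, etc.
    return "R/E" if sounds_local else "G"


# Usage with an assumed cutoff of 1965.
print(assign_group("New York City", 1950, True, nyc_cutoff_year=1965))  # G
print(assign_group("New York City", 1990, True, nyc_cutoff_year=1965))  # R/E
print(assign_group("South", 1990, False, nyc_cutoff_year=1965))         # G
```

Actors assigned "G" here may still enter a regional analysis if they share that regional background, as described for the regional subsamples above.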


1 Thomas Kettig’s assistance in obtaining the .wav files and developing the analytical procedure in Praat is greatly appreciated. All subsequent steps in phonetic analysis, including selection of tokens and measurement points, data tabulation, error correction, and quantitative and statistical analysis, were carried out exclusively by the author.

Figure 2.1 The vowel space: vowel chart showing articulatory space formed by dimensions of height and advancement

Figure 2.2 Spectrogram example: spectrographic analysis of the words from Figure 2.1, spoken by the author

Figure 2.3 Relation of formant values to vowel quality: vowel chart displaying acoustic data from Figure 2.2, corresponding to articulatory vowel chart in Figure 2.1

Table 2.1 Actors and performances analyzed, with birth year (BY), native dialect region, performance year, and analysis group, sorted by performance year

Table 2.2 Distribution of performances studied, by actor’s region or ethnic group, performance period, and actor’s sex

Table 2.3 Phonemes and word classes (in small caps) to be analyzed and their notation

Table 2.4 Subsystem of phonemes and word classes before /r/

Table 2.5 Mean number of tokens by word class, with standard deviations, for entire sample

Figure 2.4 Example of sound file and associated TextGrid in Praat
