1.1 The goals of intonational analysis
The following questions need to be addressed when analyzing the intonation of a language:
1. Which tonal units are in paradigmatic contrast? In an autosegmental-metrical analysis (see Ladd 2008), for instance, this involves determining an inventory of tonal units, where these units consist of one or more tones and are specified for association with tone-bearing units or different levels of prosodic structure.
2. How do such tonal units combine syntagmatically to form a phonologically well-formed utterance? This involves establishing the set of possible combinations of tonal units.
3. How do the tonal units of the language (or combinations thereof) contribute to the pragmatics of any given utterance? This involves determining how intonational form relates to pragmatic function.
Autosegmental-metrical (AM from now on) studies of intonation, in particular those carried out within the Tones and Break Indices (ToBI) framework, have typically focused on addressing the first of these questions. In such analyses (see chapters in Jun 2005, 2014 and Frota & Prieto 2015b, but see also Arvaniti 2016 and Cangemi & Grice 2016 for a different view), tonal units are classified as either delimitative (i.e. edge tones, also referred to as boundary tones, associated with the edges of intonation phrases) or culminative (i.e. pitch accents, associated with stressed syllables). This terminology goes back to Trubetzkoy (1958), and the distinction has been made in numerous approaches to intonation (Trager & Smith 1951, Bolinger 1958, Cohen & ’t Hart 1967, to name but a few). The main reason behind this distinction is that, in many cases, certain tonal events appear to be consistently anchored at metrically-prominent positions, such as lexically-stressed syllables, whereas other tonal events appear to coincide with phrase edges.
A currently accepted way of describing the intonation of a language begins by listing all possible paradigmatic contrasts within each kind of tonal unit, i.e. the edge tones and, if present, the pitch accents in the language. A recent proposal for an International Prosodic Alphabet (IPrA) with the aim of providing a set of labels sufficient for describing the intonation system of any language exemplifies this approach:
Nowadays, there is an ample consensus among researchers (and developers of ToBI systems across languages) on the basic tenets of the AM model, namely that prominence and phrasing are two key aspects of the intonation systems of languages. Connected to these two notions, a set of phonologically contrastive pitch events – pitch accents and boundary tones, respectively for prominence and phrasing – may be defined. (Hualde & Prieto 2016)
Autosegmental-metrical approaches to intonation have also incorporated a type of tone that has been shown to have not only a delimitative function (its main function), but also a culminative function. This type of tone, referred to as the phrase accent, has a primary association with the edge of the intermediate or intonation phrase (the phrase part of the name) as well as a secondary association with a metrically strong syllable (the accent part of the name; Grice, Ladd & Arvaniti 2000, Ladd 2008). The metrically strong syllable can only be postnuclear; it cannot be the strongest metrical element in the phrase. Thus, the culminative function of the phrase accent operates at a domain lower than the intermediate or intonation phrase. This tone may alternate between being at the right edge of a phrase (i.e. on its last syllable), and on a postnuclear stressed syllable if present. In the English rise–fall–rise tune, for instance, the low tone element of the melody is analyzed as a low phrase accent (L-), because it aligns with the right edge of the phrase when the nuclear accent is on the last word of a phrase (example (1a) below), and with a stressed syllable within the postnuclear stretch when the nuclear stress occurs earlier (example (1b)). Words carrying nuclear stress are capitalized, and stressed syllables are written in bold; the examples are taken from Ladd (2008: 181).
(1) Association of tonal sequence L*+H L- H% with utterances differing in metrical structure (Ladd 2008: 181)
In the latter case, the phrase accent is analyzed as having a secondary association to a metrically strong syllable in the postnuclear stretch (Grice et al. 2000). Discussion of this phrase accent tone (or tonal complex if it is bitonal), with its double role in the intonation system, opened up the possibility that tones did not have to be exclusively culminative or delimitative. This possibility is discussed by Ladd (2008: 286ff.), who proposes that whole phrase-length tunes can be represented as sequences of abstract tones which are not intrinsically either accent tones or edge tones, a proposal we return to shortly.
We now turn to the second question. Once the inventory of pitch events is established for a given language, the next step is to explore which of the possible combinations of these events are admissible. One typical way to explore the combinations is to look at what are referred to as nuclear contours – combinations of the last pitch accent and one or more following edge tones. The concept of nuclear contour stems from the British school of intonation (it was referred to as ‘nuclear tone’ in Crystal 1969, and ‘nuclear tune’ in O'Connor & Arnold 1973), which affords a special status to the final (nuclear) pitch accent of a phrase. Although Pierrehumbert (1980) and much work building on her model made no distinction between prenuclear and nuclear accents, it has nonetheless become common practice in AM analyses to concentrate on the nuclear contour, with the implicit assumption that this region of the intonation contour is of particular importance. Ladd (2008: 286) makes the importance of the nuclear pitch accent explicit, pointing out that it is obligatory, whereas prenuclear accents can be entirely absent.
The third question, how tonal units contribute to the meaning of an utterance, is addressed by positing pragmatic functions expressed either by individual pitch accents and edge tones or by combinations of these. Although the form–function relation can make reference to individual tones and tonal complexes – i.e. to individual pitch accents and individual boundary tones (Pierrehumbert & Hirschberg 1990 and Steedman 2014 for English; Portes & Beyssade 2015 for French) – many approaches discuss the meaning of nuclear contours as a whole, often referring to them in holistic terms, such as claiming that L* H%, a nuclear low rise, is the most common nuclear contour of yes–no questions (Hualde & Prieto 2015 for Spanish; Frota & Prieto 2015a for Romance languages in general).
It is also possible to look even further than the nuclear contour and posit a meaning for whole intonation phrases, such as in the approach of Sag & Liberman (1975), where contours are described according to the function they express, e.g. contradiction contour. This contour has been analyzed in terms of pitch accents and boundary tones, e.g. in the American English ToBI training materials, as discussed in Grice (1995), with two variants: H* L* L* L-H% and %H L* L* L-H%. In this melody, the pitch accent H* appears to alternate with an initial edge tone %H, where H* occurs when the first word of the utterance contains a secondary stress, and %H when there is no secondary stress. If the analysis is correct, this would mean that the initial H tone of the contour is intrinsically neither a pitch accent nor an edge tone (i.e. it only becomes a pitch accent or an edge tone when applied to a specific utterance), supporting the idea of flexibility in the association of tones at the left edge of the phrase as well as at the right edge (as is the case with phrase accents).
Building on the observations discussed above in relation to English, we begin in Sections 1.2 and 1.3 by exploring attested cases of melodic alternations in Catalan and Spanish. These cases further challenge the idea that intonation contours can be modeled as functionally equivalent sequences of prenuclear and nuclear contours, or of pitch accents and boundary tones.
To account for these alternations we propose the concept of melodic construction, or melody, consisting of the following three components:
• A sequence of tonal primitives (e.g. level or dynamic tones), not necessarily specified as pitch accents or edge tones.
• If not predictable, a set of tonal–metrical association principles for linking these tonal primitives to any given metrical structure in the language.
• A pragmatic specification that may include communicative contextual information (e.g. construction X1 serves pragmatic function Y1 when used in syntactic context Z1; construction X1 serves pragmatic function Y2 when used in discourse context Z2).
The first component of the melodic construction, that is, the sequence of tonal primitives that are not specified as pitch accents or edge tones, builds on Ladd's (2008) proposal to represent tunes as sequences of abstract tones. However, Ladd's proposal specifies which of these abstract tones are nuclear and which are not (i.e. prenuclear and postnuclear), whereas our account does not afford any special status to a nuclear tone, leaving all tones unspecified at this first stage.
In Sections 2 and 3, we report on two laboratory experiments aimed at testing whether our intuitions regarding the structure of two Spanish melodic constructions, the low–rise–fall and the circumflex contour, are consistent with production data from native speakers and non-native Italian learners of Spanish (i.e. L2 speakers of Spanish). Finally, Section 4 elaborates on the concept of melodic construction, and discusses the implications of our findings for phonological models of intonation.
1.2 Tonal metrical association in Catalan and Spanish
Prieto (2002) describes a number of Catalan melodies, each composed of two pitch accents and one boundary tone. In phrases of two or more prosodic words (in Catalan containing one lexically-stressed syllable each), the first pitch accent always associates with the first stressed syllable of the phrase, the second pitch accent always associates with the last stressed syllable, and the boundary tone associates with the phrase edge. For instance, when the rising–falling exhortative tune L*H HL* L-L% is produced with the sentence Vine a men jarne ‘Come eat some of this’ (stressed syllables in bold), the first pitch accent L*H associates with the initial stressed syllable vi, the second pitch accent HL*, a fall, with the stressed syllable jar, and the boundary tone L% with the end of the phrase. In phrases of one prosodic word only, it is the first accent L*H that is realized, not the nuclear one, HL*. This suggests that the nuclear contour does not have a special status in the structure of these tunes. In fact, the nuclear pitch accent does not appear to be the obligatory part of this tune at all.
These Catalan melodies are fundamentally different from those informing models in which the nucleus has a special status. In the British school (e.g. O'Connor & Arnold 1973, Halliday 1967), phrases exhibit the same ‘nuclear tune’ – the nuclear pitch accent and following edge tones – regardless of the number of prosodic words in the phrase. In phrases with more than one prosodic word (PW), there may also be a ‘prenuclear tune’, but its components, referred to as ‘prehead’ and ‘head’, are not obligatory for there to be a well-formed phrase; their presence depends on the number of stressed and unstressed syllables available before the nuclear syllable. The importance of the nuclear pitch accent has also been confirmed in work carried out in the AM framework on English (Calhoun 2010) as well as for Dutch (Gussenhoven 2005) and German (Grice, Baumann & Benzmüller 2005).
We will now show that, similarly to Catalan, Peninsular Spanish also appears to have melodies with alternating nuclear contours. For instance, a fact that has passed unnoticed in most of the previous literature on Spanish intonation is that broad-focus declarative utterances regularly display melodic alternations in their nuclear contours as a function of phrase length. This can be seen in Figure 1 (adapted from Torreira 2015; audio clips of these utterances are provided as part of the supplementary materials), which shows a series of simple broad-focus statements differing in length (one PW: la mandarina ‘the mandarin’; two PWs: una mandarina ‘one mandarin’ and la mandarina madura ‘the ripe mandarin’; three PWs: una mandarina madura ‘one ripe mandarin’; all of them produced by the same speaker in a carrier utterance starting with the word vale ‘OK’ elicited in a non-contrastive context). Interestingly, a similar alternation in nuclear contour shape has been observed in imperative utterances in Mexican Spanish: L+H* L% in short imperatives of one PW vs. L* L% in longer imperative utterances (Brehm, Lausecker & Feldhausen 2014, Lausecker, Brehm & Feldhausen 2014). In both Spanish varieties, it thus appears that the final accent may alternate between rising and falling (LH* vs. HL* or L*) depending on whether the intonational phrase has one or more PWs.
In Catalan and Spanish, therefore, neither the nuclear accent nor the nuclear contour (the nuclear accent and following edge tones) constitutes an adequate unit for characterizing the form and function of a tune, since its nuclear accent is either rising or falling depending on the length and metrical structure of the phrase. In her study on Catalan, Prieto (2002) argues for a hierarchical model of tune–text association in which tunes such as the ones presented above could be modeled as a series of underlying pitch accents with different priorities for their association (e.g. L*H preferred over HL* in the exhortative tune) and a constant final boundary tone. Depending on the availability of metrically strong syllables, either the prioritized pitch accent alone or both pitch accents would be associated with the metrical structure (and therefore realized). Although this analysis undermines the view that intonation contour meanings can be adequately classified based on their nuclear configurations, it respects the basic distinction between culminative and delimitative tones adopted by autosegmental-metrical approaches to intonational analysis.
1.3 The low–rising–falling melodic construction in Spanish
We now describe a Spanish melody that challenges the dichotomy between culminative and delimitative tones (i.e. pitch accents and edge tones). One of the functions of this melody, which we will refer to as the low–rise–fall from now on, is that of expressing obviousness when produced with a declarative sentence. The melody is composed of three tonal components: an initial low (which we will transcribe as L), a medial high (H), and a final low (L) that is typically truncated when it co-occurs with the previous high component in the same syllable (see Prieto & Ortega-Llebaria 2009 for another case of final tonal truncation in Spanish).
As an example of this melodic construction in use, we will discuss one excerpt from the Nijmegen Corpus of Casual Spanish (Torreira & Ernestus 2012), in which one speaker produces it at least three times within one conversational turn. The excerpt is transcribed in example (2).
Each of the phrases that carry a low–rising–falling melody is underlined and numbered in the transcript. They are also illustrated with waveform-aligned pitch tracks in Figure 2 (audio clips of these utterances are provided as part of the supplementary materials). In the excerpt, two speakers from Madrid argue whether girls and boys should be raised in the same way. Both speakers agree that both genders should receive the same treatment, but speaker 2RM makes it clear that this should be beyond any doubt by using the interjection claro ‘of course’ at the beginning of the turn, and, in our opinion, also by using the low–rise–fall in at least three out of the four phrases in this turn (as opposed, for instance, to a rising–falling declarative tune which would not have conveyed the same sense of obviousness to the utterances).
The reader may wonder why we analyze these three examples as featuring the same melodic construction, since the first and third examples on the one hand, and the second example on the other, feature clearly different nuclear contours. From an interactive point of view, it could be argued that the three of them occur within the same conversational turn, that this turn appears to implement mainly one conversational move (i.e. that of emphasizing the obviousness of one of the points under discussion), and that it is likely that the speaker chose functionally equivalent intonation patterns for each of these statements. When focusing on the nuclear contours of the three utterances, however, (claro in Figure 2a, -blando in Figure 2b, and -ción in Figure 2c) there are at least two different patterns: a nuclear low–rise with a very slight terminal fall (Figures 2a and 2c), and a clear nuclear rise–fall (Figure 2b). Thus, they do not appear at first sight to share the same intonational identity. Nonetheless, when heard in their conversational context, the three utterances strike the attentive native listener as functionally equivalent at the intonational level. In other words, despite the different nuclear contours used, there seems to be a repetition of the same intonational construction, mirrored by a repetition of the action of the three parts of the turn. For this reason, we will explore the possibility that these examples feature one and the same melodic construction, and that this construction contains an LHL sequence of tones subject to a non-trivial pattern of tonal–metrical association.
ToBI-style transcriptions illustrating a tentative analysis of the outcome of this tonal association pattern are provided in the examples in (3).
(3) ToBI-style transcription of the three Spanish utterances presented in Figure 2
Whereas the phrase es que de eso se está hablando ‘That's what's being discussed’ in example (3b), which has four lexically-stressed syllables and therefore four prosodic words, features a prenuclear L* pitch accent and an H* nuclear accent, the utterances claro ‘of course’ in (3a) and de educación ‘of education’ in (3c), which both consist of one prosodic word, have a nuclear L* pitch accent with a HL% edge tone sequence. The lack of a clear low accentual pitch target in the last syllable of the word educación [edukaθjon] in Figure 2c could be attributed to the microprosodic effects of the segments [θ] and [j], as well as to tonal coarticulation between L* and H within the same syllable.
If this analysis is correct, this would mean that the low–rising–falling melodic construction in Spanish not only has a variable nuclear accent depending on phrase length (the number of prosodic words) – a feature shared with the other two melodies presented in the previous subsection – but also that these same changes in phrase length can lead to its edge tone alternating between a very clear high target followed by a slight fall (HL% in AM notation), and a clear low target (L%). Thus, the phenomenon that we describe here differs from familiar cases of tonal crowding, in which f0 targets can be slightly displaced or undershot when under time pressure, for instance when the nuclear syllable is final in the phrase (Grabe et al. 2000, Hanssen, Peters & Gussenhoven 2007, Rathcke 2016). The alternation described here between (3a) and (3b) involves alternations in the association properties of tones despite the fact that the nuclear syllable is non-final in the phrase, as both have a postnuclear unaccented syllable. In the next section, we will test this hypothesis in an imitation and completion experiment eliciting utterances with an equivalent pragmatic function but differing in length.
Because our hypothesis predicts that the melodic alternation sketched above is a specific property of the low–rising–falling melody as it is employed in Spanish, and in all likelihood not immediately available to speakers of other languages, we will test both native speakers of Spanish and Italian L2 speakers of Spanish who have not attained native-like intonational proficiency. More particularly, we predict that, when led to produce the low–rising–falling melody in statements of the obvious, Spanish speakers will alternate between low–rising–falling nuclear contours (i.e. L* HL%) in short phrases of one PW and falling nuclear contours (H* L%) in longer phrases, whereas Italian L2 speakers of Spanish will produce similar nuclear contours regardless of phrase length, since even near-native speakers have been shown to lack full mastery of the intonational system of a second language (Mennen 2004, 2015). If, on the other hand, the variability in nuclear contour shape described above results from general phonetic mechanisms of tonal realization such as tonal coarticulation and undershoot (see Gandour 1994, Arvaniti & Ladd 2009, Arvaniti 2012), native speakers of Spanish and Italian L2 speakers of Spanish will pattern together in their productions of nuclear contours across different phrase lengths.
2 Investigating the Spanish low–rise–fall in the laboratory: An imitation-and-completion experiment
Five native speakers of Spanish and five Italian L2 speakers of Spanish took part in the experiment. The Spanish participants consisted of four females and one male. All of them were monolingual, and had grown up and were living in the town of Cádiz (in southern Spain) at the time of the experiment. They were between 21 and 50 years of age, except for participant AF, who was 10. All the Italian participants were female. They came from different parts of Italy (Rome, Bologna, Treviso, and Venice), and were all living in Nijmegen (The Netherlands) at the time of the experiment. Their proficiency in Spanish was variable. All of them except one had a noticeable foreign accent. Speaker SI, who had lived in different parts of Spain for eight years before moving recently to the Netherlands, was highly proficient in Spanish.
The Spanish participants were recorded in a quiet room in their homes in Cádiz (Spain), whereas the Italian participants were recorded in a soundproof booth at the Max Planck Institute in Nijmegen (The Netherlands).
The experiment consisted of an imitation-and-completion production task aimed at eliciting the low–rising–falling melodic construction in Spanish in utterances differing in length (i.e. number of prosodic words). It comprised a training phase followed by a test phase. In the training phase, participants were asked to imitate a series of statements of the obvious consisting of two intonational phrases containing one prosodic word each. The first phrase always featured the interjection claro ‘of course’ produced with a low–rise–fall, and was used to unequivocally elicit the target speech act (i.e. statement of the obvious). The second phrase featured a variable noun phrase with the same melody (e.g. Claro, el abuelo ‘Of course, the grandfather’).
During the training phase, therefore, participants were expected to become aware that each of the two phrases of each item throughout the experiment should always exhibit the same melody, and have the same pragmatic force regardless of its lexico-syntactic content. Each training item (see Table 1) was presented first in written form for 1.5 seconds, and then auditorily in a synthetic realization produced with a Nuance Vocalizer™ European Spanish voice (as available in the Mac OS X operating system v. 10.3). The pitch of the synthetic speech was manipulated in Praat (Boersma & Weenink 2016) so that the utterances featured a similar low–rising–falling contour in utterances of one PW. The exact pitch pattern used in the resynthesis, which is shown in Figure 3 (audio clip provided in the supplementary materials), was modeled after a natural production by the first author (a native speaker of Andalusian Spanish, a variety of Spanish spoken in southern Spain). It features a slight fall to a low flat pitch target within the lexically-stressed syllables [kla] and [βwe], followed by a rise to a high target and a slight terminal fall in the last unstressed syllable of each phrase.
The purpose of the test phase was to see whether participants would produce the alternating melodic patterns conditioned by phrase length described in Section 1.3 above. In this phase, participants encountered items similar to the ones previously presented in the training phase (i.e. always starting with the word claro), but this time differing in the number of prosodic words in the second phrase, ranging from one to three (e.g. one PW: Claro, Manolo vs. two PWs: Claro, el hermano de Manolo ‘Of course, Manolo's brother’ vs. three PWs: Claro, la amiga del hermano de Manolo ‘Of course, Manolo's brother's friend’). In each target item, only the word claro was presented both in auditory and written form as in the training phase, but the remainder (e.g. El hermano de Manolo ‘Manolo's brother’) was presented in written form only. Crucially, participants were asked to imitate and complete each test item as naturally as possible. The pool of target test items (see Table 1) was presented to each participant five times, each time in a randomized item order. Note that all items contain the word Manolo in nuclear position, allowing a direct comparison of the nuclear contours of items differing in phrase length. To ensure that participants kept using the same melodic construction throughout the test phase despite the numerous repetitions, each consecutive pair of target items was separated by one randomly selected item taken from the training phase. These fillers were presented in their complete written and auditory form, which participants were asked to simply imitate as they had done during the previous training phase.
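The presentation scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the seed argument, and the decision to place a filler between every consecutive pair of targets, including across repetitions, are ours. The item lists contain only the examples quoted in the text, not the full set in Table 1.

```python
import random

# Example items quoted in the text; the full sets are in Table 1 (not reproduced here).
TARGETS = [
    "Claro, Manolo",
    "Claro, el hermano de Manolo",
    "Claro, la amiga del hermano de Manolo",
]
FILLERS = ["Claro, el abuelo"]


def build_test_sequence(targets, fillers, repetitions=5, seed=None):
    """Return a list of (kind, item) pairs for one participant.

    Assumption: fillers separate every consecutive pair of targets,
    also across the boundary between two repetitions of the pool.
    """
    rng = random.Random(seed)
    ordered_targets = []
    for _ in range(repetitions):
        block = list(targets)
        rng.shuffle(block)  # a fresh random order for each repetition
        ordered_targets.extend(block)

    sequence = []
    for i, item in enumerate(ordered_targets):
        if i > 0:
            # one randomly selected training item between two targets
            sequence.append(("filler", rng.choice(fillers)))
        sequence.append(("target", item))
    return sequence


if __name__ == "__main__":
    for kind, item in build_test_sequence(TARGETS, FILLERS, seed=1):
        print(kind, "|", item)
```

With three targets and five repetitions, this yields 15 target trials interleaved with 14 fillers.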
Pitch contours were extracted from the recorded materials using the auto-correlation pitch detection function in Praat in semitones (reference 100 Hz), and were smoothed with the smooth.spline() function in R (R Core Team 2016) using a smoothing parameter of 0.7. All absolute pitch values were normalized by subtracting from them the median pitch value of their speaker as calculated from their complete recording session. The first author annotated the start and end of each target utterance, as well as the start and end of all lexically stressed syllables, using time-aligned waveforms and spectrograms.
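As a rough illustration of the two numeric steps in this paragraph, the sketch below converts f0 values from Hz to semitones relative to 100 Hz and subtracts a speaker's session median. The function names are hypothetical, the spline smoothing step is omitted, and the code is not taken from the project's scripts.

```python
import math
from statistics import median


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert an f0 value in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def normalize_by_speaker_median(session_st, contour_st):
    """Subtract the speaker's median pitch (computed over the whole
    recording session, in semitones) from every value of one contour."""
    speaker_median = median(session_st)
    return [value - speaker_median for value in contour_st]


if __name__ == "__main__":
    session = [hz_to_semitones(f) for f in (100.0, 200.0, 400.0)]
    print(normalize_by_speaker_median(session, session))
```

After normalization, a value of 0 corresponds to the speaker's median pitch, so contours from speakers with different pitch ranges can be compared on a common scale.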
To quantify the shape of nuclear contours, we approximated the nuclear contour of each utterance with a quadratic polynomial function a + bx + cx², where a and b correspond, respectively, to intercept and slope, and the c coefficient corresponds to the curvature of the fitted function (see Andruski & Costello 2004, Torreira 2007). Rising–falling contours were fitted by functions with negative curvature coefficients, whereas falling–rising contours led to positive curvature coefficients. To test for statistical differences in pitch contour shape between the various experimental conditions, we ran mixed-effects linear regression models on the curvature coefficients using the lme4 and lmerTest packages in R.
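The relation between contour shape and the sign of the curvature coefficient can be illustrated with a small least-squares fit. This is a minimal sketch, not the analysis code used in the study: the function names are hypothetical, the x values are arbitrary time points chosen for the example, and the fit is done by solving the normal equations directly.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    if len(xs) != len(ys) or len(set(xs)) < 3:
        raise ValueError("need at least three distinct x values")
    # Power sums for the normal equations (X'X) beta = X'y.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x4 system.
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (m[i][3] - rest) / m[i][i]
    return tuple(beta)


def contour_shape(c):
    """Label a contour by the sign of its curvature coefficient c."""
    if c < 0:
        return "rise-fall"
    if c > 0:
        return "fall-rise"
    return "flat or linear"


if __name__ == "__main__":
    times = [-2, -1, 0, 1, 2]
    a, b, c = fit_quadratic(times, [0, 3, 4, 3, 0])  # pitch rises, then falls
    print(contour_shape(c))
```

Note that the magnitude of c depends on the units of the time axis, so coefficients are only comparable across contours fitted on the same scale.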
Figure 4 shows the pitch contours in the final prosodic word of each target test item split by language group and phrase-length condition, whereas Figure 5 shows the entire contours pooled by speaker and phrase length within each language group (audio clips of speaker DG's utterances are provided as part of the supplementary materials; all of this project's experimental data, materials, and scripts can be found on the project's page on the Open Science Framework at https://osf.io/eas68/). In utterances of one prosodic word only, both groups produced low–rising nuclear contours (i.e. stretching from the last stressed syllable of the phrase to the end of the phrase). In the longer phrases, on the other hand, Spanish speakers produced clear nuclear rise–falls, whereas Italian speakers produced low–rising nuclear contours similar to the ones in the one-PW condition.
To test for the statistical significance of the observed differences in nuclear contour shape, we fitted a regression model with the polynomial quadratic coefficients of the nuclear part of the low–rising–falling contours as the response, phrase length and language group as main predictors, and speaker as a random factor. There was a statistically significant interaction between length and language (F(2,135.95) = 349.44, p < .0001). Whereas for Italian speakers all length groups had a similar degree of curvature, for Spanish speakers phrases of two or more PWs had a nuclear contour with significantly more negative curvature than phrases of one PW (two PWs: β = –452.46, t = –23, p < .0001; three PWs: β = –448.17, t = –22.78, p < .0001). Releveling the factor length with two PWs as the new baseline showed no evidence that the native-speaker nuclear contours in phrases of two and three PWs differed from each other (β = 4.28, t = 0.21, p = .82).
The prenuclear part of the contours, shown in Figure 5, displays some differences across the two groups of speakers. Whereas Spanish speakers often initiated their phrases with a non-low pitch value in the first unstressed syllable of the phrase, and reached a low pitch target in all prenuclear stressed syllables, Italian speakers were not consistent in producing low prenuclear accents, occasionally producing high or slightly rising prenuclear accents (see, for instance, the productions of EN and VM).
In summary, the results of our experiment clearly indicate that native Spanish speakers systematically alternated between low–rising nuclear contours in short phrases of one PW and rising–falling nuclear contours in longer phrases, whereas Italian speakers produced similar nuclear contours regardless of phrase length. Since the nuclear material in the items was always the same across the different phrase lengths and language groups (i.e. the last two syllables of the name Manolo: /no.lo/), we can exclude the possibility that the alternation in nuclear contour shape exhibited by native speakers of Spanish is due to general phonetic principles of tonal realization such as those present in cases of tonal crowding and under extreme temporal pressure. Finally, the fact that Spanish speakers produced low–rising nuclear contours lacking the slight terminal falls present in Figures 2 and 3 suggests that complete tonal truncation of the final L target is possible in this melody.
In the previous subsection we have provided evidence that the low–rising–falling melodic construction in Spanish is instantiated by nuclear accents and edge tones that differ in their tonal properties as a function of phrase length. Thus, as far as this melody is concerned, neither the pitch accent nor the nuclear contour can be construed as an intonational morpheme. We will now propose an autosegmental-metrical model representing the knowledge that a native speaker of Spanish must acquire in order to generate the low–rising–falling melodic construction in utterances of different lengths. The model includes a tonal tier containing the sequence LHL, a metrical tier contingent on the utterance to be spoken, and the following tonal–metrical association principles:
• The initial tone (L) associates with the first stressed syllable of the phrase, and, in phrases longer than one prosodic word, spreads rightwards up to the last stressed syllable of the phrase. In phrases of one prosodic word only, the condition is not met for spreading, there being only one stressed syllable.
• The second tone (H) associates either, in phrases longer than one prosodic word, with the last stressed syllable of the phrase, or, if the phrase has only one prosodic word, with the right edge of the phrase.
• The final tone (L) associates with the right edge of the phrase.
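Purely as an illustration, the three association principles above can be written as a small procedure. The following Python sketch is not part of the analysis itself. It assumes that each prosodic word contributes exactly one lexically-stressed syllable, that syllables are identified by integer positions, and that the right edge of the phrase can be represented by a placeholder label. The names `Association`, `EDGE` and `associate_lhl` are hypothetical. The sketch records the syllable up to which the initial L spreads without committing to whether that syllable itself is included in the spreading domain.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

EDGE = "right_edge"  # hypothetical label for the right edge of the phrase


@dataclass
class Association:
    tone: str
    anchor: Union[int, str]           # stressed-syllable position or EDGE
    spreads_to: Optional[int] = None  # syllable the tone spreads up to, if any


def associate_lhl(stressed: List[int]) -> List[Association]:
    """Associate the LHL sequence with a phrase, given the positions of its
    stressed syllables (assumed: one stressed syllable per prosodic word)."""
    if not stressed:
        raise ValueError("at least one stressed syllable is required")
    stressed = sorted(stressed)
    first, last = stressed[0], stressed[-1]
    longer_than_one_pw = len(stressed) > 1
    return [
        # Initial L: first stressed syllable; spreads only in longer phrases.
        Association("L", first, spreads_to=last if longer_than_one_pw else None),
        # H: last stressed syllable in longer phrases, otherwise the right edge.
        Association("H", last if longer_than_one_pw else EDGE),
        # Final L: always the right edge of the phrase.
        Association("L", EDGE),
    ]
```

For a one-word phrase such as Manolo (stress on the second syllable, position 1 when counting from 0), `associate_lhl([1])` links the initial L to the stressed syllable and both H and the final L to the right edge. With three stressed syllables, for example `associate_lhl([1, 4, 7])`, H is linked to the last stressed syllable instead.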
The phonological forms resulting from this model will be subject to phonetic implementation processes such as coarticulation and reduction just like any other phonological structure. We have observed, for instance, that tonal crowding in phrase-final syllables typically leads to the final L tone being partially or completely truncated. The bottom example in Figure 2 also suggests that the initial L tone can be undershot when it associates with the same syllable as the two other tones in the melody (i.e. in the phrase de educación ‘of education’, with a stressed syllable in final position).
The application of this model to utterances of different lengths is illustrated in Figure 6, where it is presented below two schematic pitch contours. The tonal level contains the tonal sequences to be associated. On the metrical level, curly brackets represent phrase edges, whereas stars represent lexically-stressed syllables serving as potential tonal anchors. Lines linking the two levels represent direct tonal–metrical associations, and the right arrow represents tonal spreading. The figure also includes ToBI-style transcriptions above the pitch contours for comparison with current models of Spanish intonation.
Note that ToBI-style transcriptions such as the ones included in the figure, while able to encode important phonetic aspects of the pitch contours in Figure 6, do not capture the similarities between the two utterances, nor can they be used to generate new utterances of varying length. For this reason, it could be argued that although ToBI labels of this kind provide a useful approximation of the phonetic form, they do not represent the phonological structure of this melodic construction. We return to this point in Section 4.
3 Investigating a second melody in the laboratory: The circumflex contour
We now discuss a further Spanish melodic construction that appears to have alternating nuclear accents and edge tones depending on phrase length. This melody is a circumflex contour typically found in questions in which the proposition is attributed to someone other than the speaker, usually the hearer, as in ‘Who do you mean? X?’, where X is a referent in a thought attributed to the interlocutor (see Escandell-Vidal 1998). It has previously been described in its full form, that is, as produced in long phrases containing more than one lexically-stressed syllable (Estebas-Vilaplana & Prieto 2010: 32). In such cases, it can be characterized as an initial accentual rise followed by a level pitch up to the last stressed syllable of the phrase, where the pitch rises again and then falls. Based on the linguistic intuitions and impressionistic observations of the first author (a native speaker of Andalusian Spanish), we hypothesized that the upstepped nuclear rise–fall in the full form of this melody (transcribed as L+¡H* L% in Estebas-Vilaplana & Prieto 2010) would alternate with a simple rise in short phrases of one prosodic word. To test this hypothesis, we ran a second imitation-and-completion experiment using the same procedure and speakers (native Spanish and Italian L2 speakers of Spanish) as in the experiment described in Section 2. To elicit the pragmatic context appropriate for this melody, we modified the items in Table 1 above by replacing claro ‘of course’ with ¿Quién? ‘Who?’, and by adding question marks to the target utterances, thus biasing the interpretation of the target utterances towards attributive questions (Table 2).
As auditory models for the training phase, we used resynthesized utterances with pitch contours similar to the one presented in Figure 7 (audio clip provided in the supplementary materials).
Figure 8 below shows the nuclear pitch contours collected in this experiment pooled by language group and phrase length, whereas Figure 9 shows the complete pitch contours pooled by speaker and phrase length within each language group (audio clips of speaker DG's utterances are provided as part of the supplementary materials). As predicted by our hypothesis, the five native speakers of Spanish systematically produced simple nuclear rises in short questions of one PW, and upstepped nuclear rise–falls in the longer phrases of two and three PWs. The Italian L2 speakers, on the other hand, generally produced nuclear rise–falls in all of the three phrase length conditions. Interestingly, Italian speaker SI, who had lived in Spain for several years prior to the experiment, produced native-like rising contours in the one-PW condition in two of her productions.
To test for the statistical significance of the observed differences in nuclear contour shape, we fitted a regression model with the polynomial quadratic (i.e. curvature) coefficient as the response, phrase length and language group as main predictors, and speaker as a random factor. We observed a statistically significant interaction between length and language (F(2,135.95) = 14.13, p < .0001). Whereas for Italian speakers the length groups did not differ from one another, in Spanish, phrases of two and three PWs had a nuclear contour with more negative curvature (two PWs: β = –122.11, t = –4.27, p < .0001; three PWs: β = –139.29, t = –4.87, p < .0001) than phrases of one PW, which had curvature coefficients close to 0. Releveling the factor length with two PWs as the new baseline showed that phrases of two and three PWs did not differ from each other significantly (β = –17.17, t = –0.6, p = .55). It appears, then, that the data are consistent with our hypothesis concerning the alternation of nuclear contours as a function of phrase length in the native speakers’ productions, and the lack thereof in the L2 speakers’ productions.
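To make the response variable concrete, the following Python sketch shows one way of obtaining a curvature coefficient from a sampled nuclear f0 contour. It is an illustration under stated assumptions, not the procedure used in the experiment. It fits a raw quadratic y = a + bt + ct² by ordinary least squares over time normalized to [–1, 1] and returns c, so the scale of its output is not comparable to the β values reported above. The function name `quadratic_coefficient` is hypothetical, and the mixed-effects regression itself is not reproduced here.

```python
from typing import List, Sequence


def quadratic_coefficient(f0: Sequence[float]) -> float:
    """Return c from a least-squares fit of y = a + b*t + c*t**2, with the
    samples assumed equally spaced and time normalized to [-1, 1]."""
    n = len(f0)
    if n < 3:
        raise ValueError("at least three f0 samples are required")
    t: List[float] = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]

    # Normal equations: sums of powers of t, and of y times powers of t.
    s = [sum(x ** k for x in t) for k in range(5)]
    r = [sum(y * x ** k for x, y in zip(t, f0)) for k in range(3)]
    m = [
        [s[0], s[1], s[2], r[0]],
        [s[1], s[2], s[3], r[1]],
        [s[2], s[3], s[4], r[2]],
    ]

    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * p for x, p in zip(m[row], m[col])]

    # The third unknown is the quadratic (curvature) coefficient.
    return m[2][3] / m[2][2]
```

Under this parameterization, a rise–fall yields a negative coefficient, whereas a straight rise yields a coefficient close to 0, which is the contrast that the analysis above relies on.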
3.3 An autosegmental-metrical model of the circumflex melodic construction in Spanish
We now propose an autosegmental-metrical model of the circumflex contour that can account for the melodic alternation discussed in the previous subsections. Under this account, we will posit that the melody consists of a LH¡HL tonal sequence in which the second high tone is upstepped (¡H). The tonal–metrical association principles of this melody are the following (an application of the model is shown in Figure 10):
• The initial LH tones always associate with the first stressed syllable of the phrase. In phrases of more than one PW, the initial H tone spreads rightwards up to the last stressed syllable of the phrase.
• In phrases longer than one PW, the upstepped ¡H tone associates with the last stressed syllable of the phrase. In short phrases of one PW, the upstepped H tone associates with the right edge of the utterance.
• The final L tone associates with the right edge of the phrase if the phrase is longer than one PW. Otherwise the final L is left unassociated.
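As with any set of association principles, the ones listed above can be stated procedurally. The Python sketch below is offered only as an illustration. It assumes one lexically-stressed syllable per PW, integer syllable positions, and that spreading of the initial H applies in all phrases of two or more PWs. The names `Link`, `EDGE` and `associate_circumflex` are hypothetical. An anchor of `None` stands for a tone that is left unassociated.

```python
from typing import List, NamedTuple, Optional, Union

EDGE = "right_edge"  # hypothetical label for the right edge of the phrase


class Link(NamedTuple):
    tone: str
    anchor: Optional[Union[int, str]]  # syllable position, EDGE, or None
    spreads_to: Optional[int] = None   # syllable the tone spreads up to


def associate_circumflex(stressed: List[int]) -> List[Link]:
    """Associate the LH¡HL sequence with a phrase, given the positions of its
    stressed syllables (assumed: one stressed syllable per PW)."""
    if not stressed:
        raise ValueError("at least one stressed syllable is required")
    stressed = sorted(stressed)
    first, last = stressed[0], stressed[-1]
    longer_than_one_pw = len(stressed) > 1
    return [
        # Initial L and H: always on the first stressed syllable.
        Link("L", first),
        # The initial H spreads rightwards only in longer phrases (assumption).
        Link("H", first, last if longer_than_one_pw else None),
        # Upstepped H: last stressed syllable, or the right edge in one-PW phrases.
        Link("¡H", last if longer_than_one_pw else EDGE),
        # Final L: right edge in longer phrases, otherwise unassociated.
        Link("L", EDGE if longer_than_one_pw else None),
    ]
```

In a one-PW question, `associate_circumflex([1])` leaves the final L unassociated and places ¡H at the right edge, which corresponds to the simple rise. With two or more stressed syllables, ¡H is linked to the last stressed syllable and the final L to the right edge, which corresponds to the upstepped rise–fall.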
Note that, as in the case of the two other Spanish melodies discussed in this paper (the rising–falling broad focus declarative contour and the low–rise–fall used in declaratives to signal obviousness), it is the initial accentual movement that is always realized (i.e. both in short phrases of one PW and in longer phrases), rather than the final one, as assumed in analyses that partition the contour into prenuclear and nuclear constituents. Note also that, as in the other two melodies discussed, there is a rightwards spreading of the initial tonal material leading up to the final stressed syllable in phrases of two or more PWs (i.e. L in low–rise–falls, H in rising–falling declaratives and in circumflex contours). Further research is needed to determine the extent to which such recurrent principles apply to other melodic constructions in Spanish.
4 Summary and general implications for intonational phonology
In this paper we have provided evidence that a number of form–function relations in the intonation of Spanish can be analyzed by means of melodic constructions, which are specified in the Spanish intonational lexicon-grammar. We have seen that, in the case of the low–rising–falling and circumflex melodies, one of the tones in each melody can be seen as having different roles in the intonational phonology: delimitative, as a final edge tone, or culminative, as a nuclear pitch accent, depending on whether the melodic construction is produced in a short utterance of one PW only, or in a longer utterance of two or more PWs. Moreover, we have also shown that a construction may contain a number of tones, only some of which are realized in short utterances, indicating that certain tones are prioritized in cases where the number of stressed syllables (and thus tone bearing units) is limited.
Since some tones in the described melodic constructions surface as either pitch accents or edge tones, depending in a systematic way on metrical structure, we argue for a clearer separation of the tonal and metrical tiers in autosegmental-metrical intonational phonology, and propose a system of tiers with (i) a purely tonal tier consisting of tones, (ii) a metrical tier, and, crucially, (iii) melody-specific principles of tonal–metrical association. In this approach, tones with an intrinsic culminative function (i.e. pitch accents) and tones with an intrinsic delimitative function (i.e. boundary tones), or tones that are intrinsically nuclear, as in Ladd's (2008) proposal, are not necessarily part of the underlying representation of the melodic construction, but emerge only once the tonal–metrical principles are applied.
From a cognitive point of view, it can be argued that melodic constructions such as the ones discussed here are stored as intonational elements in a unified lexicon-grammar or constructicon (see Kay & Fillmore 1999; Jackendoff 2002; Goldberg 2003, 2006; Croft 2016), including heterogeneous units traditionally assigned to separate levels of linguistic structure, such as syntactic structures, words, morphemes, idioms, and phonemes. In the case of melodic constructions, this involves storing the following three aspects of the construction when not fully predictable:
• Its melodic form, for instance in terms of level tones (e.g. LHL).
• Its principles of tonal–metrical association, which, as we have shown in the case of Spanish, may be sensitive to contextual factors such as phrase length and metrical structure. Note that, in languages where these principles are fully predictable, they need not be stored in each melody (in the same way that stress is not specified lexically in many languages).
• Its meaning specification, which may include links between each melodic construction and syntax and/or discourse, since certain melodies appear to occur with a specific meaning when coupled with specific syntactic constructions and/or in specific discourse contexts. For instance, the low–rise–fall in Spanish can be used in statements with a meaning of obviousness, and in vocative, imperative, and wh-question utterances with the meaning of request without involving any notion of obviousness as far as we can see (see example in the supplementary materials). Persson (this issue) argues that, in conversational repeats (i.e. when a speaker repeats some or all of the words in the interlocutor's previous turn), emphatic initial accents in French, traditionally thought to mark contrastive focus or emphasis, are produced by speakers to convey receipt of the interlocutor's talk, that is, without an emphatic or contrastive function (see also Muntendam & Torreira 2016, for a related phenomenon observed in an interactive task in Spanish). These examples suggest there may be links between discourse, syntax, and phonology in the speaker's cognitive representation of the melodic construction.
Since the L2 speakers in our experiments have difficulty imitating a melody if the number of prosodic words is different from the stimulus, they do not appear to have learnt the language-specific principles of tonal association. This is a case in which learners are not simply making a phonetic error, e.g. in alignment of the f0 peak, as discussed at length in Mennen (2004, 2015), but they are making an error at the level of the phonology, in that they associate tones in a different way from the native speakers.
The existence of melodic constructions such as the ones studied here does not mean that individual elements, such as smaller tonal units, principles of metrical association, and recurrent associations between units and metrical positions (such as that between L and phrase edges, or between H and metrically strong syllables) cannot also be represented independently in the language, in the same way that phonemic units and phonological processes (e.g. assimilation) can be represented independently at the segmental level of the lexicon-grammar. In languages with fully compositional intonational meaning, these units may be stored with links to pragmatic function and syntax, and idiom-like constructions can be stored as larger melodic constructions (see Calhoun & Schweitzer 2012).
We have provided evidence of intonational meaning associated with melodies spanning the whole intonation phrase. The existence of such melodies is perhaps behind the fact that, for some languages (especially Romance languages), a fully compositional approach has never been proposed. Moreover, we have shown that the tones in these melodies do not necessarily have an intrinsic culminative or delimitative role (i.e. as either pitch accents or edge tones), and that the nuclear accent does not have a special status, in that it is not an obligatory part of the melodies examined.
The suggestions made here build on the ideas developed by Grice et al. (2000), who introduced a degree of flexibility into the association of postnuclear tones, and on the proposal by Ladd (2008), who posits abstract tones that are neither intrinsically pitch accents nor boundary tones, although in his proposal tones are intrinsically nuclear, prenuclear or postnuclear. The proposal here takes this flexibility one step further to allow for tones to be unspecified at the level of their association properties as well as their role in the prosodic hierarchy.
It is evident from the cases discussed here that only a model that treats tones and their association properties with a high degree of flexibility can account for our results. In this sense, the approach to intonation proposed here can be seen to reflect the true spirit of autosegmental-metrical phonology, in that it separates the tonal and metrical tiers, and uses context-dependent principles for their association.
We are greatly indebted to José Ignacio Hualde, Amalia Arvaniti, Bob Ladd, and Uli Reich for insightful discussions and comments on the manuscript. This work was supported in part by the Collaborative Research Centre 1252 ‘Prominence in Language’, and by an ERC Advanced Grant (269484 INTERACT) to Stephen C. Levinson.