        Input Optimisation: phonology and morphology*


In this paper, I provide a unified account of three frequency effects in phonology. First, typologically marked elements are underrepresented. Second, phonological changes are underrepresented. Third, morphologically conditioned phonological changes are overrepresented. These effects are demonstrated with corpus data from English and Welsh. I show how all three effects follow from a simple conception of phonological complexity. Further, I demonstrate how this notion of complexity makes predictions about other phenomena in these languages, and that these predictions are borne out. I model this with traditional Optimality Theory, but the proposal is consistent with any constraint-based formalism that weights constraints in some way.



Thanks to Adam Albright, Elise Bell, Ricardo Bermúdez-Otero, Amy Fountain, Chris Golston, S. J. Hannahs, Lionel Mathieu, Diane Ohala and Maggie Tallerman for useful discussion. Thanks also to audiences at Manchester and Arizona and to the members of my 2015 Linguistics 514 class. Finally, thanks to three anonymous reviewers, an associate editor and the editors for additional feedback. All errors are my own.

1 Introduction

In this paper I show that there are particular frequency effects governing the mapping from input to output. I demonstrate that, while they appear to conflict with each other, a simple unified account is possible. For this demonstration, a generic version of Optimality Theory (McCarthy & Prince 1993, Prince & Smolensky 2004) is assumed, but the proposal is compatible with any constraint-based theory. I will provide a unified account for three statistical effects: (i) the underrepresentation of marked phonological elements, (ii) the underrepresentation of phonological changes and (iii) the overrepresentation of morphologically conditioned phonology.

The rarity of marked elements is well established. Typologically marked elements tend to be rarer than typologically unmarked elements in languages that have both. This applies both to marked elements and to marked configurations. The underrepresentation of phonological mappings between input and output is established by Hammond (2013): forms that undergo phonological changes between input and output are underrepresented with respect to forms that do not undergo changes. That there is overrepresentation of forms that undergo phonological changes conditioned by morphology is demonstrated by Hammond (2014). The latter paper provides the outlines of how this might be treated in the context of the underrepresentation effect. Here I put all these pieces together into an explicit account that also treats the typological effects and test it with a number of additional phenomena not previously treated.

The organisation of this paper is as follows. I begin with classical frequency effects in the domain of typological markedness, reviewing data from English. The general phenomenon is that marked elements are less frequent than unmarked elements. Next, I turn to similar effects in the domain of phonological mapping, again using data from English. I show that phonological changes (qua faithfulness violations) are underrepresented in comparison with non-changes. In §4, I show that consonant mutation in Welsh exhibits the opposite skewing: changes induced by consonant mutation are overrepresented compared with non-changes. I next consider a variety of corpus data from English and Welsh, demonstrating that it is the morphological aspect of consonant mutation that causes this apparent different behaviour, and provide an account of this difference. Finally, I conclude with a review of the general empirical results, the theoretical claim, remaining questions and directions for future research.

2 Typological markedness

In the following, I take typological markedness as an opposition between two elements a and b cross-linguistically. The element a is typologically marked with respect to b just in case a does not occur in a system unless b is there. In other words, the presence of a in a language implies the presence of b: a → b (Hammond et al. 1988).
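The implicational definition can be checked mechanically over a sample of inventories. The following is only a minimal sketch: the function name `is_marked_relative_to` and the toy inventories are hypothetical and are not data from this paper.

```python
def is_marked_relative_to(a, b, inventories):
    """True if every inventory that contains `a` also contains `b`."""
    return all(b in inventory for inventory in inventories if a in inventory)

# Hypothetical toy inventories, for illustration only.
TOY_INVENTORIES = [{"t", "d"}, {"t"}, {"t", "d", "k"}]

print(is_marked_relative_to("d", "t", TOY_INVENTORIES))  # [d] implies [t]
print(is_marked_relative_to("t", "d", TOY_INVENTORIES))  # [t] does not imply [d]
```

Note that the check is vacuously true for an element that occurs in no inventory, so in practice one would also want to confirm that both elements are attested.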

It is well-known that typologically marked elements tend to be less frequent than unmarked elements in the phonological systems that actually contain them. 1 For example, [d] is more marked typologically than [t] and, in systems that have both, [d] tends to be less frequent. 2 Marked phonological elements and configurations are avoided in surface/output representations (Jakobson 1968).

We can see this effect in English with word-initial coronal stops using the Brown corpus (Kučera & Francis 1967). 3 Voiced stops are more marked than voiceless stops typologically. This is evidenced by the number of languages that have voiceless stops, but not voiced stops, and the virtual absence of languages with voiced stops, but not voiceless stops. Focusing, for convenience, on word-initial position, what we find is that, in English, voiced stops are observed more rarely than voiceless stops. More specifically, if we assume they should be equally frequent, the occurring distribution is significantly different, as shown in Table I. 4

Table I Distribution of word-initial [t] and [d] in the Brown and Buckeye corpora. The distribution is significant: Brown χ²(1, N = 79988) = 4752.863, p < 0.001; Buckeye χ²(1, N = 22389) = 326.330, p < 0.001.

One might doubt a comparison based on a written corpus, but, as also shown in Table I, we find the same effect with the spoken Buckeye corpus (Pitt et al. 2007), which has 284,732 words.
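The comparison against an assumed equal distribution is a chi-square goodness-of-fit test with one degree of freedom. A minimal sketch of that calculation follows, assuming a 50/50 expectation; the function name and the counts are hypothetical and are not the figures from Table I.

```python
import math

def chi_square_equal_split(count_a, count_b):
    """Goodness-of-fit test against a 50/50 split (1 degree of freedom)."""
    n = count_a + count_b
    expected = n / 2
    stat = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts, NOT the figures from Table I.
stat, p = chi_square_equal_split(600, 400)
print(f"chi2(1, N = 1000) = {stat:.3f}, p < 0.001: {p < 0.001}")
```

Only the standard library is used here, so the p-value shortcut is valid for the one-degree-of-freedom case only.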

There are similar effects with phonotactic or contextual markedness. For example, consonant clusters are more marked than singletons cross-linguistically; if a language has clusters, it will necessarily have singletons, but not vice versa. Correspondingly, if a language has clusters, they will be less frequent than the corresponding singletons. For example, English word-initial singleton [d] is more frequent than word-initial [dC] clusters in both the Brown and Buckeye corpora, as shown in Table II.

Table II Distribution of word-initial [dV] and [dC] in the Brown and Buckeye corpora. The distribution is significant: Brown χ²(1, N = 30079) = 20906.810, p < 0.001; Buckeye χ²(1, N = 9751) = 8290.235, p < 0.001.

Prince & Smolensky (2004) show that a framework like OT can accommodate systemic markedness, i.e. implicational generalisations of the form: if a language has [d], it will also have [t]. The explanation for this comes from the claims that: (i) there is a universal set of constraints, and (ii) these constraints can interact only via strict ranking. On the assumption that we have a faithfulness constraint Faith and a markedness constraint *d, it follows that only the two kinds of phonological system in (1a, b) are possible.

(1)

One ranking gives us (1a), the other gives us (1b), but there is no ranking of these two constraints that will produce (1c).
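The reasoning can be illustrated by enumerating both rankings of the two constraints and collecting the surface inventory each one yields. This is a toy sketch under simplifying assumptions: the candidate set is restricted to [t] and [d], and the helper names `winner` and `inventory` are hypothetical.

```python
from itertools import permutations

# Each constraint maps an (input, output) pair to a violation count.
CONSTRAINTS = {
    "Faith": lambda inp, out: int(inp != out),  # penalise any change
    "*d": lambda inp, out: int(out == "d"),     # penalise voiced [d]
}
SEGMENTS = ("t", "d")  # toy assumption: each input may surface as [t] or [d]

def winner(inp, ranking):
    """Return the optimal output for `inp` under a strict ranking."""
    # Tuples compare lexicographically, which mirrors strict domination.
    return min(
        SEGMENTS,
        key=lambda out: tuple(CONSTRAINTS[c](inp, out) for c in ranking),
    )

def inventory(ranking):
    """Surface inventory obtained by feeding every input to the grammar."""
    return frozenset(winner(inp, ranking) for inp in SEGMENTS)

typology = {ranking: inventory(ranking) for ranking in permutations(CONSTRAINTS)}
for ranking, inv in typology.items():
    print(" >> ".join(ranking), "->", sorted(inv))
```

Under these assumptions, ranking Faith over *d yields a system with both [t] and [d], the opposite ranking yields a system with only [t], and no ranking yields a system with only [d].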

However, orthodox OT provides no direct account of statistical markedness. We turn to this in the following section.

3 Phonological changes

The distributional patterns discussed in the previous section extend to other parts of the phonology. Specifically, the same kinds of skewings apply at the phrasal level and to input–output mappings.

Marked phonological configurations can be repaired phonologically as well. These changes are also statistically avoided. An example is the Rhythm Rule (Liberman & Prince 1977, Hammond 1984, Hayes 1984). 5 The Rhythm Rule refers to the phenomenon whereby a primary stress in English is shifted leftward onto a preceding secondary stress if it would otherwise occur too close to a following stress. These two factors, i.e. clash and the presence of a preceding secondary stress, are separated in (2).

(2)

In (2a) we see stress shifting leftward because the primaries are too close. In (2c) we see no shift, because there is no preceding secondary to shift the primary to. In (2b) and (2d) we see no shift, as the stresses are not close enough.
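The interaction of the two factors can be sketched as a small function. This is a deliberately simplified illustration, not the analysis in the works cited: words are lists of per-syllable stress levels (1 = primary, 2 = secondary, 0 = unstressed), clash is assumed to mean a word-final primary immediately followed by a word-initial primary, and the old primary is assumed to demote to secondary. The function name `rhythm_rule` is hypothetical.

```python
def rhythm_rule(modifier, following):
    """Apply a toy Rhythm Rule to `modifier` when it precedes `following`."""
    # Simplifying assumption: clash = final primary stress immediately
    # followed by a word-initial primary stress.
    clash = modifier[-1] == 1 and following[0] == 1
    has_secondary = 2 in modifier[:-1]
    if clash and has_secondary:
        target = modifier.index(2)           # first preceding secondary stress
        shifted = list(modifier)
        shifted[target], shifted[-1] = 1, 2  # primary moves left, old primary demotes
        return shifted
    return list(modifier)                    # no clash, or nowhere to shift to

print(rhythm_rule([2, 1], [1, 0]))     # clash plus secondary: shift
print(rhythm_rule([0, 1], [1, 0]))     # clash, no secondary: no shift
print(rhythm_rule([2, 1], [0, 1, 0]))  # no clash: no shift
```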

Hammond (2013) demonstrates that cases (a) and (c) are statistically underrepresented, using the tagged Brown corpus and the CMU pronouncing dictionary. 6 The basic idea is to compare the distribution of these items in environments where the Rhythm Rule applies with those where it doesn't. It's a little complex to do this, because stress isn't marked in the tagged Brown corpus. It's also difficult because the environments where shift occurs depend on whether the relevant item is in a syntactic phrase with the following item and the stress of the first item is close enough to that of the second. Following Hayes (1984), I assume that stress shift aims for four-syllable intervals; hence two-syllable modifiers will be in the appropriate stress configuration if the following word has a stress on the first or second syllable. This is, of course, always true in English (e.g. Chomsky & Halle 1968). The syntactic environment is approximated by comparing prenominal environments to all others. This isn't exact. For example, we might expect adjectives before other adjectives to constitute a Rhythm Rule environment, and our search strategy groups these incorrectly. The idea is that the prenominal examples will be dominated by appropriate syntactic configurations for the Rhythm Rule, and examples of the second non-prenominal sort less so. This certainly isn't perfect, but it avoids having to do a full syntactic parse.

There are 1,161,192 words in Brown and 127,008 words in CMU. There are 64,028 adjective tokens in Brown and 8063 adjective types. Of these, 4049 occur in the CMU dictionary, of which 1281 are disyllabic and can be analysed. 7 As we might expect, there are many more trochaic adjectives than iambic ones, and many more words with a single stress than with two stresses; Table III gives the general pattern.

Table III Distribution of stress in two-syllable adjectives in the Brown corpus. (The adjectives also occur in the CMU dictionary.)
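The four-way classification of disyllabic adjectives can be sketched from dictionary-style transcriptions in which each vowel carries a stress digit (1 = primary, 2 = secondary, 0 = unstressed), as in the CMU dictionary format. The function name `stress_class` is hypothetical, and the transcriptions below are illustrative strings rather than entries quoted from the dictionary.

```python
def stress_class(pronunciation):
    """Classify a disyllabic word from a transcription with stress digits."""
    # Only vowel symbols end in a digit, so this yields one value per syllable.
    stresses = [int(sym[-1]) for sym in pronunciation.split() if sym[-1].isdigit()]
    if len(stresses) != 2:
        raise ValueError("expected exactly two syllables")
    if stresses[0] == 1:
        shape = "trochaic"
    elif stresses[1] == 1:
        shape = "iambic"
    else:
        raise ValueError("no primary stress found")
    count = "two stresses" if 2 in stresses else "single stress"
    return shape, count

# Illustrative transcriptions (assumed, not quoted from the CMU dictionary).
EXAMPLES = {
    "happy": "HH AE1 P IY0",
    "aloof": "AH0 L UW1 F",
    "finite": "F AY1 N AY2 T",
    "unknown": "AH2 N N OW1 N",
}
for word, pron in EXAMPLES.items():
    print(word, stress_class(pron))
```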

If we break these up into prenominal vs. non-prenominal tokens, we get Table IV.

Table IV Distribution of stress in two-syllable prenominal (vs. elsewhere) adjectives in the Brown corpus. (The adjectives also occur in the CMU dictionary.) The distribution prenominally is significantly different from that non-prenominally (χ²(1, N = 7988) = 270.205, p < 0.001).

This can be made more precise though. Two distributional patterns are important here. First, the distributions of items like happy and aloof are significantly different with respect to prenominal and non-prenominal environments. In prenominal position, words like aloof represent 8% of adjectives with no secondary stress, while in non-prenominal position they account for 13%. This shows us that unresolvable clash, a marked configuration, is underrepresented, as in Table Va.

Table V Separating the distributions for (a) unresolvable and (b) resolvable stress configurations with prenominal adjectives in the Brown corpus. The differences are significant: (a) χ²(1, N = 7755) = 231.300, p < 0.001; (b) χ²(1, N = 233) = 34.290, p < 0.001.

Second, the distributions of items like finite and unknown are significantly different across prenominal and non-prenominal environments as well, as in Table Vb. In prenominal position, words like unknown represent 32% of adjectives with secondary stress, while in non-prenominal position, they account for 49%. This shows that resolvable clash is also underrepresented.
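Comparisons of this kind (environment by word class) amount to a chi-square test of independence on a 2×2 table. The sketch below shows the calculation under stated assumptions: no continuity correction is applied, the function name `chi_square_2x2` is hypothetical, and the counts are invented to mirror the 8% vs. 13% contrast described above; they are not the figures from Table V.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom, so the survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts, NOT the figures from Table V.
# Rows: prenominal, elsewhere; columns: 'aloof'-type, 'happy'-type.
table = [[80, 920], [130, 870]]
stat, p = chi_square_2x2(table)
print(f"chi2(1, N = 2000) = {stat:.3f}, p < 0.001: {p < 0.001}")
```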

The Buckeye corpus shows the same general pattern. I first tagged the corpus with the Stanford part-of-speech tagger (Toutanova et al. 2003). 8 The procedure was then the same as above, and yielded the basic distribution in Table VI.

Table VI Distribution of stress in two-syllable adjectives in the Buckeye corpus.

Prenominally vs. elsewhere in the Buckeye corpus, we find a similar distribution to what we saw in the Brown corpus, as in Table VII.

Table VII Distribution of stress in two-syllable prenominal (vs. elsewhere) adjectives in the Buckeye corpus. The distribution prenominally is significantly different from that non-prenominally (χ²(3, N = 2226) = 71.140, p < 0.001).

Overall in the Buckeye corpus, the distribution prenominally is significantly different from that non-prenominally, just as in the Brown corpus.

As with the Brown data, two distributional patterns are important here. First, the distributions of items like happy and aloof are significantly different with respect to prenominal and non-prenominal environments. In prenominal position words like aloof represent 4% of adjectives with no secondary stress, while in non-prenominal position they account for 9%. This shows that unresolvable clash, a marked configuration, is also underrepresented in the Buckeye corpus, as in Table VIIIa.

Table VIII Separating the distributions for (a) unresolvable and (b) resolvable stress configurations with prenominal adjectives in the Buckeye corpus. The differences are significant: (a) χ²(1, N = 2172) = 66.731, p < 0.001; (b) χ²(1, N = 54) = 4.360, p < 0.037.

Second, as in Brown, the distributions of items like finite and unknown are significantly different across prenominal and non-prenominal environments, as in Table VIIIb. In prenominal position words like unknown represent 19% of adjectives with secondary stress, while in non-prenominal position they account for 32%. Resolvable clash is therefore also underrepresented in both corpora.

What we see then is that both unrepairable clash and repaired clash are underrepresented, in the written corpus as well as the spoken corpus. This means that there is more going on than just the avoidance of marked elements and configurations; phonological repair is also avoided.

Other explanations for these skewings are, of course, possible. One might suppose that the distribution of the four classes of adjectives is accidentally connected to the semantics, and that trochaic adjectives tend to have meanings more appropriate for prenominal position while iambic adjectives tend to have meanings more appropriate for other positions. There are at least three reasons to reject this kind of approach as an explanation. First, showing that there is a statistical correlation between semantic or syntactic categories and phonological properties is not itself an explanation. What we need is some explanatory principle and/or some grammatical mechanism that makes the connection necessary, and allows it to follow from general principles. Second, appeal to accidental semantic or syntactic biases is not a unified account. The account developed here involves a single explanatory principle that covers all cases. Finally, the account developed here is not only unified, but also sensible. It extends existing grammatical machinery in a straightforward way, rather than appealing to accidental semantic facts. 9

4 Morphological processes: mutation

In this section I turn to a rather different phenomenon, and show that Welsh mutation exhibits the opposite distribution from the English cases.

Let's review the general pattern. Welsh has three basic mutations. These are a class of consonantal changes that take place word-initially in a morphosyntactically prescribed set of environments. I focus on soft mutation, which involves the changes in (3).

(3)

Other consonants do not change in this environment. I call the changing consonants mutators; [f s χ n], etc. are non-mutators.

The examples in (4) show how this works. In (a), a feminine singular noun mutates after the definite article, and in (b) we see that an adjective modifying a feminine singular noun will also mutate. The object of certain prepositions mutates (c), as does the direct object of an inflected verb (d).

(4)

Hammond (2014) demonstrates that Welsh mutation displays the opposite effect from what we saw in the previous section. This can be seen in the environment following prepositions that trigger soft mutation vs. all other environments. As mentioned above, certain prepositions, including those in (5), induce soft mutation in the following word.

(5)

The CEG corpus (Ellis et al. 2001) is a publicly available tagged corpus of written Welsh containing 1,223,501 words. In addition, it gives the lemma form for all tokens. In this corpus mutators constitute 21% of the total in other environments, but after prepositions that trigger soft mutation they form 31%, as in Table IX.

Table IX Distribution of words beginning with mutatable consonants (vs. others) after mutating prepositions (vs. other environments) in the CEG corpus. The difference is significant: χ²(1, N = 98184) = 5542.824, p < 0.001.

This means that, while we avoid both unresolvable and resolvable configurations in English stress clash, the opposite is true for Welsh soft mutation.

This is surprising, so let's make sure that it is correct. Personal names in Welsh do not undergo any of the mutations, as shown in (6a). This is not true for native and nativised geographic names, which can undergo the mutations, e.g. i Fanceinion [i vankejnjɔn] ‘to Manchester’ in (4c) above.

(6)

Consider now how often personal names begin with mutatable consonants. If mutation is avoided – like rhythm and clash in English – we would expect names to begin with mutatable consonants more often than non-names. In fact, the opposite is the case, consistent with the reversal we saw above in mutation contexts for non-names: names are less likely to begin with a mutatable consonant, as shown in Table X. 10

Table X Distribution of names beginning with mutatable or non-mutatable consonants (vs. non-names) in the CEG corpus. The difference is significant: χ²(1, N = 27841) = 8027.046, p < 0.001.

We might be concerned that the patterns could be different in spoken language. In fact, we observe a similar distribution in a spoken corpus. The Siarad corpus (Deuchar et al. 2014) is a transcribed spoken corpus of approximately 607,450 words. 11 It is not tagged for part of speech, but the basic soft mutation comparison above can be approximated. I used only those prepositions triggering soft mutation that can be identified unambiguously, leaving aside i and o, which are ambiguous between preposition and pronoun. I then searched for all words that begin with sounds that unambiguously could either mutate or be mutated, setting aside vowel-initial words, since they can either be the mutated result of a [g]-initial word or a true vowel-initial word. This gives us the counts in Table XI, which can be compared to those in Table IX.

Table XI Distribution of words beginning with unambiguous mutatable consonants (vs. others) after unambiguous mutating prepositions (vs. other environments) in the Siarad corpus. The difference is significant: χ²(1, N = 6830) = 14.833, p < 0.001.

Words beginning with mutatable consonants are more likely after mutating prepositions. This difference is smaller than in the written corpus, but is also significant. 12 Hence we observe the same effect in the spoken register as well.

I conclude that mutation indeed exhibits the opposite distribution from the English cases considered in the previous section.

5 Analysis

In this section I provide an analysis for the facts considered above. Before proceeding, let us consider what has been established empirically.

First, underrepresentation of words like a′loof in prenominal position, [d] vs. [t] word-initially, [d] vs. [dr] word-initially, etc., shows that marked elements and configurations are statistically avoided. Second, underrepresentation of words like ˎun′known in prenominal position shows that the Rhythm Rule, a phonological change, is also avoided.

On the other hand, Hammond (2014) shows that there is overrepresentation of mutatable consonants in mutation contexts in Welsh, the opposite from what we saw in English. This was confirmed here in two ways: non-names vs. personal names in the CEG corpus show the same reversal, as do mutation contexts in the spoken Siarad corpus.

The first two cases above look rather like Lexicon Optimisation, and it would be reasonable to try to build an account in terms of the machinery involved in that approach. 13 Prince & Smolensky's (2004: 225–226) original definition is given in (7).

(7)

The basic idea is that if there are multiple ways to produce an output form consistent with the facts of a language, the input that produces the fewest constraint violations is chosen.

To see this in action, consider a simple example. Imagine we have nasal place assimilation, and a constraint against NC sequences with different place values which outranks the relevant faithfulness constraints. For heteromorphemic examples, we would have tableaux like (8).

(8)

Here we have an input /n/ which is realised as [m] before a labial. Because the example is heteromorphemic, we can assume that there are other contexts – perhaps vowel-initial – where we can determine that the prefix-final consonant is indeed /n/. However, there are tautomorphemic cases where the input is unknown. An output form [lʌmp] is consistent with the inputs /lʌmp/ and /lʌnp/. Either input produces the same output, as in (9).

(9)

In these cases, Lexicon Optimisation favours the input that produces the desired output most harmonically. We can see this in a ‘reverse tableau’, as in (10), where inputs are given along the left and the violations marked are those for the optimal candidate, given that input. 14 As far as possible, lexicon optimisation ensures that what you see is what you get.
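The selection among competing inputs can be sketched as picking the input whose violation profile for the attested output is lowest under strict ranking. The sketch below is a toy illustration only: the constraint functions `agree_place` and `faith` are hypothetical stand-ins for the markedness and faithfulness constraints discussed above, and both candidate inputs are assumed (as in the text) to map to the same output.

```python
def agree_place(form):
    """Toy markedness: count nasal+stop sequences that disagree in place."""
    return sum(form.count(cluster) for cluster in ("np", "mt"))

def faith(inp, out):
    """Toy faithfulness: count segments that differ between input and output."""
    return sum(x != y for x, y in zip(inp, out))

def optimal_input(output, candidate_inputs):
    """Pick the input whose mapping to `output` is most harmonic."""
    # Violation profile in ranking order: markedness >> faithfulness.
    profiles = {
        inp: (agree_place(output), faith(inp, output)) for inp in candidate_inputs
    }
    # Tuples compare lexicographically, mirroring strict domination.
    return min(profiles, key=profiles.get), profiles

best, profiles = optimal_input("lʌmp", ["lʌmp", "lʌnp"])
print(best, profiles)
```

Each entry in `profiles` corresponds to one row of a reverse tableau: the fully faithful input incurs no violations, while the unfaithful input incurs one faithfulness violation, so the faithful input is selected.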

(10)

There are, of course, no empirical consequences to Lexicon Optimisation in itself. In fact, it is defined to apply only when there are no consequences. I examine now whether it is profitable to view the underrepresentations we see in English as statistical analogues to Lexicon Optimisation.

To accommodate the effects we saw in English, we need to expand the notion of lexicon optimisation to accommodate comparisons between inputs when the outputs are not the same. To do this, let's first define a notion of phonological complexity that applies to individual input–output pairings but also to entire phonological systems. (The basic logic of this is that the complexity of a phonological system is proportional to the number of asterisks in its tableaux.) We first define the output/surface forms of a language as a possibly infinite set, as in (11a).

(11)

Every member of that set has a corresponding (optimal) input form (11b), and, for any phonology, there is also, of course, a finite sequence or vector of constraints (11c).

Any input–output pairing ⟨Iᵢ, Oᵢ⟩ (where angle brackets represent vectors) then defines a finite vector of violation counts, some number of violations for each constraint incurred by the winning candidate for that input, as in (12).

(12)

With these notions, Phonological Complexity is defined as in (13).

(13)

This can again be exemplified with our hypothetical nasal assimilation example. Let us assume the following set of forms whose PC we wish to compute. Given the inputs in (14), we have the constraint violations shown for the winning candidates.

(14)

The relative complexity of this system is ⟨0, 6⟩/9 = ⟨0, 0.67⟩. We can compare the system in (14) with the one in (15). Here we have a different array of output forms, but the same logic for inputs and constraint violations.

(15)

The relative complexity of this second system is ⟨0, 4⟩/8 = ⟨0, 0.5⟩. The second system is less complex than the first: ⟨0, 0.5⟩ < ⟨0, 0.67⟩. It would be reasonable to assume that more complex complexity vectors should be compared using the logic of strict ranking, for example ⟨0.9, 0.5⟩ > ⟨0.4, 0.67⟩.

In the example above, the relative magnitude of the higher-ranked constraint determines the relative complexity of the systems, rather than the relative magnitude of the lower-ranked constraint.
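The arithmetic behind these comparisons can be sketched as follows: sum the violation vectors of the winning candidates, divide by the number of forms, and compare the resulting vectors component by component in ranking order. The function name `relative_pc` and the two toy systems are hypothetical; the systems are constructed only to match the totals ⟨0, 6⟩ over 9 forms and ⟨0, 4⟩ over 8 forms given in the text.

```python
def relative_pc(violation_vectors):
    """Sum violation vectors over all forms, then divide by the number of forms."""
    n = len(violation_vectors)
    totals = [sum(column) for column in zip(*violation_vectors)]
    return tuple(total / n for total in totals)

# Hypothetical systems matching the totals in the text.
# Each tuple is (higher-ranked constraint, lower-ranked constraint) for one form.
system_14 = [(0, 1)] * 6 + [(0, 0)] * 3  # 9 forms, 6 lower-ranked violations
system_15 = [(0, 1)] * 4 + [(0, 0)] * 4  # 8 forms, 4 lower-ranked violations

pc_14 = relative_pc(system_14)
pc_15 = relative_pc(system_15)

# Python tuples compare lexicographically, which mirrors strict ranking:
# the first (highest-ranked) component that differs decides.
print(pc_15 < pc_14)
print((0.9, 0.5) > (0.4, 0.67))
```

Lexicographic comparison is one natural way to implement strict ranking here; a weighted formalism would instead combine the components into a single score.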

The proposal then is that all phonological systems are skewed to be less complex, as determined by (16).

(16)

This alters the frequency of input–output pairings; it does not change the input representation of any particular form.

Let's examine each of the English cases. For word-initial [t] vs. [d] we assume there is a constraint penalising voiced stops: *VdStop. Imagine we have a sample of 100 words that begin with coronal stops with the distribution in (17a).

(17)

The total PC score is ⟨0, 50⟩, and the relative score ⟨0, 50⟩/100 = ⟨0, 0.5⟩. We can imagine a skewed distribution, of the sort we saw in English, but more extreme, like (17b). Here the total PC score is ⟨0, 25⟩, and the relative score ⟨0, 25⟩/100 = ⟨0, 0.25⟩. The latter distribution, with fewer word-initial instances of [d], is thus less complex. The actual occurring and expected distributions from the Brown corpus, along with relative PC scores, are given in Table XII.

Table XII Relative PC scores for word-initial [t] and [d] in the Brown corpus.

The same logic applies in the case of word-initial [d] vs. [dr], except that the relevant markedness constraint is *Complex. A distribution like (18a) is dispreferred to one like (18b).

(18)

As in the previous pair, the relative PC score for the less preferred distribution is ⟨0, 50⟩/100 = ⟨0, 0.5⟩, while that for the preferred distribution is ⟨0, 25⟩/100 = ⟨0, 0.25⟩. The latter distribution, with fewer word-initial instances of [dr], is less complex. The actual distribution and relative PC scores for the Brown corpus are given in Table XIII.

Table XIII Relative PC scores for word-initial [dV] and [dC] in the Brown corpus.

Prenominal ′happy vs. a′loof works exactly the same way with respect to the markedness constraint *Clash. Here, the higher-ranked constraint is not a faithfulness constraint, since we know stress shift is generally possible in English, but a constraint that requires that if stress shifts, it shifts to a syllable that would otherwise bear secondary stress. For convenience, we call this Secondary. The distribution in (19a) is less preferred than that in (19b).

(19)

The calculation is exactly the same. Actual values and relative scores from Brown are given in Table XIV.

Table XIV Relative PC scores for prenominal adjectives with unresolvable stress configurations in the Brown corpus.

Finally, consider the case of prenominal ′fiˎnite vs. ˎun′known. Here what is ruled out is application of the Rhythm Rule, not clash per se. We can assume that when stress shift applies, it violates some version of OO-correspondence, a constraint requiring stress in a clash context to be the same as stress in other contexts. That constraint, in turn, is dominated by *Clash, and of course by Secondary, as in (20).

(20)

Table XV gives the true values and relative scores from the Brown corpus.

Table XV Relative PC scores for prenominal adjectives with resolvable stress configurations in the Brown corpus.

What about the Welsh examples? On the face of it, it looks as if Welsh is skewed so as to make its system more complex. Recall that in a mutation context, such as after a preposition like i, we find more instances of mutating consonants than in non-mutation contexts. Let's assume that there is a constraint that forces mutation in various environments; we can call it Mutate. This constraint outranks the relevant faithfulness constraint. We get exactly the wrong prediction when we consider the same two hypothetical distributions as in the previous cases. Compare mutating items like cath [kaːθ] ‘cat’ vs. non-mutating items like afal [aval] ‘apple’ after i. (21a) shows a neutral distribution, while what we would expect is fewer instances of constructions like i gath, as in (21b) – we would then have ⟨0, 0.25⟩, rather than ⟨0, 0.5⟩. The problem is that we get just the opposite. In mutation contexts, we find more instances of constructions like i gath. Schematically, we have (21c), where we find ⟨0, 0.75⟩, rather than ⟨0, 0.5⟩, exactly the opposite of what is predicted by Input Optimisation (16).

(21)

Actual values and relative scores from the CEG corpus are given in Table XVI.

Table XVI Relative PC scores for mutatable vs. non-mutatable initial consonants in the CEG corpus.

Why might Welsh mutation behave in this way? The difference is apparently that mutation is a morphologically conditioned phonological change, so it seems reasonable to build an explanation on that difference. We can accommodate this under the Input Optimisation rubric if, in fact, there is a constraint favouring the expression of morphological categories. The logic is that mutatable consonants are overrepresented where they are because there is a constraint that demands that morphological categories be expressed.

The key point is that mutation, whether phonological, morphological or lexical, must be subject to a constraint forcing morphological categories to be expressed. If mutation is indeed a morphologically conditioned phonological change, there is no issue. Some researchers (e.g. Stewart 2004, Green 2006, Hannahs 2011, 2013) have argued that mutation systems should be treated morphologically or lexically, either in terms of some special class of morphological rules or in terms of listed allomorphs. If one of these is correct, then application of that morphological rule or selection of allomorphs must be subject to a constraint that requires morphology to be expressed. I will continue to describe mutation as a phonological process, but the general Input Optimisation account developed here is consistent with other views of mutation as well.

In fact, Kurisu (2001) proposes something close to what we need, in (22).

(22)

Soft mutation expresses morphological information. To the extent that a word in a soft mutation context begins with a mutatable consonant, violations of RM are avoided. Thus when a form like cath [kaːθ] undergoes soft mutation to become gath [gaːθ], RM is satisfied. When afal [aval] does not change in a soft mutation context, RM is violated.

If we add RM to the constraint set for Welsh and rank it above Faith, this accommodates both Welsh cases. Consider first mutatable vs. non-mutatable consonants in mutation contexts, the schematic example just considered. In (23a), mutators and non-mutators are relatively evenly distributed (note that Mutate is here for completeness). RM forces the category to be expressed, and higher-ranked Mutate forces the precise expression of that category.

  1. (23)

The case in (23b) has proportionally more mutators. When relative PC is calculated with RM in the mix, we find the latter distribution is preferred: ⟨0, 0.5, 0.5⟩ > ⟨0, 0.25, 0.75⟩. This is, of course, also true for the actual distribution in the CEG corpus, where the occurring distribution ⟨0, 0.31, 0.69⟩ is preferred to the expected distribution ⟨0, 0.79, 0.21⟩. Notice that ranking, strict or otherwise, is key here. If RM is not ranked higher than Faith, we do not get the desired effect.
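The comparison just made can be sketched in code. This is a minimal illustration, not the procedure used for the paper's counts: it assumes that the vectors are ordered ⟨Mutate, RM, Faith⟩ and that strict ranking amounts to a lexicographic comparison of violation proportions. The function names are hypothetical.

```python
# Strict-ranking comparison of relative PC vectors (illustrative sketch).
# Assumption: the quoted vectors are ordered <Mutate, RM, Faith>.

def pc_vector(violations, ranking):
    """Order violation proportions from highest- to lowest-ranked constraint."""
    return tuple(violations[c] for c in ranking)

def less_complex(a, b, ranking):
    """True if distribution a is strictly less complex than b.

    Python compares tuples lexicographically, so the highest-ranked
    constraint on which the two distributions differ decides the outcome.
    """
    return pc_vector(a, ranking) < pc_vector(b, ranking)

even = {"Mutate": 0.0, "RM": 0.5, "Faith": 0.5}         # schematic (23a)
skewed = {"Mutate": 0.0, "RM": 0.25, "Faith": 0.75}     # schematic (23b)
occurring = {"Mutate": 0.0, "RM": 0.31, "Faith": 0.69}  # CEG, as quoted
expected = {"Mutate": 0.0, "RM": 0.79, "Faith": 0.21}   # CEG, as quoted

rm_over_faith = ["Mutate", "RM", "Faith"]
faith_over_rm = ["Mutate", "Faith", "RM"]

print(less_complex(skewed, even, rm_over_faith))
print(less_complex(occurring, expected, rm_over_faith))
print(less_complex(skewed, even, faith_over_rm))
```

With RM above Faith, both comparisons favour the distribution with more mutators. With the two constraints reversed, the preference flips, which is the sense in which ranking is key.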

The effects of Input Optimisation are thus contingent on the ranking or weighting of the constraints in the language. Though the claim is that all languages will exhibit skewing to satisfy Input Optimisation, it does not follow that all languages will skew in the same way. Different weights or rankings will entail different skewings. Consider, for example, the common loss of final syllables, even when they may be desinential, marking inflectional properties of the word in question. This is a purely phonological process that is not conditioned by the morphology. How is such a thing possible on the account here? Presumably there is a high-ranked/weighted constraint that favours the loss of such syllables and outranks RM. Input Optimisation will minimise violations of the higher-ranked/weighted constraints over those of lower-ranked/weighted constraints like RM. See §8 below for more discussion.
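The dependence on weighting can be illustrated with a small sketch. It assumes a simple weighted sum of violation proportions as the complexity score, which is only one of the weighting schemes compatible with the proposal. The weights are hypothetical values chosen for illustration, and the two distributions restate the schematic proportions for RM and Faith discussed above.

```python
# Weighted complexity score (illustrative sketch, hypothetical weights).

def weighted_pc(violations, weights):
    """Sum of violation proportions, each scaled by its constraint weight."""
    return sum(weights[c] * violations[c] for c in weights)

even = {"Mutate": 0.0, "RM": 0.5, "Faith": 0.5}
skewed = {"Mutate": 0.0, "RM": 0.25, "Faith": 0.75}

# Hypothetical weights: RM outweighs Faith in the first set only.
rm_heavier = {"Mutate": 3.0, "RM": 2.0, "Faith": 1.0}
faith_heavier = {"Mutate": 3.0, "RM": 1.0, "Faith": 2.0}

for name, weights in [("RM > Faith", rm_heavier), ("Faith > RM", faith_heavier)]:
    print(name, weighted_pc(even, weights), weighted_pc(skewed, weights))
```

The distribution with more mutators receives the lower score only when RM carries more weight than Faith, so a language that weights these constraints differently would be expected to skew differently.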

Consider now non-mutatable consonants in personal names vs. non-names: non-names begin with mutators more often than names do. If we take the distribution of mutators in names as the neutral distribution and the distribution with non-names as the distribution to be explained, this emerges directly: non-names have more mutators because that avoids violations of RM, just as in the examples considered above.

The RM constraint, however, is too restrictive. It would seem to imply that expression of a morphological category is minimal, that if it is already expressed elsewhere, there is no pressure to express it again. This in turn predicts that if mutation were to be triggered by an overt affix, then we should not see an overrepresentation effect. 15 In fact, such cases do occur in Welsh, and they show an overrepresentation effect as well.

There is a set of prefixes that trigger soft mutation in Welsh, e.g. cyn- [kɨn/kən] ‘ex-’, gor- [gɔr] ‘over-’, ail- [ajl] ‘re-’, di- [di] ‘-less’, hunan- [hɨnan] ‘self-’, is- [is] ‘sub-’, gwrth- [gurθ] ‘anti-’, cyd- [kɨd, kəd] ‘co-’, ad- [ad] ‘re-’, etc. The first three of these are exemplified in (24).

  1. (24)

The examples above include stems that begin with mutators and those that begin with non-mutators. What is the distribution? Is it similar to what we see after prepositions or to what we see elsewhere? To test this, I found all instances of these prefixes in the CEG corpus marked with a hyphen, and then did counts on the following stems.

One small complication is that a hyphen is not generally required for these prefixes. I chose to count the ones marked with overt hyphens, as it is of course easier to find these in the corpus. However, the hyphen is required just in case there might be an orthographic ambiguity. This occurs when the final letter of the prefix and the first letter of the stem could be misparsed as part of the digraphs ll [ɬ] and dd [ð]. Thus a form like ail-lenwi [ajllεnwi] ‘refill’ must be spelled with a hyphen to avoid the double letters being misinterpreted as *[ajɬεnwi]. Including items of this sort would bias our counts in favour of mutators, so they were excluded. (This slightly biases the count against mutators.) We find the distribution in Table XVII, which can be compared with the distribution of mutation in the non-preposition environment from the CEG corpus in Table IX. I take the latter to be the default.

Table XVII Distribution of prefixed stems beginning with mutatable vs. non-mutatable consonants in the CEG corpus, as compared with unprefixed items. The difference is significant: χ²(1, N = 1092) = 1976.534, p < 0.001.

The effect is so large that we might worry that something else is going on, e.g. that word-internal mutation is subject to other pressures not yet considered, but similar effects have been found in Welsh for plural suffixation and various associated stem-vowel changes (Anderson 2015). At this point, we must conclude that the pressure to express some morphological category via some phonological process is not contingent on whether that category might also be expressed elsewhere by an independent word, like a preposition, or by another morpheme. In the case at hand, the relevant morphological category is expressed by both a prefix, e.g. ail-, and soft mutation. What is key is that soft mutation doesn't apply to the prefix itself, but to the following stem. As it stands, RM would not enforce both operations, since the prefix and the mutation are both in the same word. The RM constraint must therefore be revised so as to allow this. The key is to restrict the notion of ‘morphological form’ in (22) to just a morpheme, as in (25).

  1. (25)

The revision is minimal, and accounts for all the cases treated so far, including the prefix example just considered. In the prefix case, there are two domains for RM′: the prefix itself and the stem. For a form like ail-fyw [ajlvɨw] above, we have ail [ajl] (*Ø) and fyw [vɨw] (*[bɨw]).

6 Confirmation

The solution developed in the previous section straightforwardly describes the cases we have considered, but relies on the assumption that it is morphology that behaves differently. It could just be that Welsh and English behave differently. In this section, this other possibility is ruled out by considering cases of morphologically triggered phonology in English and non-morphologically triggered phonology in Welsh.

Let's first look at an example in Welsh that is not connected to mutation. This example involves devoicing of voiced stops in the final coda of Welsh adjectives when they occur medially in comparatives and superlatives. The basic form of comparatives and superlatives is given in (26a), and (b) shows that if the stem ends in a voiced stop it devoices.

  1. (26)

This is an unusual process, the reverse of the more usual sort of voicing alternation one might see in a case like this, i.e. final devoicing. The historical analysis of these is that, at some point, the suffixes could be analysed as *-hax and *-hav and the devoicing we see here is the residue of the effects of the [h] (Morris Jones 1913). Regardless of the history, the synchronic analysis must include some constraint or set of constraints that force this devoicing, and our interest is in whether Faith violations are minimised here by Input Optimisation.

This is a non-morphological process, in the sense that it does not involve a particular morphological category. Specifically, the comparative and superlative are marked by affixes, and devoicing is simply restricted to certain morphological contexts. See §8 below for more discussion.

Let's now consider the distributions. 16 It turns out that word-final voiceless stops are extremely rare, so more accurate comparisons can be made if we use a different category as our comparison base: nasals. The CEG corpus is a written one, and there is an ambiguity in the Welsh orthography in terms of how to interpret ng (as [ŋ] or [ŋg]), so we only look at non-dorsals, comparing the distribution of stem-final [b d] with [m n]. In Table XVIII we see that voiced stops are underrepresented in comparatives and superlatives.

Table XVIII Distribution of stem-final [b d] and [m n] in unaffixed vs. comparative/superlative adjectives in the CEG corpus. The difference is significant: χ²(1, N = 205) = 12.269, p < 0.001.

This establishes that Welsh and English are not generally reversed: Welsh adjectives behave like the English phonological examples, with the forms that would undergo the change being underrepresented.

We can look in the other direction as well. What about morphological cases in English? If the Input Optimisation with RM′ approach is correct, we expect them to behave like the Welsh soft mutation examples. English doesn't have anything like mutation, but does have morphological haplology (Stemberger 1981, Menn & MacWhinney 1984, Zwicky 1987). One example is the genitive plural in (27): the key fact is that overt plurals do not co-occur with the genitive.

  1. (27)

Another example is the adverbial suffix -ly in (28): the suffix is not added to an adjective that already ends in ly.

  1. (28)

What we find in the Brown corpus is precisely what we would predict under Input Optimisation with RM′: forms like cats’ in the genitive plural are statistically underrepresented, as shown in Table XIX.

Table XIX Distribution of genitive and non-genitive plurals in terms of overt suffixation in the Brown corpus. The difference is significant: χ²(1, N = 4200) = 232.399, p < 0.001.

Similarly, Table XX shows that adverbs are much more frequent with adjectives that don't already end in -ly in the Brown corpus.

Table XX Distribution of adjectives and adverbs in -ly in the Brown corpus. The difference is significant: χ²(1, N = 951) = 202.629, p < 0.001.
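For readers who wish to run comparable tests on their own counts, a generic Pearson chi-square test on a 2 × 2 table can be sketched as follows. This is an assumption about the general shape of such a test, not a restatement of the method in the supplementary appendix, which may differ. The counts are hypothetical placeholders and are not taken from any table in this paper.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 df, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts: rows are two contexts, columns are two form types.
hypothetical_counts = [[30, 70], [60, 40]]
chi2, p = chi_square_2x2(hypothetical_counts)
print(f"chi2(1, N = 200) = {chi2:.3f}, p = {p:.5f}")
```

The p-value is computed with the standard library only, using the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.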

One final example can be added here: word-final t/d-deletion. This is a well-known phenomenon, initially studied by Guy (1991) and more recently by Turton (2012) and Coetzee & Kawahara (2013). The basic effect is that [t d] can be deleted word-finally in English, e.g. in friend [frεnd, frεn]. The process is governed by a number of factors, including whether the [t d] appears in a cluster, whether the following word begins with a vowel, speech rate, informality, lexical frequency, etc. The relevant factor here is that the process applies less readily if it would delete a consonant that is the sole exponent of the -ed past tense. Thus, all else being equal, we expect deletion to apply more readily to a word like text [tεkst, tεks] than a word like boxed [bakst, baks].

This is indeed the case in the Buckeye corpus. Table XXI shows the relative retention of final [t d] as a function of whether the word in question ends in -ed.

Table XXI Distribution of [t d] deletion in suffixed vs. unsuffixed forms in the Buckeye corpus. The difference is significant: χ²(1, N = 4656) = 468.807, p < 0.001.

The facts of t/d-deletion are consistent with the account given here, and support the hypothesis that a skewing reversal occurs when RM′ would apply. We would expect deletion to be underrepresented just in case it would violate RM′, and that is what we see here. The Input Optimisation account is then an alternative to the rule-based and constraint-based stratal approaches of Guy (1991) and Turton (2012) respectively.

Hence adjective devoicing in Welsh, the genitive plural in English, adverbs in English and t/d-deletion in English work just as would be predicted if the relevant distinction is morphological expression vs. phonological generalisations.

Since adjective devoicing in Welsh is not a morphological operation like lenition, it does not incur violations of RM′. Therefore faithfulness violations are minimised, and we expect underrepresentation of forms that would otherwise undergo devoicing. The genitive plural in English is an overt affix, and thus clearly involves a morphological operation governed by RM′. Hence we expect underrepresentation of the haplological cases, as we find. Adverbs in English work the same way. RM′ favours expression of the adverbial suffix, so we expect to find underrepresentation of the haplological cases. In the case of deletion of [t d], we see a case where a normal phonological process is limited by RM′.

7 How does Input Optimisation work?

We have established a number of frequency effects that can all be unified and accommodated under the principle of Input Optimisation (16), but how does it work concretely? Here we address two questions. First, where does Input Optimisation take place? Is it a part of grammar, or something else? Second, wherever it may ‘live’, why doesn't it overpower the rest of the grammar? The ideas in this section are extremely speculative, but are intended to lay the groundwork for future research.

We need to clarify two important aspects of the proposal. First, Input Optimisation does not entail that all languages work the same way. We've seen that it works to minimise constraint violations across the language, and that it is sensitive to constraint ranking or weighting. Given that violations of higher-ranked or weighted constraints will be minimised over violations of lower-ranked or weighted constraints, and given that weight/ranking is at least partially language specific, it follows that the effects of Input Optimisation will differ across languages.

Second, Input Optimisation is a global effect, beyond the lexicon. We've seen a number of cases where Input Optimisation might be taken as an effect in the lexicon, some mechanism by which the number of words that fit some phonological requirement is greater or smaller than expected. However, two facts militate against an exclusively lexical account. First, all of our counts have been from corpora, not dictionaries. That is, we are explicitly considering how often words and constructions are used, rather than how often words occur in a dictionary. Second, as just noted, we've also seen a number of cases where it is phrases or multi-word patterns that are skewed. Assuming that phrases are not generally listed lexically, this argues against attributing Input Optimisation exclusively to the lexicon. One might counter that the statistical combinatory properties of lexical items can be stored in the lexicon, and this is certainly true, but this amounts to extending our notion of the lexicon to include statistical syntactic properties.

Given that Input Optimisation extends beyond the lexicon, there are at least four ways we might think of it: (i) as an historical effect, (ii) as a property of acquisition, (iii) as a performance constraint or (iv) as evidence for a different kind of phonological architecture. The first two are related, as are the last two. I treat each of these four in turn.

Input Optimisation could be specifically a property of historical change. That is, there is pressure for historical change to selectively reduce the phonological complexity of the system as a whole. The basic idea is that Input Optimisation is a mechanism of historical change, and that the effects we have seen are not enforced by the grammar, but are the result of historical accretion. This is a reasonable approach. Historical change is often a by-product of the acquisition process, so we would have to carefully distinguish this from a purely acquisition-based account (see below). We would also need to think carefully about the phrasal skewings we've seen, and would have to allow for historical changes that change how often various words might co-occur.

Another possibility, related to the historical approach, is to view Input Optimisation as a property of acquisition. This approach assumes that the acquisition process is biased to minimise phonological complexity. Again, the effects we see would be a consequence of changes that occur during acquisition, not enforced by the adult grammar per se. If this were true, this would certainly have consequences in the historical domain, but we could in principle distinguish the two views. There are historical changes that occur in adult speech. If Input Optimisation were an acquisition effect, then we would expect those adult changes not to be biased by it, and we would also expect to see Input Optimisation imposed by the child during acquisition.

Yet another interpretation of Input Optimisation would be as a performance effect, in which the performance module filters the output of the grammar so as to satisfy Input Optimisation. Viewing performance as a filter raises questions of teleology, but these are the same questions raised by any theory that includes constraints on the output. We might distinguish this approach from the preceding ones with psycholinguistic experiments that tap into language processing, as opposed to grammatical structure. To the extent that we can determine different effects for the grammar and the performance system, and that Input Optimisation is localised to the latter, this would be evidence for a view like this.

Finally, we might view Input Optimisation as part of the grammar itself. On this view, it would be an output condition on the entire grammar, as a general phonological sieve. This would require: (i) that the phonology itself be probabilistic in nature, an approach currently adopted in a number of areas of the field (see e.g. Boersma 1997, Hammond 1999, 2003, Coetzee 2008, Hayes & Wilson 2008, Pater 2009, Coetzee & Pater 2011), and (ii) that the phonology be able to constrain the syntax, morphology and lexicon of a language. This, of course, raises the same teleological questions as above, but they are again the same as any framework that includes constraints.

The data presented here do not distinguish among these choices, but hopefully it is clear what kinds of further empirical investigations might. Do we see effects of Input Optimisation in acquisition? Do we see effects of Input Optimisation in adult change? Can we distinguish Input Optimisation in competence vs. performance?

Let's now turn to the second question. Why does Input Optimisation not go all the way, eliminating any constraint violation? There are two reasons: constraint ranking (or weighting) and the overall functionality of the system.

In a system with weighted or ranked constraints, it may be impossible in some cases to minimise violations of one constraint without simultaneously maximising violations of another, as in (29).

  1. (29)

Here we might minimise candidates like y, maximising candidates like z. The effect would be a less complex system, but it would not be a system free of violations.

We can imagine other configurations though. Recall the hypothetical systems in (14) and (15). We saw how Input Optimisation would favour the second system over the first. The relative complexity of the first system is ⟨0, 6⟩/9 = ⟨0, 0.67⟩, and that of the second ⟨0, 4⟩/8 = ⟨0, 0.5⟩. If this is so, we might well imagine that the system could go even further, as in (30).

  1. (30)

Here no constraints are violated, so the system is minimally complex: ⟨0, 0⟩. The effect is to reduce the inventory of nasals and stops in this environment to just those that do not violate NC or IO-Faith.
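The arithmetic behind these relative complexity values can be sketched as follows. The sketch assumes that each vector lists violation counts in the order ⟨NC, IO-Faith⟩ and that relative complexity is simply each count divided by the number of forms in the system. The function name is hypothetical.

```python
def relative_complexity(violation_counts, n_forms):
    """Divide each constraint's violation count by the number of forms (rounded to 2 places)."""
    return tuple(round(v / n_forms, 2) for v in violation_counts)

first = relative_complexity((0, 6), 9)   # first hypothetical system
second = relative_complexity((0, 4), 8)  # second hypothetical system
print(first, second)

# Tuples compare lexicographically, so the second system counts as less complex.
print(second < first)

# A system with no violations scores zero whatever its size.
print(all(relative_complexity((0, 0), n) == (0.0, 0.0) for n in range(1, 10)))
```

Because a violation-free system scores ⟨0, 0⟩ regardless of how many forms it contains, nothing in the complexity measure itself stops the inventory from shrinking further.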

But a system that allows free rein to Input Optimisation is one where no constraints are violated; effectively only one word is possible, composed of maximally unmarked segments in an optimal prosodic and segmental configuration: [ta] (or something similar). The reason then that Input Optimisation does not have this effect is that it is offset by the need to have a sufficiently large set of morphemes and a sufficiently large array of combinatory possibilities to make communication possible. I therefore propose (31) as a counterforce to Input Optimisation.

  1. (31)

Conceptually, this does the trick, as it balances Input Optimisation against the functionality of the system. Clearly, however, though it captures the logic of the situation, it is still quite speculative. Turning this into something more concrete requires an investigation into the morphosyntax and semantics of a language. It would also be important to put it into explicitly quantitative terms, so it can be tested statistically. I leave this to further research.

8 Morphology and phonology

The RM′ constraint in (25) requires that we be able to distinguish morphological processes like Welsh mutation from phonological processes like English nasal assimilation. There are a number of ways we might do this, but (32) seems the clearest.

  1. (32)

Note that, on this definition, a morphological process is not simply one that has morphological conditioning. As we will see, a process might very well be restricted to some morphological context, and not meet the definition set by (32). The definition is then not about how the process might be formalised, but about what role it plays in the morphological system. Let's go through all the cases considered thus far and show how they fit or do not fit this rubric.

First, the English cases we considered in §2 involving segmental and phonotactic markedness do not qualify, because they are not morphologically restricted; hence they never mark some morphological category.

The English rhythm example treated in §3 also does not qualify, for the same reason. It is not morphologically conditioned, and thus never marks some particular morphological category. There is a different stress alternation in English that does sometimes mark morphology, the shift of stress to the left in the Latinate vocabulary when certain verbs undergo zero-derivation to become nouns, illustrated in (33) (Chomsky & Halle 1968, Hayes 1980, Kiparsky 1982).

  1. (33)

This is a different process, however. It only affects a small set of items of Latin origin, it only applies to nouns and it is not subject to the restriction that there must be a secondary to the left.

The Welsh mutation facts treated in §4 do qualify as a morphological process. Mutation is restricted to specific morphological environments, and there are environments where mutation is the sole marker of some morphological category. One environment for this is after the possessive ei ‘his, her’. Without the optional following echoing pronoun, the sole marker of the gender difference is the mutation triggered by the possessive. In the case of the masculine form we have soft mutation, and in the case of the feminine we have aspirate mutation. Thus, for example, ei mam [i mam] can only mean ‘her mother’, since mam ‘mother’ does not undergo mutation. Similarly, ei fam [i vam] can only mean ‘his mother’, since mam does undergo soft mutation.

The final consonant devoicing treated in §6 does not qualify as morphological on this definition. While the process is restricted to particular morphological contexts, it never occurs without some other overt marker of that morphological context. The devoicing is never the sole marker of the comparative or superlative form.

The English haplology cases we saw in the same section are clearly morphological. These cases involve the presence or absence of a morpheme, which can be the sole marker of the respective morphological category, e.g. man vs. man's and wrong vs. wrongly.

Finally, the deletion of final coronal stops in English is clearly morphological in the sense intended when it deletes the past tense marker, e.g. look vs. looked.

There are, of course, other ways we might do this, but (32) is simple and captures the intuition that a process is morphological when, in at least some context, it affects whether some morphological category is expressed.

9 Conclusion

There are always alternative analyses available, and this is especially true for statistical analyses. The skewings observed above are consistent with any number of syntactic, lexical or semantic explanations. For example, the set of adjectives that can be made into comparatives or superlatives in Welsh could be semantically skewed. Alternatively, some of these skewings could be statistical accidents – patterns that are statistically unlikely, but have arisen by chance. The argument offered here is that we can unify all these under a single theoretical characterisation, rather than treating them as a collection of unconnected explanations and appealing to chance. In addition, our account makes clear predictions about other systems, predictions not made by an approach that treats these effects as unconnected or arising by chance.

The proposal in this paper does not come out of the blue. Similar ideas have been put forward in the literature, but none of these have the same empirical coverage as Input Optimisation.

One idea that bears some similarities is the idea that markedness correlates with number of violations (Golston 1998, Coetzee 2008). Input Optimisation takes this several steps further by allowing application of this to faithfulness, and by allowing it to alter distributions.

The notion of using Lexicon Optimisation to alter distributions is presaged in diachronic restructuring contexts by Bermúdez-Otero (1998).

The idea that the frequency of forms is governed by constraint weights is also pursued by Hayes & Wilson (2008). Their approach uses the distributions to fix the weights. The approach here uses the categorical phonology to determine the weights and then uses those weights to determine the distribution.

Input Optimisation is explicitly introduced in Hammond (2013, 2014). The former identifies the effect for phonological markedness and faithfulness; the latter first observes the challenge posed by Welsh mutation and suggests a solution using RM. In this paper, these ideas have been taken further by demonstrating that the empirical contrast between mutation and the initial English cases is indeed based on the morphological nature of mutation. This was done by analysing the English haplology examples, the Welsh stem-final devoicing examples, and English t/d-deletion. It has also been demonstrated here that RM must be revised as RM′, that some form of ranking is necessary to accommodate the RM′ examples and that PC must be assessed using some form of constraint ranking or weighting. 17

There are, of course, questions still to answer. One concerns the precise nature of morphology appealed to in the RM′ constraint. It is fairly clear from the extensive literature on mutation that it is morphological in nature. In fact, some have argued that it is no longer phonology at all. That said, a more precise characterisation of the difference between morphological processes that are subject to RM′ and phonological processes that are not would be a step forward.

A second question is how much under- or overrepresentation should occur in relevant cases. This paper assumes that a significant difference in distributions is what Input Optimisation predicts, but this establishes only a lower bound. The working hypothesis is that under- and overrepresentation are bounded by other modules of the grammar, and that the system will under- or overrepresent in conformity with Input Optimisation, up to the limits imposed outside the system.

For example, we've seen that constructions like i afal [i aval] are underrepresented compared to constructions like i gath [i gaːθ]. Crudely speaking, one can assume that this underrepresentation is bounded by the need to have vowel-initial words for things like apples (size of vocabulary and what phonological contrasts are available) and the need to talk about apples (what kinds of circumlocutions are available). These other aspects of the larger phonological and linguistic system are well beyond the scope of this paper, but are an obvious place to look in the future.

1 See, for example, Trubetzkoy (1939), Greenberg (1954, 1966, 1974), as well as Maddieson (1984), for a compendious sample of such generalisations and Berkley (1994a, b), Frisch (1996), Frisch et al. (2000), Coetzee & Pater (2011) for further discussion.

2 If we look at the phonetic details, things can get much more complicated. For example, in a language like English, where /d/ is often voiceless through most of its duration, is it still more marked? The example in the text proceeds on either of two assumptions. One possibility is the traditional one: [t] and [d] are opposed in voicing at some level, and [d] is the more marked member of the pair. The other possibility is closer to the phonetics. The opposition is between [tʰ] and [t], and [t] (orthographic d) is the more marked member of the pair (Vaux & Samuels 2005).

3 The Brown corpus is a fairly old written corpus of approximately one million words. I use it here because it is familiar to many and publicly available, allowing readers to more easily confirm the claims made here themselves.

4 An appendix with details of the statistical methods used is available as online supplementary materials at This includes how χ2 and expected values are calculated.

5 Bolinger (1962) argues that clash is avoided in use in English. He doesn't show this statistically, but he was certainly the first to make the point.

7 41 forms where the stress is incorrect (σ́σ́) had to be set aside. Most were miscoded morphologically complex forms.

8 Marked silences and disfluencies were treated as sentence breaks.

9 Thanks to an anonymous reviewer for extremely helpful discussion of these issues.

10 It's not that names necessarily avoid starting on mutatable consonants, but that the distribution of mutatable consonants is different between names and non-names, with names showing fewer mutatable initial consonants and non-names showing more. The facts presented are consistent with the other interpretation as well, i.e. that non-names prefer mutatable consonants.

11 Available at

12 We cannot easily check the effect with personal names using the Siarad corpus, as, unlike in the CEG corpus, personal names are not indicated.

13 Note that this is not an endorsement of Lexicon Optimisation; we're simply using it as inspiration. See Nevins & Vaux (2007) for discussion of some possible shortcomings of Lexicon Optimisation.

14 I include the markedness constraint NC here for completeness; markedness constraints are not determinative in reverse tableaux.

15 This problem has been noted before (Ussishkin 2000, Wolf 2007).

16 This cannot be tested with the Siarad corpus, since, as already noted, that corpus is not tagged for part of speech.

17 The ranking need not be strict. The same logic will work with stochastic, harmonic, noisy harmonic or maxent weighting.


Anderson, Skye (2015). The distribution of phonological changes in Welsh plurals. Ms, University of Arizona.
Berkley, Deborah Milam (1994a). The OCP and gradient data. Studies in the Linguistic Sciences 24. 59–72.
Berkley, Deborah Milam (1994b). Variability in Obligatory Contour Principle effects. CLS 30:2. 1–12.
Bermúdez-Otero, Ricardo (1998). Prosodic optimization: the Middle English length adjustment. English Language and Linguistics 2. 169–197.
Boersma, Paul (1997). How we learn variation, optionality, and probability. Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam 21. 43–58. Available as ROA-221 from the Rutgers Optimality Archive.
Bolinger, Dwight L. (1962). Binomials and pitch accent. Lingua 11. 34–44.
Chomsky, Noam & Halle, Morris (1968). The sound pattern of English. New York: Harper & Row.
Coetzee, Andries W. (2008). Grammaticality and ungrammaticality in phonology. Lg 84. 218–257.
Coetzee, Andries W. & Pater, Joe (2011). The place of variation in phonological theory. In Goldsmith, John, Riggle, Jason & Yu, Alan (eds.) The handbook of phonological theory. 2nd edn. Malden, Mass. & Oxford: Wiley-Blackwell. 401–431.
Coetzee, Andries W. & Kawahara, Shigeto (2013). Frequency biases in phonological variation. NLLT 31. 47–89.
Deuchar, Margaret, Davies, Peredur, Herring, Jon Russell, Parafita Couto, M. Carmen & Carter, Diana (2014). Building bilingual corpora. In Thomas, Enlli Môn & Mennen, Ineke (eds.) Advances in the study of bilingualism. Bristol: Multilingual Matters. 93–110.
Ellis, N. C., O'Dochartaigh, C., Hicks, W., Morgan, M. & Laporte, N. (2001). Cronfa electroneg o Gymraeg (CEG): a 1 million word lexical database and frequency count for Welsh.
Frisch, Stefan A. (1996). Similarity and frequency in phonology. PhD dissertation, Northwestern University.
Frisch, Stefan A., Large, Nathan R. & Pisoni, David B. (2000). Perception of wordlikeness: effects of segment probability and length on the processing of nonwords. Journal of Memory and Language 42. 481–496.
Golston, Chris (1998). Constraint-based metrics. NLLT 16. 719–770.
Green, Anthony D. (2006). The independence of phonology and morphology: the Celtic mutations. Lingua 116. 1946–1985.
Greenberg, Joseph H. (1954). A quantitative approach to the morphological typology of language. In Spencer, Robert F. (ed.) Method and perspective in anthropology: papers in honor of Wilson D. Wallis. Minneapolis: University of Minnesota Press. 192–220.
Greenberg, Joseph H. (1966). Language universals, with special reference to feature hierarchies. The Hague & Paris: Mouton.
Greenberg, Joseph H. (1974). Language typology: a historical and analytic overview. The Hague & Paris: Mouton.
Guy, Gregory R. (1991). Explanation in a variable phonology: an exponential model of morphological constraints. Language Variation and Change 3. 1–22.
Hammond, Michael (1984). Constraining metrical theory: a modular theory of rhythm and destressing. PhD dissertation, UCLA. Published 1988, New York: Garland.
Hammond, Michael (1999). Lexical frequency and rhythm. In Darnell, Michael, Moravcsik, Edith, Newmeyer, Frederick, Noonan, Michael & Wheatley, Kathleen (eds.) Functionalism and formalism in linguistics. Vol. 1: General papers. Amsterdam & Philadelphia: Benjamins. 329–358.
Hammond, Michael (2003). Phonotactics and probabilistic ranking. In Carnie, Andrew, Harley, Heidi & Willie, MaryAnn (eds.) Formal approaches to function in grammar: in honor of Eloise Jelinek. Amsterdam & Philadelphia: Benjamins. 319–332.
Hammond, Michael (2013). Input optimization in English. Journal of the Phonetic Society of Japan 17. 26–37.
Hammond, Michael (2014). Phonological complexity and input optimization. Phonological Studies 17. 85–94.
Hammond, Michael, Moravcsik, Edith & Wirth, Jessica (1988). Language typology and linguistic explanation. In Hammond, Michael, Moravcsik, Edith & Wirth, Jessica (eds.) Studies in syntactic typology. Amsterdam & Philadelphia: Benjamins. 1–22.
Hannahs, S. J. (2011). Celtic mutations. In van Oostendorp, Marc, Ewen, Colin J., Hume, Elizabeth & Rice, Keren (eds.) The Blackwell companion to phonology. Malden, Mass.: Wiley-Blackwell. 2807–2830.
Hannahs, S. J. (2013). The phonology of Welsh. Oxford: Oxford University Press.
Hayes, Bruce (1980). A metrical theory of stress rules. PhD dissertation, MIT. Published 1985, New York: Garland.
Hayes, Bruce (1984). The phonology of rhythm in English. LI 15. 33–74.
Hayes, Bruce & Wilson, Colin (2008). A maximum entropy model of phonotactics and phonotactic learning. LI 39. 379–440.
Jakobson, Roman (1968). Child language, aphasia and phonological universals. The Hague: Mouton.
Kiparsky, Paul (1982). Lexical morphology and phonology. In Linguistic Society of Korea (ed.) Linguistics in the morning calm. Seoul: Hanshin. 3–91.
Kučera, Henry & Francis, W. Nelson (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
Kurisu, Kazutaka (2001). The phonology of morpheme realization. PhD dissertation, University of California, Santa Cruz.
Liberman, Mark & Prince, Alan (1977). On stress and linguistic rhythm. LI 8. 249–336.
McCarthy, John J. & Prince, Alan (1993). Prosodic morphology I: constraint interaction and satisfaction. Ms, University of Massachusetts, Amherst & Rutgers University.
Maddieson, Ian (1984). Patterns of sounds. Cambridge: Cambridge University Press.
Menn, Lise & MacWhinney, Brian (1984). The repeated morph constraint: toward an explanation. Lg 60. 519–541.
Morris Jones, J. (1913). A Welsh grammar: historical and comparative. Oxford: Clarendon.
Nevins, Andrew & Vaux, Bert (2007). Underlying representations that do not minimize grammatical violations. In Blaho, Sylvia, Bye, Patrik & Krämer, Martin (eds.) Freedom of analysis? Berlin & New York: Mouton de Gruyter. 35–61.
Pater, Joe (2009). Weighted constraints in generative linguistics. Cognitive Science 33. 999–1035.
Pitt, Mark A., Dilley, Laura C., Johnson, Keith, Kiesling, S., Raymond, William D., Hume, Elizabeth & Fosler-Lussier, E. (2007). Buckeye corpus of conversational speech. 2nd release. Columbus: Ohio State University.
Prince, Alan & Smolensky, Paul (2004). Optimality Theory: constraint interaction in generative grammar. Malden, Mass. & Oxford: Blackwell.
Stemberger, Joseph Paul (1981). Morphological haplology. Lg 57. 791–817.
Stewart, Thomas W. (2004). Mutation as morphology: bases, stems, and shapes in Scottish Gaelic. PhD dissertation, Ohio State University.
Toutanova, Kristina, Klein, Dan, Manning, Christopher D. & Singer, Yoram (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Human Language Technology Conference of the NAACL. 173–180.
Trubetzkoy, Nikolai S. (1939). Grundzüge der Phonologie. Göttingen: Vandenhoeck & Ruprecht.
Turton, Danielle (2012). The darkening of English /l/: a stochastic stratal OT analysis.
Ussishkin, Adam (2000). The emergence of fixed prosody. PhD dissertation, University of California, Santa Cruz.
Vaux, Bert & Samuels, Bridget (2005). Laryngeal markedness and aspiration. Phonology 22. 395–436.
Wolf, Matthew (2007). For an autosegmental theory of mutation. In Bateman, Leah, O'Keefe, Michael, Reilly, Ehren & Werle, Adam (eds.) Papers in Optimality Theory III. Amherst: GLSA. 315–404.
Zwicky, Arnold M. (1987). Suppressing the Zs. JL 23. 133–148.