6 - Tocharian

Published online by Cambridge University Press: 15 September 2022

Edited by Thomas Olander
University of Copenhagen


The two Tocharian languages, Tocharian A and Tocharian B, are closely related and clearly form a branch within Indo-European. The discussion of the evidence for the Tocharian branch therefore focuses on the most important changes that have shaped the language and altered it typologically. Many innovations of Tocharian, especially in the lexicon, are due to language contact. Some of these contacts took place before the break-up of Proto-Tocharian, while others took place at later stages. It is widely held that Tocharian was the second branch, after Anatolian, to split off from the Indo-European proto-language, a view that may be termed the “Indo-Tocharian” hypothesis. A selection of arguments for Indo-Tocharian from phonology, morphology and lexicon is analysed and evaluated according to the criteria of identifiability, unidirectionality and salience. Although the Indo-Tocharian hypothesis remains attractive, progress in reconstruction mostly appears to bring Tocharian closer to Core Indo-European than to Anatolian. Tocharian probably split off second, but much later than Anatolian and not long before the remaining speech community started to disintegrate.

The Indo-European Language Family: A Phylogenetic Perspective, pp. 83–101
Publisher: Cambridge University Press
Print publication year: 2022

6.1 Introduction

The Tocharian languages A and B are attested in manuscripts from the northern Tarim Basin, in present-day Northwest China. Tocharian B is attested from about the fifth to the tenth centuries CE. Originally the language of Kuča, it spread east to Yānqí and Turfan, probably in the late sixth and the seventh centuries. In Tocharian B itself, the language is referred to as the language of kuśi ‘Kuča’. Tocharian A is attested somewhat later, from about the seventh to the tenth centuries. It originated in Yānqí, spread east to Turfan together with Tocharian B, but not west to Kuča, and is referred to as the language of ārśi ‘Yānqí’. Both languages are written in the Indian Brāhmī script, and the vast majority of the manuscripts have Buddhist content.

Traces of a third Tocharian language have been claimed to be preserved in the Middle Indic Gāndhārī dialect of Niya in the southern Tarim Basin (Burrow 1935). This hypothesis has not received wide support and must still be considered very uncertain (see Section 6.3).Footnote 1

6.2 Evidence for the Tocharian Branch

The existence of the Tocharian branch of Indo-European is beyond any doubt. The two languages A and B are closely related and share numerous significant innovations, so it is unnecessary to give a full list here. Some of the more important, branch-defining developments are:

  • loss of the threefold Proto-Indo-European distinction between what are conventionally termed voiceless, voiced and voiced aspirated stops, i.e. *ḱ, *ǵ, *ǵʰ merged into *k (on *d, see below);

  • several mergers and shifts in the vowel system, including loss of vowel length, merger of *i, *e, *u into *ə (the first two regressively palatalising), shifts of *o to *e and of *ā < *eh2 to *o, monophthongisation of *ei to *’i and of *eu to *’u, etc.;

  • rise of distinctive and morphological palatalisation, principally through the transformation of the contrast between *o : *ē into *e : *’e and * : *e into *ə : *’ə;

  • loss of word-final *-s, *-m, *-n, *-t (*-d), which has led to heavy restructuring of both the nominal and the verbal inflection;

  • rise of agglutinative case inflection in the noun, next to agglutinative number inflection in some noun classes;

  • almost complete loss of prefixing morphology;

  • rise of an intricate system of verbal derivation to form intransitives and transitives or causatives;

  • numerous significant innovations in the lexicon.

Even considering the late attestation of the Tocharian branch, the extent of structural change is surprisingly large, and it can be argued that this is partly due to a substrate effect. The loss of the distinction between the so-called voiceless, voiced and voiced aspirated stops, the rise of agglutinative case inflection, and the functions of these case suffixes, which include the perlative, denoting movement through, along or over something, point to Uralic influence. A pre-Proto-Tocharian phase of the vowel system can be compared more specifically with an early form of Samoyedic. Pronominal suffixes attached to the finite verb to denote the object may be compared with the objective inflection in Uralic (Peyrot 2019a with references; on the vowel system, see Warries in press).

It is more difficult to assess the Iranian impact on Tocharian. There has been considerable Iranian influence on the lexicon (Isebaert 1980; Tremblay 2005), but only the oldest layer of borrowings, those from Old Iranian, may possibly be added as a branch-defining feature of Tocharian. The reason is that any feature defining the whole branch must have been acquired before the break-up of unitary Tocharian into Tocharian A and B. This is clearly the case with the structural shift attributed to Uralic above. However, many borrowings from Iranian are to be dated after the break-up and therefore do not define the Tocharian branch as such. Examples include borrowings from Bactrian, such as Toch.B akālk and Toch.A ākāl ‘wish’ from Bactrian αγαλγο /aγalg/: the ā_ā vocalism of Tocharian A, instead of the ā_a vocalism regular in inherited vocabulary, shows that the word entered the language later, and the Toch.B and Toch.A forms cannot be reconstructed to a common proto-form. Bactrian influence is therefore to be dated after the split of Proto-Tocharian. The case of borrowings from Old Iranian is different. An example is Toch.B perne, Toch.A paräṃ ‘glory’, which allows a Proto-Tocharian reconstruction *perne, borrowed from Old Iranian *farnah- (Av. xᵛarənah-).

Nevertheless, for the Old Iranian layer, the details are not fully clear either. Tocharian B would have preserved a word like *perne unchanged, and the amount of change in Tocharian A is limited: *e > a in the first syllable; apocope of *e in the final syllable; ä-epenthesis in the final cluster -rn. Since these changes in Tocharian A cannot be dated exactly, it cannot be excluded that *farnah- was borrowed into Tocharian B and A independently, at an early stage after the break-up of Proto-Tocharian but before the relevant sound changes in Tocharian A occurred. A reason to consider this more complicated chronology is provided by the sound changes *rn > rr and *ln > ll in both languages. Good examples of the former are not found in Tocharian A, but the latter is certain. Since old geminates are generally simplified in Tocharian A, the rise of new geminates from *rn and *ln must be dated after the general simplification of geminates. Had ‘glory’ been borrowed before this assimilation, its rn would have become rr; the preservation of rn thus suggests an early but post-Proto-Tocharian borrowing, according to the following relative chronology:

  1. break-up of Proto-Tocharian;

  2. degemination in pre-Tocharian A;

  3. assimilation of *rn, *ln to rr, ll (the same change occurred independently in pre-Tocharian B);

  4. borrowing of *farnah- as *perne (the same borrowing occurred independently in pre-Tocharian B);

  5. *e > a, apocope of final *e, and ä-epenthesis to produce Tocharian A paräṃ.
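The relative chronology above is essentially an argument about rule ordering: a form that enters the language before a sound change undergoes it, while a form that enters afterwards does not. As a rough illustration only, the following Python sketch applies toy string-rewrite versions of the pre-Tocharian A changes in steps 2, 3 and 5 to *perne at different entry points. The rule functions, the consonant class and the treatment of final ṃ as the spelling of word-final n are simplifying assumptions made for this sketch, not a formalisation taken from the literature.

```python
import re

# Toy, hypothetical rewrite rules for the pre-Tocharian A changes listed above,
# in chronological order. Each rule maps a word form (a plain string) to a new form.
# These are deliberately crude string operations, not a real phonology.

GEMINATE = re.compile(r"([bcdfghjklmnpqrstvwyzśṣ])\1")


def degemination(w: str) -> str:
    """Step 2: simplify old geminates, e.g. 'tt' > 't'."""
    return GEMINATE.sub(r"\1", w)


def assimilation(w: str) -> str:
    """Step 3: *rn > rr and *ln > ll (new geminates arise after step 2)."""
    return w.replace("rn", "rr").replace("ln", "ll")


def first_syllable_e_to_a(w: str) -> str:
    """Step 5a: *e > a, crudely applied to the first 'e' of the word."""
    return w.replace("e", "a", 1)


def apocope(w: str) -> str:
    """Step 5b: loss of word-final *e."""
    return re.sub(r"e$", "", w)


def epenthesis(w: str) -> str:
    """Step 5c: ä-epenthesis in a word-final -rn cluster."""
    return re.sub(r"rn$", "rän", w)


def final_n_spelling(w: str) -> str:
    """Simplifying assumption: word-final n is written with anusvāra ṃ."""
    return re.sub(r"n$", "ṃ", w)


PRE_TOCHARIAN_A = [
    degemination,           # index 0
    assimilation,           # index 1
    first_syllable_e_to_a,  # index 2
    apocope,                # index 3
    epenthesis,             # index 4
    final_n_spelling,       # index 5
]


def derive(form: str, enters_before: int) -> str:
    """Apply every rule from position `enters_before` onwards to a form
    that enters the language just before that rule."""
    for rule in PRE_TOCHARIAN_A[enters_before:]:
        form = rule(form)
    return form


# Entry in Proto-Tocharian, before all these changes: rn is assimilated to rr,
# and because degemination is already past, the new geminate survives.
print(derive("perne", enters_before=0))  # parr
# Entry after the assimilation (step 4 of the chronology): rn survives.
print(derive("perne", enters_before=2))  # paräṃ
```

The toy output parr is not a claim about any actual Tocharian A form; it only shows that an early entry point predicts rr, whereas the attested paräṃ requires entry after the assimilation of step 3.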

Another indication of this chronology is offered by Toch.B etswe ‘mule’, borrowed from Old Iranian *atswa- ‘horse’ (Av. aspa-). Although Toch.B mətstsa-, Toch.A nätswā- ‘starve’ show that Proto-Tocharian *tsw developed to tsts in pre-Tocharian B after the break-up of Proto-Tocharian, etswe has tsw unchanged, suggesting that the borrowing is post-Proto-Tocharian. Old Iranian borrowings can only be taken as a branch-defining feature if the preservation of the cluster tsw in Tocharian B, and of the cluster rn in both languages, receives an alternative explanation, notably a conditioning of the relevant assimilations, such as a difference in accent.

6.3 The Internal Structure of Tocharian

As Tocharian A cannot be derived from Tocharian B or vice versa, a common ancestor called Proto-Tocharian needs to be reconstructed. For instance, Toch.B yente ‘wind’ cannot have yielded Toch.A want ‘wind’, and the reverse is also impossible: a preform *ẃente is to be posited, with innovations in both languages leading to the attested forms. There is no need to discuss the internal subgrouping of Tocharian, since only one tree is possible. The dating of Proto-Tocharian, the only node in this tree, will be discussed below. Even though Burrow’s hypothesis of a third Tocharian language is too uncertain to be taken into account for inferences on the prehistory of Tocharian, it presents an illustrative case for the methodology of internal subgrouping.

The Gāndhārī words in the documents from Niya for which Burrow (1935) suggests a Tocharian etymology are few, and among these only two are relevant here: kitsa’itsa, a title, and aṃklatsa, a type of camel. kitsa’itsa has a very Tocharian-looking structure and has been convincingly connected to Toch.B ktsaitse ‘old’, Toch.A ktsets ‘perfect’ by Burrow, who suggests ‘elder’ as the meaning of the Gāndhārī title. Toch.B ktsaitse derives from PToch. *kətˢaitˢtˢe with degemination after a diphthong,Footnote 2 and Toch.A ktsets has undergone apocope of final -e and monophthongisation of *ai to e; both languages have syncopated the *ə in the first syllable. Niya kitsa’itsa could derive from Tocharian B, from pre-Tocharian A or from a third branch, and is therefore useless for subgrouping. It could reflect an older form of the type *kətˢaitˢtˢe, with i for ə and the regular Gāndhārī final -a for the regular Tocharian final -e; the geminate could have been simplified or simply left unwritten. Equally, it could go back to a form of the type Toch.B ktsaitse, with i-epenthesis in the first syllable. Since Tocharian A is attested only from the seventh century onwards, much later than Niya Gāndhārī, which dates from the third and fourth centuries, kitsa’itsa could also derive from an early form of Tocharian A in which the monophthongisation of *ai to e had not yet taken place.

The key form for Burrow’s understanding of the internal subgrouping is aṃklatsa (Burrow 1935: 673). According to him, aṃklatsa denotes a relatively cheap camel, which may therefore have been untrained. He connects the word to Toch.B aknātsa, Toch.A āknats ‘fool’, which is formed with the negative prefix *en- from the verb *kna- ‘to know’: in both languages, the vowel of the prefix has been affected by a-umlaut, and its nasal has been lost before the cluster kn-. To explain the different cluster ṃkl in Niya Gāndhārī, he assumes that it goes back to an earlier form with *nkn that was dissimilated to nkl, written ṃkl. Since the first n of the cluster is lost in both Tocharian A and B, he concludes that the Tocharian variety he assumes to underlie the Gāndhārī of Niya belongs to a different branch, which is why it is often termed “Tocharian C”.

Burrow’s Tocharian etymology of Niya Gāndhārī kitsa’itsa is attractive, but his explanation of aṃklatsa is not convincing in view of the semantic and formal problems. At any rate, this questionable etymology alone can never bear the weight of proving a third branch of Tocharian, the famous “Tocharian C”.Footnote 3 Moreover, in the light of research by Niels Schoubben, who proposes new and convincing alternative explanations for some other items that Burrow explained as Tocharian (Schoubben 2021), scepticism about Burrow’s hypothesis is certainly warranted.

No absolute date can be given for Proto-Tocharian, by definition the latest phase of unity before the break-up into pre-Tocharian A and pre-Tocharian B. The languages are closely related, but the differences in the lexicon are considerable, and most scholars date Proto-Tocharian to around 500 BCE: some place it a little earlier, between 1000 BCE and 500 BCE; others a little later, between 500 BCE and the beginning of the Common Era (see the useful overview of different estimates in Mallory 2015: 7–8).

It is commonly agreed that the advent of Buddhism postdates the break-up, as such basic terms as dharma ‘law’ (Toch.B pelaikne, Toch.A märkampal) and karman ‘act, fate’ (Toch.B yāmor, Toch.A lyalypu) differ between the two languages (Lane 1966). But since Buddhism arrived late in the region, perhaps in the first or second century CE, this yields only an unsurprising terminus ante quem.

Contacts with the Iranian languages Bactrian and Sogdian took place after the split, probably in the early first millennium CE. Contacts with Old Iranian are more interesting: since it can be debated whether they occurred before or after the break-up, they may have to be dated close to it. In the scenario sketched above, they would have occurred soon after it.Footnote 4 However, the Old Iranian loanwords are themselves difficult to date in absolute terms. The archaic appearance of words such as Toch.B etswe ‘mule’ ⇐ OIrn. *atswa- ‘horse’ (Av. aspa-) or Toch.B waipecce ‘possessions’ ⇐ OIrn. *hwai-paθya- (Av. xᵛaēpaiθiia- ‘own’) suggests a date in the middle of the first millennium BCE or earlier, but a more precise dating is difficult. I have suggested that these loanwords may be associated with the presence of Andronovo-related groups in Northern Xīnjiāng in the thirteenth to ninth centuries BCE (Peyrot 2018: 280), which would accordingly push the date of Proto-Tocharian towards the beginning of the first millennium BCE. The assumed contacts with Uralic, which may date to around 2500 BCE, in any case took place long before the split, in a pre-Proto-Tocharian phase.

Archaeological evidence on the Tocharians themselves is at present not clear enough (Mallory 2015: 29 and passim). It is uncertain whether the Cháwúhūgōu cultural group near Qarašähär (Debaine-Francfort 1989: 183–9), whose different phases together cover almost the entire first millennium BCE, can be identified with early speakers of Tocharian A, or whether the Hālādūn cultural group of the early first millennium BCE in and near Kuča (Debaine-Francfort 1988: 23) can be identified with early speakers of Tocharian B. Accordingly, archaeological evidence for the date of Proto-Tocharian or the place where it was spoken is presently indirect at best.

6.4 The Relationship of Tocharian to the Other Branches

It is now commonly held that Tocharian has no closer affinity with any other branch of Indo-European.Footnote 5 Proposals for closer affinity have been made, but they have found little acceptance and concern superficial similarities, such as the spread of the n-stems in the nominal inflection, which would be shared with Germanic (Adams 1988: 5), or the middle endings in -r, suggesting a link with Italo-Celtic (e.g. Lane 1970: 78, who attributes the correspondence to post-Proto-Indo-European contact), and so on. References to and discussion of these and other suggestions can be found in Hackstein 2005 and Malzahn 2016: 281.

Not accepting any of the adduced old comparisons, Hackstein (2005) proposes instead several close matches between Tocharian and other branches in grammaticalisation processes. According to him, the observed grammaticalisation processes are independent and parallel rather than shared, and indicate post-Proto-Indo-European contact. The matches that he proposes are with Latin, Slavic, Gothic, Greek and Armenian. Although the cases discussed are interesting, the large number of languages in the comparison makes it unlikely that the parallelisms are due to early contact. In addition, it is open to debate whether the parallelisms, if correctly identified, are indeed so salient that they cannot have come about completely independently. For instance, the univerbation of interrogative and demonstrative in Toch.B kᵤse ‘who’ < *kʷi + so, in Alb. kush ‘who’ < *kʷis + so, and in OCS kъto ‘who’ with -to from PIE *tod (Hackstein 2005: 177) has not proceeded in exactly the same way; it probably compensates, at least in part, for the loss of inflection and word weight; and it appears to be a natural process. Toch.B and ṣpä ‘and’, which Hackstein derives from *h1eti and *h1eti-h1epi respectively, in fact represent one and the same etymon *ṣpə with simplification of ṣp to in classical and late Tocharian B (Peyrot 2008: 68), so that cannot be directly compared with Latin et or Gothic (pace Hackstein 2005: 176).

A different case is presented by matches with Anatolian, of which several fairly solid ones have been proposed: see for instance Pinault 2006a: 93. These must be archaisms, which do not show any closer affinity between Anatolian and Tocharian, but they are potentially relevant for establishing the position of Tocharian in the Indo-European tree, discussed in the following section.

6.5 The Position of Tocharian

Tocharian is often claimed to have been the second branch to split off from the Indo-European proto-language: after Anatolian, but before all other attested branches. This hypothesis may be called the “Indo-Tocharian” hypothesis, on the model of Indo-Anatolian (Peyrot 2019b; see Figure 6.1). “Indo-Anatolian”, equivalent to “Indo-Hittite”, is used here in a technical sense for the highest node in the Indo-European tree, before Anatolian split off as the first branch, a scenario for which the evidence is steadily growing (cf. Kloekhorst & Pronk 2019).Footnote 6 Strikingly, the arguments that have been advanced in support of the “Indo-Tocharian” hypothesis vary considerably: many authors making the same claim do not accept each other’s evidence for it. The most comprehensive systematic review is that by Ringe (1991), who finds hardly any evidence for the position of Tocharian in the family tree at all. Other relevant contributions include Lane 1970, Schmidt 1992, Winter 1997, Pinault 2013 and Malzahn 2016.

Figure 6.1 The position of Tocharian

Below, a selection of arguments will be discussed. In general, it appears that aberrant features of Tocharian are due to innovation, and careful reconstruction tends to bring Tocharian closer to non-Anatolian Indo-European. The Indo-Tocharian hypothesis still seems attractive, but the evidence is slim, and the time depth separating Indo-Anatolian from Indo-Tocharian appears to be much greater than that separating Indo-Tocharian from the other Indo-European languages. If Indo-Anatolian can be dated to the middle of the fifth millennium BCE, Indo-Tocharian must be much closer to the middle of the fourth millennium. As pointed out to me by Tijmen Pronk, the split-off of the Tocharian branch (Anthony 2007: 305, 307–11; Anthony & Ringe 2015: 208, 211) may be associated with the apparent abandonment of the Caspian steppe in 3500–3400 BCE, probably due to abrupt aridification (Shishlina 2008: 220).

6.5.1 Methodology

In view of the many different arguments that have been proposed for the Indo-Tocharian hypothesis, a brief note on the methodology seems in order.

It is generally agreed that the assumption of an early Tocharian split-off must be based on shared innovations of the other non-Anatolian Indo-European languages. In particular, the branch that split off after Tocharian should have shared in such innovations. As the most likely candidate for the branch to have split off third appears to be Italo-Celtic, the supposed shared innovation should ideally be attested in this branch. Conversely, arguments for Indo-Anatolian should be based on shared innovations of non-Anatolian Indo-European, ideally attested also in Tocharian (Peyrot 2019b).
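The logic of this requirement can be stated as a simple check on a rooted tree: an innovation supports the early split-off of a given branch only if the set of branches sharing it is exactly that branch’s sister clade. The Python sketch below encodes a deliberately simplified, hypothetical tree following the scenario discussed here (Anatolian first, then Tocharian, then Italo-Celtic), with only a handful of other branches left unresolved; the tree, the branch labels and the function names are assumptions for illustration. It also ignores the complications that make real arguments hard, namely later loss, incomplete attestation and parallel development.

```python
from typing import FrozenSet, Optional, Tuple, Union

# A rooted tree as nested tuples; leaves are branch names (strings).
Tree = Union[str, Tuple["Tree", ...]]

# Hypothetical, simplified tree: Anatolian splits off first, then Tocharian,
# then Italo-Celtic; the remaining branches are left unresolved.
TREE: Tree = (
    "Anatolian",
    ("Tocharian",
     ("Italo-Celtic",
      ("Indo-Iranian", "Greek", "Balto-Slavic", "Germanic"))),
)


def leaves(t: Tree) -> FrozenSet[str]:
    """All branch names below node t."""
    if isinstance(t, str):
        return frozenset({t})
    return frozenset().union(*(leaves(child) for child in t))


def sister_clade(t: Tree, taxon: str) -> Optional[FrozenSet[str]]:
    """Leaves of everything sharing taxon's parent node, or None if taxon is absent."""
    if isinstance(t, str):
        return None
    if taxon in t:
        return frozenset().union(*(leaves(c) for c in t if c != taxon))
    for child in t:
        found = sister_clade(child, taxon)
        if found is not None:
            return found
    return None


def supports_split_off(t: Tree, taxon: str, innovating: set) -> bool:
    """True if the branches sharing an innovation are exactly taxon's sister clade,
    i.e. the innovation defines the node just below taxon's split-off."""
    sister = sister_clade(t, taxon)
    return sister is not None and frozenset(innovating) == sister


# Shared by all non-Anatolian branches except Tocharian, including Italo-Celtic.
core = {"Italo-Celtic", "Indo-Iranian", "Greek", "Balto-Slavic", "Germanic"}
print(supports_split_off(TREE, "Tocharian", core))         # True

# A distribution like the one reported for the thematic optative (no Italo-Celtic).
optative = {"Indo-Iranian", "Greek", "Balto-Slavic", "Germanic"}
print(supports_split_off(TREE, "Tocharian", optative))     # False
print(supports_split_off(TREE, "Italo-Celtic", optative))  # True
```

On this toy tree, an innovation missing from Italo-Celtic defines only the node below the Italo-Celtic split-off, and so says nothing about whether Tocharian split off before Italo-Celtic.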

Though clear in theory, finding and defining shared innovations is difficult in practice. Shared innovations useful for phylogenetic subgrouping appear to have to meet the following requirements:

  • identifiability: the linguistic element adduced as a shared innovation in the lower node should be clearly identifiable in the higher as well as in the lower node;

  • unidirectionality: the observed difference with regard to the selected linguistic element should be interpretable as a unidirectional change;

  • salience: the observed change should be so salient that it is unlikely to have occurred independently in the supposed lower-node branches, in which case it would be a parallel, not a shared innovation.

The requirement of unidirectionality is widely accepted, and discussion tends to focus on whether a given difference can be interpreted as a unidirectional change, rather than on the need for this requirement as such. A case in point is semantic change: phylogenetic arguments based on semantic change are often contested on the grounds that a given semantic difference is not necessarily due to unidirectional change.

The requirement of identifiability, often implicit, may be helpful in discussions of debated phylogenetic arguments based on the loss or addition of features, or on lexical replacement. Arguments based on loss or addition are notoriously difficult, as for instance with the comparative and superlative suffixes, which are unattested in Anatolian and Tocharian: were they lost in both branches, or were they added after the Tocharian split-off? Such arguments cannot be applied if the supposedly added feature cannot be identified with any prestage leading to it, or if the lost feature has left no trace at all. Arguments based on lexical replacement are weak because the identifiable element would be the meaning, expressed with different etyma in two branches. Meaning is difficult to use as an identifiable element, because several etyma may have similar, overlapping or even identical meanings, and it is therefore difficult to prove that a certain meaning came to be expressed with a different etymon.

The requirement of salience seems so obvious that no further explanation is needed.

6.5.2 Phonology

For our present purposes, phonological evidence appears to be of little relevance in view of the extensive changes in the Tocharian sound system, which are probably due to a Uralic substrate (see Section 6.2). In particular, evidence for the phonetic realisation of the stops in the proto-language has been obscured by this substrate effect. Thus, there is little evidence to establish the position of Tocharian with respect to Kloekhorst’s claim (2016) that Anatolian preserves an older system of stop distinctions, with classical PIE *t, *d, *dʰ from Proto-Indo-Anatolian *tː, *ˀt, *t.

For Tocharian, the developments of *d and *dʰ are notable. PIE *d is by default represented by ts and is otherwise often lost, at least before *, * and *r, and so differs from *t and *dʰ, whose default outcome is Tocharian t.Footnote 7 Thus, even though the exact phonetics remain difficult to establish, *t and *dʰ were apparently closer to each other than either of them was to *d.Footnote 8 At the same time, *bʰ is lost after *m, for instance in *ǵombʰo- > Toch.B keme ‘tooth’, while *p stays, for instance in *temp- (Lith. tempiù ‘stretch’) > Toch.B cəmp- ‘be able’.Footnote 9 This suggests that *bʰ was weaker than *p: it may have been voiced, fricative or both. It is tempting to compare the typologically common loss of voiced stops after nasals, as in English lamb /læm/, and posit the value [b] for *bʰ, but this is certainly not the only option. Combining the evidence from dentals and labials, it appears that the stop system inherited by Tocharian had strong stops for the conventional voiceless stops like *t, weak stops for the conventional voiced aspirated stops like *dʰ, and a series that was different from both for the conventional voiced stops like *d.Footnote 10 Although Tocharian offers no direct evidence for the reconstruction of glottalic stops in Proto-Indo-European, the fact that *d has a different reflex from *t and *dʰ is neatly compatible with it, since under Kortlandt’s glottalic theory (e.g. Kortlandt 1985; 2018a) *d [ˀd] on the one hand is set apart from *t and *dʰ on the other.

Nevertheless, the bearing of this evidence on the phylogenetic position of Tocharian remains undecided. Since there is strong evidence for *d = *ˀd in classical Indo-European, this feature does not set Tocharian apart and cannot be used. Further, the position of Tocharian cannot be determined with regard to Kloekhorst’s claim that classical PIE *t (perhaps phonetically [t]) < Proto-Indo-Anatolian *tː and classical PIE *dʰ (perhaps phonetically [d]) < Proto-Indo-Anatolian *t, since both phonetic stages are compatible with *t being stronger and *dʰ being weaker.Footnote 11

It has been argued that Tocharian shows consonantal reflexes of PIE *H as k (e.g. Winter 1965: 206–10; Schmidt 1988; Kortlandt 2018b). Winter adduces Tocharian A “intrusive k” as a consonantal reflex of *HH, e.g. lwākis to lwā ‘animals’ or puklākā to puklā ‘years’. However, k must be secondary in such examples, because it effectively prevents the problematic vowel contractions in the morphologically expected forms **lwes < *lwā.is (next to attested lwes!) and **puklā < *puklā.ā. Schmidt (cf. also Hartmann 2001) has argued that the k in roots ending in -tk goes back to *h2, but Melchert’s earlier derivation of -tk- from -T-sk- is definitely to be preferred (Melchert 1977; cf. also Pinault 2006b). Kortlandt’s derivation of Tocharian B taka- ‘be’ from *steh2-t with -k- from *h2 is in itself attractive, but since the “k-aorist” is also attested in e.g. Gr. ἔθηκα and Lat. fēcī, this reflex cannot be used to determine the phylogenetic position of Tocharian, even if the evidence as such fits Kloekhorst’s reconstruction of *h2 and *h3 as uvular stops for Proto-Indo-Anatolian nicely (Kloekhorst 2018; cf. also Kortlandt 2002: 218).

Like other Indo-European languages, Tocharian shows reflexes of the metathesis of *Hi to *iH and *Hu to *uH. For instance, metathesis of *Hu to *uH is attested by such forms as Toch.B puwar ‘fire’ < *puh2rFootnote 12 (as in Greek πῦρ) from earlier *peh2-ur (as in Hitt. paḫḫur) and Toch.B ləw(a)- ‘rub’ (prt.3sg.-3sg.obj. lyawā-ne ‘he rubbed him’) < *leuh3- from earlier *leh3u- (as in Hitt. lāḫu-i ‘pour’). Even though unmetathesised forms are also found, for instance Toch.B kaw- ‘kill’ < *keh2u-, the existence of metathesised forms in Tocharian clearly shows that this sound change is to be dated before Tocharian split off. However, even though Hittite often shows unmetathesised forms where other branches have metathesised ones (Kloekhorst & Pronk 2019: 5), the metathesis must already have occurred in Proto-Indo-Anatolian, on the evidence of forms such as Hitt. šuḫḫa- ‘pour, sprinkle’ < *suh2- next to išḫu(wa)- < *seh2u- and lu-u- ‘pour’ < *luh3- next to lāḫu- < *leh3u- (Melchert 2011: 129, 131). For now, therefore, the mere attestation of laryngeal metathesis cannot be used for inner-Indo-European phylogeny.

However, another Indo-European metathesis may be used: that of word-final *-ur to *-ru (Lubotsky 1994: 99–100). This sound change seems to have occurred only after Proto-Indo-Anatolian. Strong evidence for it in Tocharian has been discovered by Del Tomba (2021), who shows that Toch.B plurals in -wa to nouns in -r, such as tarkär ‘cloud’, pl. tärkarwa, presuppose metathesis of *-ur to *-ru in the singular, on which the plural -r-wa < *-ru-h2 was built. Although this sound change can thus be used for the phylogeny of Indo-European, it does not support an early split-off of Tocharian: it clearly groups Tocharian together with the non-Anatolian languages.

6.5.3 Morphology

Morphology is often considered the domain with the greatest potential to yield evidence for the phylogenetic position of Tocharian. Indeed, morphology meets two essential needs: it is constantly changing, and, at the same time, shifts in function, though commonplace, are subject to limitations. Unfortunately, Tocharian morphology is heavily reorganised and its prehistory is often very obscure. Worse still, the reconstruction of Proto-Indo-European is difficult and often disputed in precisely the relevant points.

Without doubt, the most prominent morphological argument for phylogeny comes from the Tocharian s-preterite. In the active of the Tocharian s-preterite, an element s is only found in the 3sg.: 1sg. prek-uwa ‘asked’, 2sg. prek-asta, 3sg. prek-sa, 1pl. prek-am, 2pl. prek-as*, 3pl. prek-ar. This is reminiscent of the Hittite ḫi-preterite, which likewise has -š only in the 3sg.: 3sg. ākkiš ‘died’, 3pl. aker (Pedersen 1941: 146). There are two schools of thought to explain this correspondence. The first, most prominently represented by Jasanoff (e.g. 2003: 204–5),Footnote 13 takes the restriction of the -s- to the 3sg. as an archaism of Anatolian and Tocharian, while the rise of the classical s-aorist through generalisation of the -s- from the 3sg. throughout the whole paradigm is a common innovation of the other Indo-European branches. According to the second, the -š in Hittite is secondary, probably somehow from the s-aorist, while in Tocharian the s-preterite forms without -s- lost it through the effects of sound law and analogy (Ringe 1990; Kortlandt 1994; Peyrot 2013: 503–7). The matter cannot be treated here in detail. Suffice it to say that the assumption of loss of -s- best accounts for the inflection of the Tocharian preterite and its patterning with the subjunctive. At any rate, this famous case shows very clearly how different views on the reconstruction of Proto-Indo-European logically lead to different evaluations of arguments for phylogeny.

Another phylogenetic argument is based on the middle endings in -r (e.g. Ringe, Warnow & Taylor 2002; Ringe 1991: 98–9). It is widely held that the shorter middle endings 3sg. *-to and 3pl. *-nto were secondary endings in Proto-Indo-European, while the corresponding primary endings were originally 3sg. *-to-r, 3pl. *-nto-r, which were later replaced by 3sg. *-to-i, 3pl. *-nto-i, marked with the productive primary marker *-i as found in the active endings. This would not be a valid argument for Indo-Tocharian, since the r-endings are also found in Italo-Celtic and Phrygian, but it would group Tocharian with the older branches.

However, a number of problems with this argument need to be noted:

  • It is questionable whether the contrast between Toch.B pres. 3sg. -tär, 3pl. -ntär and pret. 3sg. -te, 3pl. -nte continues an original primary–secondary contrast, because the Tocharian preterite active endings do not continue the secondary endings of Proto-Indo-European. In the copula 3sg. ste, 3pl. skente, the endings -te, -nte are even used as present endings. Hackstein (1995: 273–5) explains these forms as original resultatives, i.e. “is” < “has become”, and notes that present readings of preterites are found elsewhere. However, it remains unclear why no shade of the past meaning has been preserved in 3sg. ste, 3pl. skente, and why the corresponding suffixed forms, such as 3sg.-1sg.obj. star-ñ, have present endings. This distribution is difficult to explain from an original difference in tense.

  • The reconstruction of the primary middle endings 3sg. *-to-r, 3pl. *-nto-r is itself problematic. Admittedly, Lat. -tur, -ntur point to *-tor, *-ntor. However, as Weiss (2009: 413) notes, Osc. 3sg. -ter, 3pl. -nter point to *-tro, *-ntro, and Umb. primary -ter, -nter vs. secondary -tur, ‑ntur suggests Proto-Italic primary *-tro, *-ntro vs. secondary *-tor, *-ntor. Likewise, the Old Irish deponent endings 3sg. -thir, 3pl. -tir point to *-tr-, *‑ntr‑, probably *-tro, *-ntro. Finally, Toch. -tär, -ntär cannot be derived from *-tor, *-ntor directly (cf. also Pinault 2010b). Ringe (1996: 86) discusses the change of *-or to Toch. *‑ər, but the 3rd person middle endings are his only evidence, against counterexamples such as Toch.B malkwer ‘milk’, with suffix -wer < *-uor as in the verbal abstract, e.g. śeśuwer ‘eating’. A further counterexample seems to be yerter ‘felloe’, which on the evidence of the unpalatalised -t- must reflect *-tor.Footnote 14

  • The assumed replacement of well-marked middle paradigms ending in -r with the active marker -i is difficult to understand. What would have motivated it? If endings are clearly marked as primary, there seems to be no need to replace them. The greatest difficulty here is not the addition of the primary active marker -i – such additions are indeed frequent, e.g. in the perfect endings, such as OCS vědě, Lat. vīdī, or Toch.A kärse ‘I knew’ < *kərsa-a-i – but the deletion of the transparent middle ending *-r.

In view of these problems, it is tempting to follow Kortlandt’s reconstruction (1981) of the middle endings as *-to, *-nto only, without contrast between primary and secondary endings.Footnote 15 On this view, such contrasts were created independently in the different branches. In any case, the problematic details of the reconstruction of the middle endings make them difficult to use for phylogeny.

Another argument advanced by Ringe, Warnow & Taylor (2002: 117) is the thematic optative in *-o-ih1-, attested in Indo-Iranian, Greek, Balto-Slavic and Germanic, but not in Tocharian. Indeed, this may be a later innovation within Indo-European not shared by Tocharian. In Tocharian, there is only one variant of the optative suffix, -’i- (i with preceding palatalisation), to be derived from *-ih1-.Footnote 16 However, “present optatives”, synchronically imperfects, are unattested in Tocharian A, and they must consequently have been regularised secondarily in Tocharian B (Peyrot 2012b). Therefore, it is difficult to prove that e.g. Toch.B pari* ‘he took’ goes back directly to *bʰer-ih1-t (for *bʰer-o-ih1-t elsewhere). In any case, since the thematic optative is not attested in Italo-Celtic, it cannot be used to show that Tocharian split off before that branch.

It has been argued that the combination of the Tocharian present participle in -mane with both active and middle finite inflection is an archaism: the verbal adjective *-mh1no- would originally have been indifferent for voice, very much like the *-nt-participle in Anatolian (Kloekhorst & Pronk 2019: 3), and became specialised only later, after Tocharian split off, as the middle counterpart of the active *-nt-participle (Pinault 2012: 229; Peyrot 2017: 339–40). However, I now think that this argument has to be abandoned in the light of a study by Friis (2021), who shows that traces of specifically middle use are preserved, which suggests that active use of -mane in Tocharian is secondary.Footnote 17

A case from word formation in the grammatical domain is the interrogative stem in *m- found in Anatolian and Tocharian (Hackstein 2004: 280–3; Pinault 2010a: 359; Peyrot 2019a: 195–9). A weak point of this argument is that the innovation of the other Indo-European languages would consist only in loss of the m-interrogative, while a strong point is the central position of this stem, paired only with *kʷi- (*kʷe-, *kʷo-), in the linguistic system. Thus, while the identifiability of this feature is low, its salience is nevertheless high.

6.5.4 Lexicon

Lexical evidence has been variously evaluated. Important papers adducing lexical evidence in support of an early split of Tocharian are Schmidt 1992 and Winter 1997. This evidence, and the method as a whole, was criticised by Hackstein (2005: 172) and Malzahn (2016), amongst others. For lexical arguments, a distinction should be made between lexical replacement and semantic change.

Arguments based on lexical replacement are especially difficult because the identifiability requirement is not easily satisfied: it is hard to prove that two words did not carry the same or a similar meaning. An example of such an argument is Anatolian (i.e. Luvian) and Tocharian (i.e. Toch.A) *uel(H)- ‘die’ vs. *mer- elsewhere (Ringe, Warnow & Taylor 2002: 99).Footnote 18 Although *mer- indeed acquired the meaning ‘die’ from ‘disappear’ after Indo-Anatolian (Kloekhorst & Pronk 2019: 3), and thus became a new word for the meaning ‘die’, the Luvian and Tocharian A words cannot be shown to represent the original word for ‘die’, let alone that it was ousted by the new *mer- (see Malzahn 2016: 285–6).

Another example is *h1egʷʰ- ‘drink’, well attested in Tocharian and Anatolian, as against *peh3- elsewhere (Ringe, Warnow & Taylor 2002: 99). This may indeed be a case of lexical replacement, i.e. the meaning ‘drink’ came to be expressed by a different word. However, the details are complicated: Hitt. pāš-i ‘swallow’ shows that *peh3- needs to be reconstructed for Proto-Indo-Anatolian, with possibly only a slightly different meaning; and Lat. ēbrius ‘drunk’ and Gr. νήφω ‘be sober’ show that *h1egʷʰ- was preserved after Tocharian split off, possibly with a shift to ‘be drunk’ (Peyrot 2019b). Thus, the argument for lexical replacement remains fragile, while the best phylogenetic evidence is formed by the possible semantic developments of ‘drink’ to ‘be drunk’ for *h1egʷʰ-, and ‘swallow’ to ‘drink’ for *peh3-. The attestation of the meaning ‘be drunk’ in Latin is favourable for the Indo-Tocharian hypothesis, because it suggests that this semantic change occurred after Tocharian split off, but before Italo-Celtic split off.

As a lexical argument based on semantics, Ringe, Warnow & Taylor (2002: 99) adduce *meǵh2, of which the Anatolian (e.g. Hitt. mekk-, mekki-) and Tocharian (e.g. Toch.B māka) reflexes mean ‘much, many’, as against ‘great’ elsewhere. The distribution is especially neat in this case, since the etymon is also attested in Italo-Celtic (OIr. maige, Lat. magnus, etc.) and Germanic (Goth. mikils). Here, the main problem is the requirement of unidirectionality: the meanings are contingent and a change from ‘great’ to ‘much’ is by no means unlikely.


1 I will not discuss in detail a posthumously published proposal by Schmidt (2018: 161–271) to read previously undeciphered manuscript fragments in Formal Kharoṣṭhī as a Tocharian variety from Lóulán. His tentative decipherment is not convincing. Instead, these fragments are probably written in an Iranian language related to Khotanese and Tumšuqese (Dragoni, Schoubben & Peyrot 2020).

2 See Peyrot 2008: 45 on Toch.B -auñe < *euññe with the same degemination. Adams cites the word as ktsaitstse (2013: 263), but this form is not attested.

3 Even in the unlikely event that the etymology should be correct nevertheless, it does not necessarily prove the existence of a third branch of Tocharian. Rather than being a shared innovation of Tocharian A and B, the change *nkn > kn may be a parallel development, since there are cases where the nasal is lost in Tocharian A but preserved in Tocharian B (Hilmarsson 1991: 193–8).

4 It is tempting to consider the possibility that the apparently impressive technological advances brought by the Iranians speaking this Old Iranian language were the impetus for the split of Proto-Tocharian. At present no evidence for or against this scenario seems to be available.

5 The prolonged contact with Iranian and the shorter but dramatic impact of Indic are obviously to be discarded as secondary.

6 Jasanoff (2017: 233–4) explicitly subscribes to this scenario but rejects the term “Indo-Hittite” because it “acquired tendentious overtones” (p. 233).

7 Also, the palatalised reflex of *d is ś, while that of *dʰ and *t is c.

8 Possibly, this distribution also holds for the assibilated variant -ṣ < *-ti, *-dʰi (Jasanoff 1987), although good evidence for the development of *-di is thus far lacking.

9 With original *mt, compare also *(d)ḱmtóm ‘100’ > Toch.B kante. Parallel cases with *, *ǵʰ, * and *gʷʰ are not readily available. Ringe discusses the possibility that the Toch.B subj. stem of lət- ‘go out’ as in the inf. lantsi shows /lən-/ < *h1lu‹n›dʰ- (1996: 43). However, he notes that forms with a geminate nn like 1sg. lannu ‘I will go out’ rather suggest an original *ləntn-. Indeed, all forms with a nasal in Toch.B can probably be derived from lənn- < *ləntn-, which arose secondarily through suffixation with *-nəsk- in the present (Peyrot 2013: 446). Toch.B laṅkᵤtse ‘light’ < *h1leng(ʷ)ʰ-u- shows that *gʷʰ was not lost after *n. It may be supposed that *ǵʰ and * were not lost after *n either. The reason for this exception could be that there was no corresponding velar nasal phoneme, so the velar stop had to remain in order to keep the velar nasal allophone.

10 A thorough discussion of these developments can be found in Ringe 1996: 39–66 and Winter 1962. In both accounts, a complicating factor is the Tocharian version of Grassmann’s Law, exemplified by e.g. Toch.B tsik- ‘form’ < *dʰeiǵʰ- and tsǝk- ‘burn’ < *dʰegʷʰ-, allegedly with ts < *d after *dʰ had been deaspirated to *d before the following *ǵʰ and *gʷʰ, respectively. The evidence for Grassmann’s Law in Tocharian is circumstantial and probably open to an alternative explanation. It is not taken into account here in view of the solid counterexample of Toch.B tapre ‘high’ < *dʰubʰro-, to be reconstructed with *bʰ instead of *b after Kroonen (2011: 253, 255).

11 It is possible that Tocharian inherited a stop system in which distinctive voice had not yet developed, as argued by Kortlandt (e.g. 1985: 197; 2020: 269), but in my view this is difficult to prove.

12 The Tocharian word for ‘fire’ is variously reconstructed. Hackstein, for instance, reconstructs *ph2u̯ōr (2017: 1314). It is, however, questionable whether *h2 would be lost in this context, and whether the reconstruction of a collective ending *-ōr for this etymon is warranted. A derivation of Toch.B puwar from *puh2r is the most straightforward. Winter (1965: 192) reconstructs the Tocharian A equivalent por as *paur from unmetathesised *peh2-ur. This is phonologically possible but very difficult morphologically, since it is not clear what the distribution of these variants in the Proto-Tocharian paradigm might have been. It is therefore preferable to assume a development *wa > o similar to *we > o in koṃ, of ku ‘dog’, < *kwen and *iye > e in karemāṃ ‘laughing’ < *keriyemane (Hilmarsson 1989: 135; Hackstein 2017: 1314; Peyrot 2012a: 210).

14 A possible alternative reconstruction would be *-ewer with contraction of *ewe to e.

15 The evidence of Anatolian seems compatible with an original *-to, *-nto without contrast between primary and secondary endings: synchronically, they are attested in Hittite as pres. 3sg. -tta, 3pl. -anta. However, a derivation from *-tor, *-ntor neatly explains the rise of the present particle -ri through resegmentation after the loss of -r after *-ó- (Yoshida 1990). The introduction of the particle -ti to mark the preterite endings, i.e. 3sg. -ttati, 3pl. -antati, would be motivated in both scenarios.

16 The full grade variant **-ye- < *-ieh1- may have been ousted by the zero grade variant through paradigmatic levelling, but it is also possible that the zero grade variant was generalised from s-aorist optatives with *-ih1- throughout (if the synchronic optatives of root subjunctives of class 1, such as Toch.B parśi ‘may he ask’, are to be derived from s-aorist optatives, i.e. in this case *préḱ‑s‑ih1‑t).

17 Thus, even though I cannot agree with the arguments adduced by Fellner & Grestenberger (2018), I do now concur with their main claim.

18 Their argument about *ai- ‘give’ cannot be upheld in light of the reconstruction of Hitt. pāi ‘gives’ as *h1p‑oi‑e by Kloekhorst (2006).



Figure 6.1 The position of Tocharian

