
Mobile-assisted language learning: A Duolingo case study

Published online by Cambridge University Press:  28 May 2019

Shawn Loewen
Michigan State University, USA
Dustin Crowther
Oklahoma State University, USA
Daniel R. Isbell
Michigan State University, USA
Kathy Minhye Kim
Michigan State University, USA
Jeffrey Maloney
Northeastern State University, USA
Zachary F. Miller
United States Military Academy at West Point, USA
Hima Rawal
Michigan State University, USA


The growing availability of mobile technologies has contributed to an increase in mobile-assisted language learning in which learners can autonomously study a second language (L2) anytime or anywhere (e.g. Kukulska-Hulme, Lee & Norris, 2017; Reinders & Benson, 2017). Research investigating the effectiveness of such study for L2 learning, however, has been limited, especially regarding large-scale commercial L2 learning apps, such as Duolingo. Although one commissioned research study found favorable language learning outcomes (Vesselinov & Grego, 2012), limited independent research has reported issues related to learner persistence, motivation, and program efficacy (Lord, 2015; Nielson, 2011). The current study investigates the semester-long learning experiences and results of nine participants learning Turkish on Duolingo. The participants showed improvement on L2 measures at the end of the study, and results indicate a positive, moderate correlation between the amount of time spent on Duolingo and learning gains. In terms of perceptions of their experiences, the participants generally viewed Duolingo’s flexibility and gamification aspects positively; however, variability in motivation to study and frustration with instructional materials were also expressed.

Regular papers
© European Association for Computer Assisted Language Learning 2019 

1. Introduction

The growing ubiquity of mobile technologies, such as smartphones and tablets, has affected the way people study and learn a second language (L2) (e.g. Kukulska-Hulme et al., 2017). In many cases, mobile technology extends learning beyond the classroom, and learners are able to make autonomous decisions about where, when, and how to study an L2 (Reinders & Benson, 2017). Indeed, the popularity of commercial online language learning programs attests to the interest that exists in using technology for autonomous language study. For example, in 2018 the free language learning application (app) Duolingo claimed to have 200 million active users (Smith, 2018), whereas subscription-based Babbel cites one million. Consequently, it is important to evaluate the effectiveness of such technology for L2 learning. As Heift and Chapelle (2012) state, “The need exists to better understand the new conditions for second language acquisition (SLA) brought about by the real language-related capabilities of technologies that many learners have access to on a daily basis” (p. 565). More recently, Plonsky and Ziegler (2016) asserted that research needs to be concerned with “how the affordances of technology might best be exploited to provide learners with optimal language learning opportunities” (p. 17). In response to these challenges, the current study investigates the effectiveness of one specific language learning program, namely Duolingo, for L2 learning.

1.1 Mobile-assisted language learning

Growing with the advent of handheld mobile technologies is the recognition of such devices as useful tools for learning at any time or in any place. Over the past 15 years, the use of mobile technology has significantly increased, with mobile-internet devices exceeding the number of traditional desktop and laptop computers (Pegrum, 2014). This increase has created an interest in mobile learning (m-learning); that is, using mobile technology (e.g. smartphones, tablets) for educational purposes such as teaching and learning (Duman, Orhon & Gedik, 2015; Godwin-Jones, 2011; Golonka, Bowles, Frank, Richardson & Freynik, 2014). One specific area of significant growth is mobile-assisted language learning (MALL) in which learners use mobile technology to engage in language study (Burston, 2015; Duman et al., 2015; Shadiev, Hwang & Huang, 2017).

Definitions of MALL vary somewhat; however, key components consistently include (a) flexibility in time and location of study; (b) continuity of study on different devices, such as mobile phones, tablets, and laptop/desktop computers; (c) easy accessibility of information; and (d) adaptability to personal study habits (Duman et al., 2015; Kukulska-Hulme, 2012; Pegrum, 2014; Petersen & Sachs, 2016; Reinders & Pegrum, 2015). Consequently, learners can study anytime and anywhere, and their study materials are available across devices.

In addition to discussing portable devices, Reinders and Pegrum (2015) identify mobile materials, which refer to web services and apps that include built-in language learning materials and activities. Despite perceived limitations, including a tendency to rely on a behaviorist, teacher-centered approach towards language instruction (Reinders & Pegrum, 2015), such apps have proven to be quite popular for autonomous language learning, with learners often accessing MALL material outside of or separate from classroom study (Burston, 2014b; Levy & Stockwell, 2006; Rosell-Aguilar, 2018). As Kukulska-Hulme (2009) argues, “To a certain extent, by dint of their ubiquity, mobile devices are already influencing how people learn; on the other hand, educators need to do more than just watch it happen” (p. 158).

1.2 Effectiveness of MALL

Research into the effectiveness of MALL for L2 development has been somewhat limited, due in part to an absence of objective, quantifiable measures of learning outcomes in MALL studies (Burston, 2015; Shadiev et al., 2017). When appropriate measures of learning are present, though, MALL has been shown to provide learning advantages for reading, listening, and speaking (see Burston, 2015, for an overview of existing research).

Further, the effectiveness of commercial online language learning programs is unclear. On the one hand, research commissioned by commercially available programs, such as Rosetta Stone (Vesselinov, 2009), Duolingo (Vesselinov & Grego, 2012), and Babbel (Vesselinov & Grego, 2016), has found favorable language learning outcomes for users, leading to claims that these programs offer equal or greater effectiveness than face-to-face foreign language courses. However, Van Deusen-Scholl (2015) countered that “despite strong claims about learner success, very little research is as yet available for commercial products, and outcomes are questionable” (p. 399). Indeed, the limited independent research that exists has found varied results. For example, Lord (2015) reported no performance differences on standardized test scores between learners participating in an in-person first semester Spanish course and beginner users of Rosetta Stone (45 hours). However, Lord noted differences between the groups during oral interview tasks, with Rosetta Stone users more frequently relying on English to resolve communicative difficulties. In addition, Nielson (2011) found that United States government employees using Rosetta Stone and Auralog’s TELL ME MORE demonstrated limited persistence, subsequently making it impossible to gauge proficiency gains due to a lack of data. To a lesser degree, commercially commissioned research has also reported issues with user persistence (Vesselinov, 2009; Vesselinov & Grego, 2012).

1.3 Duolingo

The current study investigates one specific language learning app, Duolingo, and explores its effectiveness for L2 learning. Duolingo is a self-described free, science-based language education platform, which was created by Luis von Ahn and Severin Hacker in 2011 (Robertson, 2011). Although Duolingo is accessible to any web-enabled device via the Internet, its biggest affordance is worldwide mobile access. The Duolingo app can be installed on mobile devices that use iOS, Android, and Windows operating systems. Learner progress on different platforms can be synced across devices. The program also includes features for interacting with both the program and with other language learners.

Duolingo makes claims about its effectiveness, based in part on Vesselinov and Grego’s (2012) finding that for first language (L1) English speakers with some previous knowledge of Spanish, an average of 34 hours of Duolingo usage covered the same material as a first college semester of Spanish. Although they argued it would be fair to expect similar results across other languages, Vesselinov and Grego cautioned that such a claim could not be made without further empirical support. Nevertheless, Duolingo promotes as a primary selling point that it is more effective than a university language course (a claim evidenced in founder Luis von Ahn’s discussion posting from 2013 and Duolingo’s promotional video). Although von Ahn recently stated that Duolingo works better as a supplement to in-person formal instruction (“Interview with Duolingo founder Luis von Ahn,” 2016), Duolingo’s website still makes strong claims of its effectiveness (and even superiority) as a stand-alone language learning experience (Duolingo, n.d.).

In contrast, published accounts of Duolingo in academic journals vary in their assessments. In a recent quasi-experimental study, Rachels and Rockinson-Szapkiw (2018) investigated elementary school students learning Spanish through either face-to-face instruction or Duolingo. After 40 minutes of exposure a week for 12 weeks, the two groups’ performance on grammar and vocabulary tests was not statistically different, leading Rachels and Rockinson-Szapkiw to claim that Duolingo may be “an affordable, cost-effective option” for L2 instruction (p. 84). Similarly, Cunningham (2015) reviewed the program and concluded that

Duolingo offers a fairly convenient, free, and basic mobile learning application which contains motivational DGBL [digital game-based learning] features that give it enough of an addictive edge for many learners to stay engaged. However, its strict linear curriculum, lack of authentic language, and limited assortment of activities prevent its full realization, relevance, and utility as a DGBL opportunity. (p. 8)

In an investigation into Duolingo’s features for interaction, Falk and Götz (2016) found that although a majority of their 212 participants learning German felt that the automated feedback and interaction with the program itself were positive and helpful, only a small percentage indicated that they used features for interacting with other language learners. Finally, in response to Vesselinov and Grego’s (2012) findings of greater learning for Duolingo learners compared to classroom learners, Krashen (2014) described how Duolingo relies primarily on techniques that promote conscious learning and explicit L2 knowledge, which he argues do not support the acquisition of language competence or implicit L2 knowledge. Rather, Krashen argued that learners need to engage in subconscious learning to develop L2 proficiency.

Due to the limited empirical research investigating the effectiveness of Duolingo, and other MALL apps, for L2 development, the current study set out to investigate the following research questions:

  • RQ1. How effective is Duolingo in developing L2 knowledge in ab initio learners of Turkish?

  • RQ2. What are the experiences of ab initio learners of Turkish using Duolingo to study an L2?

2. Method

Making use of multiple data sources, this mixed-methods study examined the L2 development and experiences of nine individuals who chose to study Turkish using Duolingo as an option for a class research project in a graduate seminar on instructed second language acquisition (ISLA) in early 2016.

2.1 Participants

Nine individuals from Michigan State University participated (three female, six male). At the time of the study, eight people were graduate students (two master’s and six doctoral) and one was a professor. Individuals came from a variety of L1 backgrounds, including English (six), Chinese (one), Korean (one), and Nepali (one), and all had previous L2 learning experience, with some participants being bilingual (e.g. Korean/English, Chinese/English) or multilingual (e.g. Nepali/Hindi/English), and others having lesser proficiency in additional languages (e.g. English L1 speakers with intermediate Korean, advanced Portuguese, or intermediate Spanish). However, none had previous exposure to Turkish, and therefore were considered ab initio learners.

The nine learners simultaneously served as both participants and researchers. While not a novel approach to applied linguistics research (e.g. Casanave, 2012; Schmidt & Frota, 1986), we recognize that our own backgrounds and perspectives have influenced our process of learning as well as analysis and interpretation of the data. From a learning perspective, as experienced L2 learners and researchers, we possessed theoretical knowledge of language acquisition likely uncommon to other Duolingo users. Additionally, because this research was conducted as part of an obligatory class requirement, the motivation for studying was presumably different from most Duolingo users, and we each knew that the study’s overall goal was to put Duolingo’s effectiveness as a language learning app to the test. With regard to analysis and interpretation, we attempt to present our findings with as much transparency as possible; however, we acknowledge the need to view our interpretations in light of our own personal investment.

2.2 Materials

2.2.1 Duolingo

Duolingo was chosen for this study because it is one of the most popular language learning apps worldwide, it claims to promote L2 development, and it is available without a subscription fee. It should be noted that Duolingo continues to modify its material, adding and deleting features over time. However, this section details some of Duolingo’s main features and content at the time of the study.

Duolingo is accessible via both desktop/laptop computers and mobile devices through the Duolingo app. As shown in Figure 1 (desktop/laptop web interface), Duolingo presents learning targets in skills (e.g. Basics, Phrases, Adj 1). Each skill is subsequently divided into lessons (e.g. two lessons on question formation, three lessons on food). Skills are presented in a mostly linear order, with new skills becoming available as previous skills are completed (i.e. completing all lessons associated with a skill). A splash page for each skill provides brief, explicit grammar information, although this feature is available only via the web interface. Within a lesson, Duolingo employs several activity types. Common to both web and mobile platforms are L1–L2/L2–L1 translations (as seen in Figure 2), multiple-choice translations, and dictation. For the study of Turkish, oral repetition exercises were accessible only via the desktop/laptop web interface, while vocabulary matching and sentence unscrambling were available only for mobile learning. In addition to these activities, learners had ready access to word translations through mouseovers (web) and finger taps (mobile). Typical of MALL apps, feedback in Duolingo primarily involved “displaying a correct answer or indicating a ‘Right/Wrong’ evaluation” (Burston, 2014a: 347). Social networking features include the ability to post discussion board comments about sentences used in learning activities (see Figure 3) and to follow friends’ progress on a leaderboard. Common across Duolingo’s technical features is a lack of contextual meaning and communicative tasks. Duolingo focuses primarily on lexical and grammatical elements, both explicitly (e.g. through grammar translation activities) and implicitly (e.g. input enhancement). To motivate learners, Duolingo takes a gamified approach to language instruction (Werbach, 2014). Learners can set daily goals in the form of experience points (XP) that can be earned by completing lessons. Reaching daily XP goals is rewarded with streaks, with the length of streaks indicating consecutive days of study. These skills and streaks are converted into Duolingo’s digital currency, called lingots, which in turn enable users to access timed practice sessions and purchase flair for the program’s avatar, Duo (a cartoon owl mascot).

Figure 1. Duolingo web interface home screen

Figure 2. A translation exercise with explicit corrective feedback

Figure 3. A function to post discussion board comments

2.2.2 Participant journals

Participants maintained journals to record weekly reflections and personal observations of their experiences. Areas of emphasis included (a) time spent on the app; (b) platform used (mobile versus PC); (c) levels completed, including the use of review; (d) successes achieved/difficulties encountered; and (e) connections between Duolingo and ISLA theory.

2.2.3 Duolingo progress test

The Duolingo Progress Test is a language proficiency measurement offered within the Duolingo platform. The test is claimed by Duolingo to be adaptive (i.e. increasing or decreasing in difficulty depending upon user performance), and the test maintains an item format similar to the instructional component. Upon completion, test takers receive a score from 0.00 to 5.00. No explanation of this score or additional performance feedback is provided by Duolingo.

2.2.4 Turkish 151 test

A Turkish 151 Test (see supplementary materials) was derived from a university-level, end-of-course summative achievement test for first semester L2 Turkish learners. The test was administered and graded by a Turkish university course instructor. The exam, which maintained a high internal reliability of α = .86, was divided into 10 sections covering aspects of listening, reading, lexicogrammar, speaking, and writing. For Turkish 151, a cut score of 70% represented a “pass” on the exam (see Table 1 for the entire grading scale). According to the instructor, most first-year students achieved a 90% or above on the test after one semester in the Turkish L2 program.
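As an illustration of how an internal reliability coefficient such as the α reported above can be obtained, the following minimal sketch computes Cronbach’s α from a matrix of section scores. The function name cronbach_alpha and the demo scores are hypothetical and are not the study’s data; the source does not state which software was used or whether α was computed over items or over sections, so the section-level layout here is an assumption.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one row per test taker; each row holds that
    person's score on every section (or item) of the test.
    """
    k = len(scores[0])                      # number of sections/items
    columns = list(zip(*scores))            # one tuple of scores per section
    section_var_sum = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - section_var_sum / total_var)


# Hypothetical scores for four test takers on three sections (NOT study data).
demo_scores = [
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(demo_scores)
```

Values closer to 1 indicate that the sections rank test takers consistently; the same variance estimator must be used for the section variances and the total-score variance, as it is here.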

Table 1. Grading scale for the Turkish 151 Test

2.3 Procedure

This study, as depicted in Figure 4, was divided into four phases: (1) Turkish study and journaling, (2) language assessment, (3) additional Turkish study, and (4) language assessment. The following provides a detailed description of each phase of the study.

Figure 4. Graphic representation of this study’s procedure

2.3.1 Phase 1

Participants began studying Turkish on Duolingo in January 2016 and agreed to study at least one hour a week for the next 12 weeks for the course project. Based on other studies that set predetermined study goals (e.g. Lord, 2015; Nielson, 2011), participants were encouraged to achieve an overall study goal of 34 hours in order to evaluate the claims made by Vesselinov and Grego (2012); however, participants were not required to reach this goal.

Participants were allowed to use only the resources available on Duolingo. Assistance from outside learning resources (e.g. Turkish language textbooks or websites) was prohibited. This restriction is in line with Rosell-Aguilar’s (2018) finding that one third of busuu (another MALL app) users relied exclusively on the app for L2 study. However, within the Duolingo program, participants were allowed to use the materials in whichever ways they preferred. There was no designated module or endpoint that needed to be reached prior to the conclusion of the study, although simple past tense, which was module 36 in the Duolingo skills tree, served as a target structure, because it was included on the Turkish 151 Test. Participants could advance through the 67 modules as quickly or as slowly as desired, reviewing past modules as they saw fit. During the study phase, participants maintained a journal to record their weekly progress, as well as any insights into or opinions about their experiences.

2.3.2 Phase 2

After 12 weeks of Turkish study, all participants, regardless of total study time, completed the Duolingo Progress Test and the Turkish 151 Test in April 2016.

2.3.3 Phase 3

After these two phases, participants who had not finished 34 hours of study and were so inclined (n = 5) continued studying Turkish to reach the 34-hour study target.

2.3.4 Phase 4

In July 2016, the remaining five participants retook the Duolingo Progress Test and the Turkish 151 Test.

As the description of phases indicates, participants varied in their completion of the Turkish study, as is typical in naturalistic self-directed learning. Figure 5 illustrates how participants distributed their learning time over the course of the study. Participants 1 and 2 completed the targeted 34 hours within Phase 2 (the initial 12-week period), whereas Participants 3, 5, 7, 8, and 9 completed theirs for the Phase 4 testing. Participants 4 and 6 chose not to pursue Duolingo study beyond Phase 2, which coincided with the end of the university semester.

Figure 5. Participant Turkish study time (in minutes) each week of the project. One participant’s (#9) weekly totals were missing

2.4 Analysis

A concurrent mixed-methods design (Creswell & Plano-Clark, 2011) was used to analyze the data. Although related, the two research questions addressed different components of the learning process (outcomes versus experiences), necessitating multiple methodological approaches. Quantitative analyses addressed the first research question pertaining to learning outcomes, while a qualitative analysis concurrently explored the second research question pertaining to participants’ learning experiences. Finally, the two approaches were brought together as a means of triangulating the findings and gaining an understanding of how outcomes and experiences can each inform the other (Cerezo, 2016; De Costa, Valmori & Choi, 2017).

2.4.1 Quantitative analyses

Descriptive statistics were calculated for language study (study time, study platform, total Duolingo XP, Duolingo XP for new lessons, and Duolingo XP for review) and learning outcomes (Duolingo Progress Test, Turkish 151 Test). For the Turkish 151 Test, raw total scores and subscores (reading, writing, lexicogrammar, listening, and speaking) were calculated and converted to percentages to allow for comparability.

To examine the relationship between language study and learning outcomes, bivariate correlations were run. Due to the small sample in the present study, Spearman correlations were selected.
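The rank-based correlation described above can be sketched as follows. This is a minimal illustration, not the authors’ analysis script: the helper names (average_ranks, pearson, spearman_rho) and the demo values are hypothetical, and the source does not say which statistical package was used. Spearman’s coefficient is computed here as the Pearson correlation of the ranks, with tied values receiving the average of their ranks.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical study-time and test-score values (NOT the study's data).
demo_xp = [10, 20, 30, 40]
demo_score = [1, 3, 2, 4]
rho = spearman_rho(demo_xp, demo_score)
```

Because only the ordering of the values matters, the coefficient is less sensitive to outliers and to non-normal distributions than a Pearson correlation on the raw values, which is a common reason for preferring it with very small samples.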

2.4.2 Qualitative analyses

To address the experiences of ab initio language learners using Duolingo, journals were thematically coded to identify themes that highlighted similarities and differences across participants’ Duolingo usage (see Footnote 1). Each journal underwent a three-cycle coding process (Saldaña, 2016). In cycle one, descriptive coding was employed in which two participants identified keywords common across participants’ data. From these keywords, four themes were agreed upon. In cycle two, each of the nine participants reread their own journal and identified quotes relevant to the four identified themes. In the final cycle of coding, the two initial coders devised subcodes based on a thematic review of the quotes compiled during cycle two.

3. Results

3.1 RQ1: Duolingo learning outcomes

Table 2 presents descriptive statistics for the learning experience (study time, study platform, Duolingo XP) and learning outcome (Duolingo Progress Test, Turkish 151 Test) variables. As mentioned, not all participants achieved the 34-hour goal; the lowest number of hours was 12. To access Duolingo material, participants used the mobile app an average of 23 minutes a week, and their laptop or desktop computers an average of 32 minutes a week. The number of Duolingo exercises completed varied substantially, as can be seen in Duolingo XP totals ranging from 730 to 4347. Duolingo also allowed for tracking XP earned from completing new lessons and reviews. Generally, participants did somewhat more reviewing (M = 1786) than attempting new material (M = 1123), although the amount of reviewing varied greatly.

The average Duolingo Progress Test score was 0.63 out of 5.00, with a range from 0.35 to 1.78. However, without any guidelines for score interpretation from Duolingo, it is difficult to know what to make of this result.

Table 2. Duolingo study and learning outcome descriptive statistics

Note. Duolingo XP was awarded by the program at a rate of 10 XP per lesson completed. In a timed review activity format, up to 20 XP could be earned per review lesson based on the number of exercises answered correctly.
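The XP rates in the note above allow a rough conversion from XP totals to activity counts. The sketch below is illustrative arithmetic only: the function names are hypothetical, and the review figure is a lower bound that assumes no review session can earn more than the 20 XP stated for timed reviews.

```python
from math import ceil

XP_PER_NEW_LESSON = 10     # rate stated in the table note
MAX_XP_PER_REVIEW = 20     # stated ceiling for a timed review lesson


def new_lessons_completed(new_lesson_xp):
    """Number of new lessons implied by XP earned at 10 XP per lesson."""
    return new_lesson_xp / XP_PER_NEW_LESSON


def min_review_sessions(review_xp):
    """Fewest review sessions that could yield this XP (assumed 20 XP cap)."""
    return ceil(review_xp / MAX_XP_PER_REVIEW)


# Applied to the group means reported in Section 3.1.
mean_new_lessons = new_lessons_completed(1123)   # about 112 new lessons
mean_min_reviews = min_review_sessions(1786)     # at least 90 review sessions
```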

3.1.1 Turkish 151 scores

The Turkish 151 Test had an overall average score of 48% (95% confidence interval (CI) [33%, 62%]), and a range from 23% to 76%. The test covered the four language skills (listening, speaking, writing, and reading) and lexicogrammatical knowledge (see Table 3). In examining these subscores, a trend is apparent: participants were relatively more successful on parts of the test dealing with written language (writing, reading, and lexicogrammar) and less successful on parts featuring oral language.
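A confidence interval of the kind reported above can be sketched as follows, assuming a conventional t-based interval for the mean; the source does not state how the interval was computed, so this is one plausible method rather than the authors’ procedure. The function name mean_ci and the demo scores are hypothetical and are not the study’s data. With nine participants there are 8 degrees of freedom, for which the two-sided 95% critical t value is 2.306.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_DF8 = 2.306  # two-sided 95% critical t value for 8 degrees of freedom


def mean_ci(scores, t_crit=T_CRIT_DF8):
    """Return (mean, lower, upper) for a t-based confidence interval.

    The default t_crit is only valid for nine scores; pass the matching
    critical value for other sample sizes.
    """
    m = mean(scores)
    half_width = t_crit * stdev(scores) / sqrt(len(scores))
    return m, m - half_width, m + half_width


# Hypothetical percentage scores for nine learners (NOT the study's data).
demo_pct = [10, 20, 30, 40, 50, 60, 70, 80, 90]
m, lower, upper = mean_ci(demo_pct)
```

With so few participants the interval is wide, which is why the reported mean should be read together with its bounds rather than as a precise estimate.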

Table 3. Turkish 151 Test subscore summary

Note. All values are percentage points.

Figure 6 shows this relationship graphically via boxplots with superimposed means and 95% CIs, with the total Turkish 151 Test added on the left side. Participants’ scores are represented by dots, demonstrating individual variation. In comparing subscores, some caution is warranted because different item and response types likely impacted the difficulty and distribution of scores (e.g. constructed response for speaking vs. multiple choice for lexicogrammar).

Figure 6. Boxplots of Turkish 151 Test total and subscores. Median (vertical bar inside boxes), mean (thick dots), and error bars representing 95% CIs are included

3.1.2 Relationship between language study and learning outcomes

Given that all participants began with no prior Turkish study and had no exposure to the language outside of Duolingo study, examining correlations between the amount of language study and the learning outcome variables allows for causal relationships to be considered (see Table 4). None of the language study variables were strongly correlated with Duolingo Progress Test scores. However, there were moderately strong correlations between Duolingo XP and the Turkish 151 Test scores and subscores. In particular, Duolingo XP had large correlations with listening and speaking subscores. When considering New Lesson XP and Review XP separately, the former was more strongly associated with Turkish 151 Test scores and subscores.

Table 4. Correlations among language study and learning outcome variables

Note. Spearman correlations reported.

In sum, completing learning activities on Duolingo over the course of the research study was strongly associated with Turkish learning as measured by an end-of-semester Turkish 151 Test. The degree to which Duolingo study led to Turkish skill/knowledge development varied. However, the level of achievement for all but one participant fell short of the 70% criterion for mastery in a university Turkish 151 course.

3.2 RQ2: The experience of learning Turkish on Duolingo

In order to address the second research question, an analysis of participants’ journal entries was conducted. Major themes in the data included flexibility of use, wavering motivation, perceived progress, and individual approaches to Duolingo.

3.2.1 Flexibility of use

Considering that flexibility of place and time have been highlighted as key affordances of MALL technology (Kukulska-Hulme, 2012), it is not surprising that this quality was highlighted numerous times by the participants. As became evident across journals, the versatility in location of use was viewed as a positive affordance. Multiple participants mentioned using Duolingo in geographically diverse regions within the United States (e.g. Florida, Illinois), and Participant 6 discussed her international use while in Kenya:

I am in the car on a road trip in the middle of Kenya … I’ll say that is one great thing about having an app for Duolingo; I’ve been able to take it on the go with me on spring break. (#6)

On the local level, while much use appeared to occur at home, both coffee shops (“Did 10 minutes on my phone standing in the Starbuck’s line” – #7) and buses (“I can easily use while waiting for the bus, on the bus, and waiting for orders in restaurants” – #5) also served as popular study spots. In addition, the flexibility of study platforms was commented on (“For Wednesday, I actually did a combo of mobile and PC/laptop, as the day was fairly crunched” – #8). Despite this flexibility, learning was still sometimes seen as a burden, and using Duolingo whenever free time became available was often in contrast with personal desires:

It took me 20 minutes to get my 20 XP this morning. I was resentful of the time because I have so much else I need to do, but I want to keep at this Turkish thing. (#7).

After getting home late today and indulging in too much TV, I really don’t have time for Duo. (#1)

A lack of user persistence with MALL apps has been previously documented (e.g. Nielson, 2011; Vesselinov, 2009; Vesselinov & Grego, 2012). As shown by Participants 1 and 7, despite having flexibility, it took a concerted effort just to put in what they had set as their minimum daily usage. For Participant 7, this effort came from a desire to “keep at this Turkish thing,” a sentiment frequently echoed by Participant 1. Several participants capitalized on Duolingo’s flexibility by breaking their study into short increments of time (e.g. Participant 2 managing frequent “courtesy 10 minutes of study” chunks), but others did not, choosing instead to study less often but for longer periods of time (e.g. “This week I have not learned Turkish until Friday today. I just did so for forty minutes. I had to keep on going as a part of class project” – #3).

3.2.2 Wavering motivation

Duolingo’s flexibility alone did not counter the dips and declines in participant motivation. Journal entries showed that motivation was relatively high across the group at the outset of study, even if several individuals approached the project with tempered expectations. Yet even for those who began without some level of trepidation, there was generally a decrease in motivation as they progressed towards their 34 hours of use. Turkish was chosen as the language of study because no one had previous exposure to the language. One limitation to this approach, however, was that there was little initial investment in the language or culture itself. Consequently, motivation was fostered or hindered primarily from external sources and desires:

I am not integratively motivated to learn Turkish at least up until this moment. I am a bit instrumentally motivated as this is my final project. (#3)

With intrinsically related motivation a non-factor, it was left to Duolingo’s pedagogic and technological approach to develop and maintain motivation, an approach that was met with mixed results.

Pedagogically, there were few positive comments in the group’s reflections. Key limitations consistently mentioned related to the repetitiveness of the activities and a lack of interaction:

My motivation level this week was so low to learn Turkish. This might be because of the same type of exposure to the linguistic items and the same way of presentation of the materials: translate the sentences and words, type what you hear and choose the correct option. (#3)

There is no interaction. I don’t have to take any risks to try out the language with anyone. In many cases, especially with the mobile app, I don’t feel like I have to produce much Turkish, so I can be pretty passive as a learner if I want. (#7)

Participant 3 highlighted how the types of tasks in Duolingo demonstrated little variation as the course progressed. Of greater concern, then, was that the repetition of translation and type-what-you-hear tasks left minimal (if any) opportunities for interaction, as strongly stated by Participant 7. This absence is particularly concerning, and understandably demotivating, considering the importance of interaction in L2 acquisition (Gass & Mackey, 2015). Such frustrations led to a more obligatory orientation to continuing study; ultimately, two of the participants (4 and 6) ceased study as soon as the initial 12-week target was reached, despite not logging 34 hours of study time.

Despite language- and pedagogical-based limitations, one area where Duolingo did seem to succeed was gamification (Werbach, 2014). Being able to see the progress of their fellow classmates (and instructor) served as a motivational tool for many to continue their usage:

This week I received several emails to start following people from our group. Now Duolingo feels more community-oriented. I can see how much progress everyone else has made. This aspect motivates me to work on more modules and gain XP points to compete against others. (#8)

Although this indeed led to greater Duolingo use, it may have led some participants to focus less on language learning and more on reaching the top of the XP leaderboard (“I was planning on putting in a decent amount of time today, so I think I should at least be able to pass Participant #2; maybe Participant #6.” – #1). Yet, as stated earlier, a greater number of hours and XP were positively associated with performance on the Turkish 151 Test, so this competitive component presumably provided some learning benefits.

3.2.3 Perceived progress

Having committed to 34 hours of study and taking the Duolingo Progress Test and Turkish 151 Test, the participants naturally reflected on the progress they perceived in their learning. While Duolingo provided several barometers for measuring progress (e.g. Duolingo Progress Test, level, XP), participants expressed doubts about truly making progress:

I could finish all the modules as well, but I definitely wouldn’t claim to be a functional speaker of the language. That would be disingenuous. So, the moral of the story is not to get too excited for the [completion of the Duolingo skills tree] since all you get is a big, digital owl and a false sense that you can speak the language. (#8)

Although they were all completing lessons, gaining XP, and attaining higher Duolingo levels, the participants had trouble interpreting Duolingo’s measures of progress (“Apparently I am now a level 10 … I do not know what that means” – #2). Generally, the number of lessons they had completed and the XP acquired did not satisfactorily align with their perceptions of their actual acquisition (“I felt more overwhelmed seeing how many lessons I had learned but remembered very little” – #8).

There was tension in regard to what they could actually do with the language (“I also am starting to feel like it’s all been a waste of time, especially after talking with some classmates about what we actually know/can do with the language. Spoilers: not much” – #1). The participants noticed a divide between the language they could recognize in exercises versus what they felt they could produce, with participants reflecting that they “can’t speak Turkish” (#5) and not being “sure if I’m articulating the correct phoneme” (#9) in the listen-and-repeat exercises.

Building on their perceived progress (or lack thereof) in receptive and productive skills, participants took several perspectives on the Duolingo Progress Test and their upcoming Turkish 151 Test:

And I’m “happy” to report that I scored .62/5.0, which is up .13 from last month. I was a bit worried if I would do better than my last quiz because I took several days off from study. I put the happy in air quotes because of course I would have liked to have done better. (#7)

While this quote shows a level of perceived accomplishment on the Duolingo Progress Test, other participants lacked optimism prior to the Turkish 151 Test (“Nothing else to really report, except that my confidence level goes down more and more the closer we get to the exam, and I am pretty sure I will not do well, despite my high number of hours” – #2). This pessimism was just as evident after the post-test as well (“I am not satisfied with what I did in the test. I feel like I would have done better if I had learned Turkish in the face-to-face mode with a teacher in the real classroom” – #3).

3.2.4 Individual approaches to Duolingo

Each of the nine participants made use of Duolingo differently. For the majority, reviewing previously completed lessons took priority over advancing to new ones:

The first couple of days I started new levels, but as of Wednesday, I did not feel properly prepared to continue on, so I began to use the review almost everyday … I used the review function consistently, probably 75% of my time on Duolingo is review. (#2)

This excerpt is one example of how review was a regular part of studying. Although some review was for learning purposes (“I also did a lot of review to make sure that I remember some of what I’ve already studied” – #7), other, less acquisition-based reasons existed (“For the last couple of days, I’ve been reviewing the vocabulary or basic structure lessons since I know it is easy and don’t need to think much of grammar” – #5). No matter the reason, however, the process of review comprised a significant portion of participants’ Turkish study (“Averagely, I spend only 5 minutes on learning a new lesson, but I spend almost 15 minutes on reviewing one, because I kept forgetting them” – #9).

Several participants used Duolingo’s material to create extensive study notes (see Figure 7). Participant 2 included all “vocabulary, phrases, [and] grammar points” he encountered, while Participants 6 and 7 also incorporated full sentences from Duolingo into their notes. A common thread among the note takers was regularity: notetaking went hand-in-hand with logging into the app (see footnote 2).

Figure 7. Example of Participant 2’s study notes

While participants differed in their macro-level approaches to prioritizing review or taking notes, they also differed noticeably in their micro-level approaches to completing lessons on Duolingo. Some participants prioritized efficiency over carefully responding to lesson questions; one such example was the use of keywords in the stimuli to select the appropriate response:

In one timed review, I had to choose among three really long translations. I came nowhere close to reading all of them … I quickly honed in on a keyword from the English sentence and searched the translations … I could quickly eliminate two of them simply for not having the keyword. (#1)

Another similarly expedient approach was to exploit mouseover translations during practice. Participant 2 reported that “For the newer Units, I find myself relying a lot on scrolling over the word, especially for function words and newer nouns.” Other participants frequently reported mouseovers, and in one case a participant may have taken things too far, essentially bypassing learning in favor of quick progress through lessons:

I also figured out a way to cheat on these lessons. I can just copy and paste answers if asked to provide a Turkish translation. Hovering your mouse over the item both displays the translation and lets me copy it. I stopped doing this in order to be “incorrect” a few times to make me repeat, and hopefully learn the item. In my opinion, I have to make mistakes and repeat to “learn” in this program. If not, I could breeze right through this. (#8)

4. Discussion

In response to the first research question, “How effective is Duolingo in developing L2 knowledge in ab initio learners of Turkish?”, the results are somewhat mixed. On the positive side, all participants knew more Turkish at the end of the study than when they began. This observation may seem self-evident; however, the importance of this finding should not be underestimated, particularly in light of the lack of empirical evidence of L2 learning in many MALL studies (Burston, 2015) and the calls for evidence of the effectiveness of technology (e.g. Heift & Chapelle, 2012; Plonsky & Ziegler, 2016). For example, Participant 4, who spent only 12 hours using Duolingo, scored 23% on the Turkish 151 Test. Most notably, however, even after 34 hours of study, only one participant received a score that would be considered a passing grade in the university’s first semester Turkish course. These results call into question Vesselinov and Grego’s (2012) claims regarding Duolingo’s efficacy. One obvious reason for this discrepancy could be time on task. Many university courses consist of four hours of class time per week, plus time outside of class for homework. If students only attend class and do not do any homework, they are still spending almost twice as much time studying the target language. Underscoring this point is the fact that Rachels and Rockinson-Szapkiw (2018) found no statistical difference in the linguistic proficiency of learners who spent the same amount of time studying face-to-face or with Duolingo.

Another explanation for the low test scores could be the target language itself. None of the participants had previous knowledge of Turkish; in contrast, Vesselinov and Grego’s (2012) study investigated Spanish L2 learners, none of whom were complete beginners. Thus it is possible that learners may make more progress studying a language with which they have some proficiency. Additionally, the qualitative analysis revealed that several participants were not especially motivated by the choice of Turkish as the target language. Nevertheless, as Van Deusen-Scholl (2015) argues, it is important to investigate learning effects for all languages, including less commonly taught ones.

An additional consideration is the type of L2 knowledge that was learned, whether explicit/declarative or implicit/proceduralized (Krashen, 2014). Although the Turkish 151 Test was not designed to provide relatively distinct measures of explicit or implicit knowledge, it did have subsections in which the two types of knowledge may have been more or less useful. Notably, participants scored lowest on the speaking (33%) and listening (37%) sections, which both required processing language in real time. In contrast, higher scores were achieved on the reading (57%), writing (55%), and lexicogrammar (50%) sections, which promoted greater use of explicit knowledge. Given the nature of Duolingo’s pedagogy, relying primarily on grammar-translation and audiolingual-type activities, as is common in MALL (Reinders & Pegrum, 2015), as well as the relatively short amount of study time, it is most likely that the gains made by the participants were primarily in explicit knowledge, even though Duolingo does not provide much in the way of explicit, metalinguistic rules. It is possible that the participants’ familiarity with the process of L2 learning in general may have helped them take advantage of the pedagogical materials, while less linguistically sophisticated learners might be more disadvantaged.

Another important finding is that there was a moderate correlation between the amount of study time and test scores (r = .58), although this relationship is affected by the small sample size and ceiling effects in amount of study time. Even so, it appears that the more time an individual spends using the app, the more they are going to learn. Again, this finding may seem self-evident; however, the high rates of attrition when using online technology (Nielson, 2011) suggest that learners may not persist long enough to make considerable gains in their L2 knowledge, especially without any obligation or encouragement from peers and teachers commonly found in classroom environments. Thus, Petersen and Sachs’ (2016) claim that “technology is not a substitute for instructional expertise” (p. 5) seems appropriate, and indeed, the importance of using Duolingo or other language learning apps in conjunction with more formal classroom contexts has been acknowledged (e.g. “Interview with Duolingo founder Luis von Ahn”, 2016; Kukulska-Hulme et al., 2017).
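
To illustrate how a small sample limits what can be read into a correlation of this size, the following sketch computes an approximate 95% confidence interval using the Fisher z-transformation. It is not part of the original analysis. The value r = .58 is taken from the text; n = 9 assumes that all nine participants entered the correlation, which may not be the case, and the function name `fisher_ci` is purely illustrative.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r via Fisher's z.

    r      -- observed correlation (here, the .58 reported in the text)
    n      -- sample size (assumed to be 9; not confirmed for this analysis)
    z_crit -- critical value of the standard normal (1.96 for roughly 95%)
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)               # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)     # approximate standard error of z
    lower = math.tanh(z - z_crit * se)  # back-transform to the r scale
    upper = math.tanh(z + z_crit * se)
    return lower, upper


lo, hi = fisher_ci(0.58, 9)
print(f"r = .58, n = 9: approximate 95% CI [{lo:.2f}, {hi:.2f}]")
```

Under these assumptions the interval runs from slightly below zero to roughly .90, so the estimate is compatible with anything from no relationship to a very strong one. This is one way of seeing why the correlation should be interpreted cautiously.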

5. Limitations

Although providing insights into Duolingo as a source of language instruction, the current study has several limitations. The small sample size reduces generalizability. Perhaps more important are the uncommon characteristics of the participants in the study: the participants’ prior successes in L2 learning and familiarity with SLA theory might have positively influenced the efficacy of their study, meaning that a more typical Duolingo user might be expected to achieve even less. The unilateral choice of Turkish appeared to negatively impact some learners’ motivation; consequently, learning gains might be better if learners have a choice in the target language. In contrast, the mandatory nature of the class project and the knowledge of the goals of the study may have increased some learners’ motivation and persistence beyond a level typical of Duolingo users at large. At the very minimum, the class project prevented attrition for at least 12 weeks. In spite of these limitations and learner differences, the participants all experienced the same learning materials, as would any L2 learner using Duolingo, regardless of their backgrounds.

6. Conclusion

In summary, this study provides one of the few systematic investigations into the effectiveness of a widely used commercial language learning app. The mixed findings in learning gains indicate that although apps such as Duolingo can improve learners’ L2 knowledge, the claims made by commercial materials may be overstated. However, the pedagogic shortcomings, such as a primary reliance on decontextualized grammar-translation exercises and audiolingual drills, are surmountable if app developers consider ISLA theory and research. For example, incorporating more meaning-focused or task-based activities in which learners engage with language beyond the individual sentence level would be appropriate. On a positive note, the DGBL aspect of Duolingo provided a welcome motivational component to the app that could be incorporated into other learning contexts. Further research into (a) the effectiveness of commercial L2 learning apps and (b) the experiences of L2 learners who use them will provide insight into this popular method of L2 study and has the potential to help improve the quality of available products.


We would like to thank Talip Gonulal, Rachelle Oh, and Yichong Yin for their valuable contributions to this project.

Ethical statement

This research project was conducted in accordance with, and with the approval of, the Institutional Review Board of Michigan State University. Participants were volunteers. There are no conflicts of interest.

About the authors

Shawn Loewen is a professor and director of the Second Language Studies program at Michigan State University. His research interests include instructed SLA, classroom interaction, technology-enhanced language learning, and research methodology.

Dustin Crowther is a visiting assistant professor at Oklahoma State University, and holds a PhD in Second Language Studies from Michigan State University. His research interests include L2 speaking, mutual intelligibility in multilinguistic/multicultural contact, and world Englishes. His research has been published in a wide range of journals.

Daniel R. Isbell is a PhD candidate in the Second Language Studies program at Michigan State University. He is interested in how technological affordances can support language learning, especially for less commonly taught languages. Daniel’s other interests include assessment and L2 pronunciation.

Kathy MinHye Kim is a PhD candidate in the Second Language Studies program at Michigan State University. Her research interests include explicit–implicit knowledge/learning, sleep-dependent memory consolidation, instructed SLA, and individual differences in aptitude.

Jeffrey Maloney is an assistant professor of English and the director of the ESL Academy at Northeastern State University. He holds a PhD in Second Language Studies from Michigan State University. His current research interests include language teacher education, teacher training for CALL, and language learner and heritage speaker identity.

Zachary F. Miller is an assistant professor at the United States Military Academy at West Point, and holds a PhD in Second Language Studies from Michigan State University. His research interests include the role of emotions in SLA, SLA from a military perspective, and Brazilian literature.

Hima Rawal is a PhD candidate in Second Language Studies at Michigan State University. Her research interests include language teacher professional development, teacher identity/ideologies, teacher/learner beliefs and emotions, study abroad, translanguaging in multilingual classrooms, linguistic landscape, and South Asian languages in diaspora settings.


1 Note that a narrative analysis of a subset of the journals is presented in Isbell, Rawal, Oh and Loewen (2017).

2 Participants’ notes were not systematically analyzed for content due to their idiosyncratic nature.


Burston, J. (2014a) MALL: The pedagogical challenges. Computer Assisted Language Learning, 27(4): 344–357.
Burston, J. (2014b) The reality of MALL: Still on the fringes. CALICO Journal, 31(1): 103–125.
Burston, J. (2015) Twenty years of MALL project implementation: A meta-analysis of learning outcomes. ReCALL, 27(1): 4–20.
Casanave, C. P. (2012) Diary of a dabbler: Ecological influences on an EFL teacher’s efforts to study Japanese informally. TESOL Quarterly, 46(4): 642–670.
Cerezo, L. (2016) Type and amount of input-based practice in CALI: The revelations of a triangulated research design. Language Learning & Technology, 20(1): 100–123.
Creswell, J. W. & Plano-Clark, V. L. (2011) Designing and conducting mixed methods research (2nd ed.). Thousand Oaks: SAGE.
Cunningham, K. J. (2015) Duolingo. TESL-EJ, 19(1): 1–9.
De Costa, P. I., Valmori, L. & Choi, I. (2017) Qualitative research methods. In Loewen, S. & Sato, M. (eds.), The Routledge handbook of instructed second language acquisition. New York: Routledge, 522–540.
Duman, G., Orhon, G. & Gedik, N. (2015) Research trends in mobile assisted language learning from 2000 to 2012. ReCALL, 27(2): 197–216.
Duolingo. (n.d.) About us: Press.
Falk, S. & Götz, S. (2016) Interactivity in language learning applications: A case study based on Duolingo. In Zeyer, T., Stuhlmann, S. & Jones, R. D. (eds.), Interaktivität beim Fremdsprachenlehren und -lernen mit digitalen Medien: Hit oder hype? Tübingen: Narr Francke Attempto Verlag GmbH + Co, 237–258.
Gass, S. M. & Mackey, A. (2015) Input, interaction, and output in second language acquisition. In VanPatten, B. & Williams, J. (eds.), Theories in second language acquisition: An introduction (2nd ed.). New York: Routledge, 180–206.
Godwin-Jones, R. (2011) Mobile apps for language learning. Language Learning & Technology, 15(2): 2–11.
Golonka, E. M., Bowles, A. R., Frank, V. M., Richardson, D. L. & Freynik, S. (2014) Technologies for foreign language learning: A review of technology types and their effectiveness. Computer Assisted Language Learning, 27(1): 70–105.
Heift, T. & Chapelle, C. A. (2012) Language learning through technology. In Gass, S. M. & Mackey, A. (eds.), The Routledge handbook of second language acquisition. Abingdon: Routledge, 555–569.
“Interview with Duolingo founder Luis von Ahn.” (2016) The Language Educator, 11(1): 15–17.
Isbell, D. R., Rawal, H., Oh, R. & Loewen, S. (2017) Narrative perspectives on self-directed foreign language learning in a computer- and mobile-assisted language learning context. Languages, 2(2): 4.
Krashen, S. (2014) Does Duolingo “trump” university-level language learning? The International Journal of Foreign Language Teaching, 9(1): 13–15.
Kukulska-Hulme, A. (2009) Will mobile learning change language learning? ReCALL, 21(2): 157–165.
Kukulska-Hulme, A. (2012) Mobile-assisted language learning. In Chapelle, C. A. (ed.), The encyclopedia of applied linguistics. Hoboken: Blackwell Publishing.
Kukulska-Hulme, A., Lee, H. & Norris, L. (2017) Mobile learning revolution: Implications for language pedagogy. In Chapelle, C. A. & Sauro, S. (eds.), The handbook of technology and second language teaching and learning. Hoboken: John Wiley & Sons, 217–233.
Levy, M. & Stockwell, G. (2006) CALL dimensions: Options and issues in computer-assisted language learning. Mahwah: Lawrence Erlbaum Associates.
Lord, G. (2015) “I don’t know how to use words in Spanish”: Rosetta Stone and learner proficiency outcomes. The Modern Language Journal, 99(2): 401–405.
Nielson, K. B. (2011) Self-study with language learning software in the workplace: What happens? Language Learning & Technology, 15(3): 110–129.
Pegrum, M. (2014) Mobile learning: Languages, literacies, and cultures. Basingstoke: Palgrave Macmillan.
Petersen, K. & Sachs, R. (2016) The language classroom in the age of networked learning. In Leow, R. P., Cerezo, L. & Baralt, M. (eds.), A psycholinguistic approach to technology and language learning. Berlin: De Gruyter, 3–22.
Plonsky, L. & Ziegler, N. (2016) The CALL-SLA interface: Insights from a second-order synthesis. Language Learning & Technology, 20(2): 17–37.
Rachels, J. R. & Rockinson-Szapkiw, A. J. (2018) The effects of a mobile gamification app on elementary students’ Spanish achievement and self-efficacy. Computer Assisted Language Learning, 31(1–2): 72–89.
Reinders, H. & Benson, P. (2017) Research agenda: Language learning beyond the classroom. Language Teaching, 50(4): 561–578.
Reinders, H. & Pegrum, M. (2015) Supporting language learning on the move: An evaluative framework for mobile language learning resources. In Tomlinson, B. (ed.), SLA research and materials development for language learning. London: Taylor & Francis, 116–141.
Robertson, A. (2011) Duolingo will translate the internet while teaching languages. The Verge.
Rosell-Aguilar, F. (2018) Autonomous language learning through a mobile application: A user evaluation of the busuu app. Computer Assisted Language Learning, 31(8): 854–881.
Saldaña, J. (2016) The coding manual for qualitative researchers (3rd ed.). London: SAGE.
Schmidt, R. & Frota, S. (1986) Developing basic conversational ability in a second language: A case study of an adult learner of Portuguese. In Day, R. R. (ed.), Talking to learn: Conversation in second language acquisition. Rowley, MA: Newbury House, 237–326.
Shadiev, R., Hwang, W.-Y. & Huang, Y.-M. (2017) Review of research on mobile language learning in authentic environments. Computer Assisted Language Learning, 30(3–4): 284–303.
Smith, C. (2018) 17 amazing Duolingo facts and statistics (April 2018).
Van Deusen-Scholl, N. (2015) Assessing outcomes in online foreign language education: What are key measures for success? The Modern Language Journal, 99(2): 398–400.
Vesselinov, R. (2009) Measuring the effectiveness of Rosetta Stone: Final report. Queens College, City University of New York.
Vesselinov, R. & Grego, J. (2012) Duolingo effectiveness study: Final report. Queens College, City University of New York.
Vesselinov, R. & Grego, J. (2016) The Babbel efficacy study: Final report. Queens College, City University of New York.
Werbach, K. (2014) (Re)defining gamification: A process approach. In Spagnolli, A., Chittaro, L. & Gamberini, L. (eds.), Persuasive Technology: 9th International Conference, PERSUASIVE 2014, Padua, Italy, May 21–23, 2014. Proceedings. Cham: Springer, 266–272.

Figure 1. Duolingo web interface home screen

Figure 2. A translation exercise with explicit corrective feedback

Figure 3. A function to post discussion board comments

Table 1. Grading scale for the Turkish 151 Test

Figure 4. Graphic representation of this study’s procedure

Figure 5. Participant Turkish study time (in minutes) each week of the project. One participant’s (#9) weekly totals were missing

Table 2. Duolingo study and learning outcome descriptive statistics

Table 3. Turkish 151 Test subscore summary

Figure 6. Boxplots of Turkish 151 Test total and subscores. Median (vertical bar inside boxes), mean (thick dots), and error bars representing 95% CIs are included

Table 4. Correlations among language study and learning outcome variables