Teachers’ Perspectives on Second Language Task Difficulty: Insights From Think-Alouds and Eye Tracking


The majority of empirical studies that have so far investigated task features in order to inform task grading and sequencing decisions have been grounded in hypothesis-testing research. Few studies have attempted to adopt a bottom-up approach in order to explore what task factors might contribute to task difficulty. The aim of this study was to help fill this gap by eliciting teachers’ perspectives on sources of task difficulty. We asked 16 English as a second language (ESL) teachers to judge the linguistic ability required to carry out four pedagogic tasks and consider how they would manipulate the tasks to suit the abilities of learners at lower and higher proficiency. While contemplating the tasks, the teachers thought aloud, and we also tracked their eye movements. The majority of teachers’ think-aloud comments revealed that they were primarily concerned with linguistic factors when assessing task difficulty. Conceptual demands were most frequently proposed as a way to increase task difficulty, whereas both linguistic and conceptual factors were suggested by teachers when considering modifications to decrease task difficulty. The eye-movement data, overall, were aligned with the teachers’ think-aloud comments. These findings are discussed with respect to existing task taxonomies and future research directions.


The last three decades have seen a growing interest in the role of tasks in second language (L2) teaching and learning, with pedagogic tasks being increasingly promoted and used as a defining (Long, 1985, 2015, this issue; Van den Branden, 2006) or key (Bygate, 2000; Ellis, 2003) organizing unit of syllabi. The rationale for the inclusion of tasks in L2 instruction is multifaceted: First, tasks provide an optimal psycholinguistic environment for L2 processes to develop by offering plentiful opportunities for meaningful language use as well as timely focus on linguistic constructions as a specific need arises (Long, 1991). Second, task-based learning is well aligned with the principles of learning-by-doing and student-centered teaching, ideas that have been advocated and widely adopted by scholars in the field of general education (e.g., Dewey, 1913/1975). Finally, pedagogic tasks prepare learners to carry out genuine communicative tasks aligned with their future academic, professional, vocational, and/or personal needs. As a result, L2 instruction utilizing tasks often has high face validity (Long, 2005) and is motivating to students, who in turn engage with the tasks. Given these widely recognized advantages of integrating tasks into L2 syllabi, a considerable amount of research has been directed at exploring ways to optimize task-based language teaching (TBLT) with the aim of informing task-based practice (for reviews, see Ellis, 2003; Long, 2015; Samuda & Bygate, 2008; Ziegler, this issue).

Although a substantial amount of research has accumulated on TBLT, many issues remain unresolved, including the question of how tasks should be graded and sequenced within the syllabus in order to create ideal conditions for L2 learning. To date, no clear, empirically attested findings are available that can guide teachers in grading and sequencing tasks, despite the fact that extensive theoretical (e.g., Robinson, 2001, 2011; Skehan, 1998, this issue) and empirical work (e.g., Baralt, Gilabert, & Robinson, 2014) has been dedicated to addressing this issue. This is likely to be due to various factors, such as methodological shortcomings in existing research that may have compromised the internal validity of empirical studies (Norris, 2010; Révész, 2014) as well as a lack of comparable operationalizations of task- and language-related constructs across studies (Long, 2015; Long & Norris, 2015). An additional reason for the mixed findings might be that the two theoretical models that have driven the bulk of previous empirical research on factors determining task grading, Skehan's (1998) limited capacity model and Robinson's (2001, 2011) triadic componential framework, might not incorporate the full spectrum of the variables that could inform task grading and sequencing decisions. Another possibility is that some key variables that are in fact included in the models have not yet been the object of ample empirical research.

Besides conducting hypothesis-testing research, there are additional ways to gain insights into what types of factors might be useful to consider when grading tasks, including the collection of learner perception data and/or expert opinions about sources of task difficulty. The aim of this study was to explore the latter—namely, whether introspective data gathered from one group of experts, L2 teachers, reflect existing theoretical views or open new understandings about factors contributing to task grading and sequencing criteria. To accomplish this goal, we asked L2 teachers to gauge the linguistic ability needed to perform a set of pedagogic tasks and consider how they would modify the tasks to make them suitable for learners with lower or higher proficiency levels. The methodological innovation of our research lies in our triangulation of the introspective data we collected with recordings of the participants’ eye movements while they simultaneously thought aloud.

In the sections that follow, we first look at the task taxonomies proposed in Skehan's (1998) limited capacity model and Robinson's (2001, 2011) triadic componential framework, followed by a review of Ellis's (2003) task framework, a model that also offers criteria for task grading but has received less attention to date. It is important to note that our focus is restricted to the factors integrated in the taxonomies. A detailed discussion of how these often overlapping factors are proposed to be manipulated to facilitate the effectiveness of task-based syllabuses is beyond the scope of this review.

The Limited Capacity Model

Skehan (1998) proposed utilizing three categories when assessing L2 task difficulty: code complexity, cognitive complexity, and communicative stress. Code complexity refers to the linguistic demands imposed by a task. Tasks that elicit the use of more advanced constructions and a greater variety of them are likely to pose more difficulty. Also, learners are expected to experience more difficulty when they need to deploy more sophisticated, diverse, and dense lexis. Cognitive complexity captures the cognitive processes induced by the task. Within this category, Skehan makes a further distinction between cognitive familiarity and cognitive processing, with cognitive familiarity encapsulating the ability to handle familiar information with greater ease and cognitive processing referring to the extra demands posed on processing when new solutions are needed. Cognitive familiarity may stem from familiarity with the topic, discourse genre, and task. Cognitive processing demands, on the other hand, may increase if the information relevant to task completion is less structured, explicit, and clear. Cognitive demands are also anticipated to rise when the task requires a greater amount of computation, that is, manipulating and transforming information. Communicative stress, the third category in Skehan's model, is concerned with the performance conditions under which the task is completed (see also Skehan, this issue). Task difficulty is likely to increase if the task is performed under greater time pressure, more participants are involved, real-time processing is required, there is more at stake, and there is no opportunity to alter the way the task is implemented.

The Triadic Componential Framework

Robinson (2001, 2011), in what he referred to as the triadic componential framework, also outlines a taxonomy of task characteristics in order to provide syllabus designers with operational criteria that can be used to classify and sequence tasks. Robinson's framework includes three main characteristic types: factors contributing to task complexity, task conditions, and task difficulty.

Task complexity factors determine the inherent cognitive demands of tasks; that is, task complexity appears similar to what is meant by cognitive complexity in Skehan's model. According to Robinson, level of task complexity should serve as the only basis underlying sequencing decisions in the syllabus. Task complexity can be enhanced by manipulating tasks along two types of task dimensions: resource directing and resource dispersing. Resource-directing features, by definition, relate to conceptual task demands. For example, the tasks that are expected to place enhanced conceptual demands on learners are those that require learners to engage in causal, intentional, or spatial reasoning; description of events that are displaced in time and space; and/or reference to many elements instead of a few. Robinson further argued that resource-directing features have the capacity to direct learners’ attention to specific, task-relevant linguistic features. By way of illustration, tasks that require causal reasoning are likely to elicit more widespread use of logical connectors (e.g., therefore). Resource-dispersing dimensions, on the other hand, concern the procedural conditions of task performance. Task demands are anticipated to increase when learners need to carry out several rather than a single task concurrently; little or no planning time is made available; the task structure is unclear; and/or more steps are needed to complete the task.

Task conditions include variables that influence interactional task demands and subsume factors related to the interactional partners and the level and nature of participation required. Participant-related characteristics, for example, are concerned with whether participants have the same or different gender, proficiency level, and/or status and role. Task demands may also differ depending on the extent to which the partners are familiar with each other or share content and cultural background. Variables associated with the nature of participation include whether the task allows for multiple or one predetermined solution; the participants need to converge or can diverge on the task outcome; the task instructions call for one-way or two-way interaction; the participants need to contribute more or less during task performance; and/or the task-based interaction involves two or more participants.

Finally, the notion of task difficulty captures the fact that individual differences in ability (e.g., aptitude) and affect (e.g., anxiety) may also influence task-based performance and development. It is important to point out that, in Skehan's (1998) work, task difficulty is conceptualized in a more general sense to denote differences in overall task demands. Skehan regarded tasks as more difficult if they pose increased demands in terms of any of the three types of task factors proposed in the limited capacity model—code complexity, cognitive complexity, or communicative stress. In this article, we follow Skehan in using the term task difficulty in this more general sense.

Ellis's Criteria for Task Grading

Ellis's (2003) task classification framework delineates four types of task dimensions that can be used by syllabus designers in the task grading process: features related to the task input, task conditions, task processes, and task outcomes. Most factors subsumed under task conditions and processes are also included in the limited capacity model and/or the triadic componential framework, although they are labeled differently in some cases. According to Ellis, task conditions comprise variables describing the relationship between the interactants (one-way vs. two-way), the task demands (single vs. dual), and the discourse mode elicited by the task (dialogic vs. monologic). Task processes capture differences in the type (information vs. opinion exchange) and amount (few vs. many steps involved) of reasoning required.

A feature specific to Ellis's (2003) taxonomy is that it distinguishes between input-related and outcome-related task criteria. Among the input features, Ellis listed the medium of the input, classifying input presented in the oral mode as most difficult, followed by input appearing in written and then pictorial form. A second task input factor is code complexity; task input with more complex vocabulary and syntax is expected to pose more difficulty. Cognitive complexity, a third input-related factor in Ellis's framework, defines the task input as more difficult when it is more abstract, includes more elements and relationships, has less clear structure, and requires a there-and-then orientation. The last input feature in Ellis's taxonomy is termed familiarity, encapsulating the expectation that familiar input eases processing load.

Factors that describe the task outcome comprise medium, scope, and discourse mode. With respect to medium, the need to articulate an oral outcome is anticipated to pose greater difficulty than to present an outcome in written form. In turn, a pictorial outcome is deemed easier to deliver than a written piece. Closed versus open outcomes may also influence the level of difficulty. Finally, task difficulty is likely to be enhanced when learners are asked to produce instructions or arguments rather than lists, descriptions, narratives, or classifications.

The Present Study

As mentioned previously, the majority of empirical studies that have so far investigated task features to inform task grading and sequencing criteria have been grounded in hypothesis testing, drawing on the task taxonomies outlined in Skehan's and Robinson's models. Little research thus far has attempted to adopt a more bottom-up approach in order to explore whether variables, besides the ones identified in these models, might contribute to task difficulty. The aim of this study was to help fill this gap by eliciting teachers’ perspectives on sources of task difficulty. We asked teachers to (a) judge the linguistic ability required to carry out four pedagogic tasks and (b) consider how they would manipulate the tasks to suit the abilities of learners at lower and higher proficiency levels. While contemplating the difficulty and manipulations of the tasks, the teachers were asked to say what they were thinking about. To triangulate these data, we tracked the eye movements of the teachers in an attempt to gain information about the extent to which they interacted with the task instructions and pictorial input. Combining introspective think-aloud data with behavioral eye-tracking data is an innovative aspect of this study. To the best of our knowledge, no study has yet triangulated think-aloud and eye-tracking data in the context of TBLT research.



The participants were 16 English as a second language (ESL) teachers. They were all asked to think aloud while first assessing the level and then manipulating the difficulty of four pedagogic tasks. Throughout this process, their eye movements were tracked. The four tasks were presented to the teachers on separate slides using Tobii Studio 3.0.9 eye-tracking software (Tobii Technology, n.d.). Task order was counterbalanced across participants.
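The article does not specify which counterbalancing scheme was used. As a minimal sketch of one common option, the Python snippet below builds a balanced Latin square for the four tasks and cycles 16 participants through the resulting orders. The function names, the choice of a balanced Latin square, and the round-robin assignment are all assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter

# Task names come from the article; everything else here is illustrative.
TASKS = ["Jungle Trip", "Facelift", "New Zealand", "Map"]


def balanced_latin_square(conditions):
    """Return n presentation orders for an even number n of conditions.

    Each condition appears once in every serial position, and every
    ordered pair of adjacent conditions occurs exactly once.
    """
    n = len(conditions)
    if n == 0 or n % 2 != 0:
        raise ValueError("this construction needs a non-empty, even number of conditions")
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Remaining rows shift every index of the first row by 1, 2, ...
    return [[conditions[(idx + shift) % n] for idx in first] for shift in range(n)]


def assign_orders(n_participants, orders):
    """Hypothetical assignment: cycle through the orders participant by participant."""
    return [orders[i % len(orders)] for i in range(n_participants)]


ORDERS = balanced_latin_square(TASKS)
ASSIGNMENT = assign_orders(16, ORDERS)

if __name__ == "__main__":
    for participant, order in enumerate(ASSIGNMENT, start=1):
        print(participant, " -> ".join(order))
    # How often each order is used across the 16 participants
    print(Counter(tuple(order) for order in ASSIGNMENT))
```

With 16 participants and four orders, each order would be used four times under this scheme, so every task occupies every serial position equally often.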


The participants were recruited from two contexts: 10 ESL teachers from the United Kingdom and 6 ESL teachers from the United States. The mean ages of the UK and U.S. teachers were 37.20 (SD = 11.67) and 42.33 (SD = 7.76) years, respectively. Most of the UK teachers were female (n = 9), whereas half of the U.S. teachers were male (n = 3). Half of the UK teachers were native speakers of English (n = 5), and the rest came from Japanese (n = 2), Korean (n = 2), or Greek (n = 1) first language backgrounds. Among the U.S. teachers, four were native speakers of English; the remaining two had Spanish and Ukrainian as their first languages. While the UK teachers’ experience varied widely, ranging from 2 to 25 years of language teaching (median = 4.50, mean = 6.50, SD = 6.92), the U.S. teachers constituted a more homogeneous and overall more experienced group, with a range of 9 to 20 years of teaching (median = 14.50, mean = 14.50, SD = 3.20). Overall, the U.S. teachers also had higher qualifications; all of them held a master's degree in TESOL or applied linguistics. A third of the UK teachers had a master's in TESOL or applied linguistics (n = 3); the rest were studying toward an MA in these fields (n = 6). All of the teachers had some familiarity with TBLT and the notion of task complexity. On a 5-point Likert scale, they provided average ratings higher than 3 points for their knowledge of TBLT (UK: M = 3.60, SD = 0.97; U.S.: M = 3.40, SD = 1.20) and task complexity (UK: M = 3.20, SD = 1.16; U.S.: M = 3.30, SD = 1.06), with higher ratings indicating greater familiarity.


The four tasks used to elicit teachers’ perspectives on task difficulty were all adapted from tasks included in the textbook New Cutting Edge Pre-Intermediate (Cunningham & Moor, 2005). Our rationale for selecting pedagogic tasks from a commercial textbook was to increase the ecological validity of the research. In many contexts, teachers often need to adapt textbook materials to fit the needs and ability level of their students.

We selected two decision-making tasks and two information-gap activities (see Appendix for tasks). As part of one of the decision-making tasks, Jungle Trip, students were asked to decide which 12 items they would take on a jungle trip, where they have to survive for 72 hours without help. The task input included the task instructions and a photo depicting the set of objects from which learners can choose. The task instructions broke down the task into two phases: first, each learner was asked to explain what items they would take individually; then, students were directed to agree on the best list of items.

The other decision-making task, Facelift, involved learners in deciding in groups what improvements to make to a cafe using a limited budget. Students were encouraged, in particular, to consider how to improve the bar area or equipment, decoration, and furniture. In addition to the task instructions, learners were provided with a picture of how the cafe looked and a plan of the cafe area to assist with planning.

The third task, New Zealand, was an information-gap activity requiring pair work. Both members of the pair were given a map of New Zealand, each containing different pieces of information. The students’ task was to find out from their partner where the places on a given list were located on the map and why they were important landmarks. Thus, the task input consisted of the map with labels and the task instructions.

The last task that teachers were asked to examine and modify was a traditional Map task. Students, working in pairs, were instructed to ask for and give directions based on a map. Both partners were told where they were on the map, and they were provided with a list of places to which they needed to ask for directions. The two members of the pair had access to different map versions. Each map clearly indicated the places to which the student needed to give directions, but the names of the locations to which the learner had to ask directions were missing. Thus, the task input had two main components: the instructions and the map.

When selecting these tasks, we had several considerations in mind. We decided to use two task types, decision making and information gap, rather than a single type, in order to enable us to capture a fuller range of task factors. For example, we anticipated that the decision-making tasks would elicit more reasoning-related comments from the teachers than the information-gap activities. Given the eye-tracking component of the experiment, we also considered the distribution of textual and pictorial input incorporated in the tasks. We opted to use tasks that contained clearly delineated areas of textual and pictorial input to facilitate subsequent analysis (see below). Finally, we decided to use materials from the same textbook to control for, at least to some degree, the language ability needed to complete the tasks. Also, in this way we were able to eliminate confounds resulting from factors such as differences in font type and style of layout.


The teachers completed the experiment in one individual session, which took between 60 and 90 minutes. First, we obtained informed consent, then administered a background questionnaire. After that, the eye-tracking system was calibrated. The eye movements of the UK participants were captured by means of a mobile Tobii X2-30 eye tracker with a temporal resolution of 30 Hz. The eye tracker was mounted to a Samsung laptop with a 17-inch screen. The U.S. participants were recorded with a Tobii TX300 integrated eye-tracking system using a sampling rate of 300 Hz and a 23-inch screen. The participants were seated facing the eye-tracker approximately 60 cm from the center of the screen, and their eyes were calibrated using a 9-point calibration grid. The materials were presented with Tobii Studio 3.0.9 software (Tobii Technology, n.d.).

Once the eye-tracking system was calibrated, we familiarized participants with the instructions and procedures in a practice phase. First, the participant read the general instructions, followed by instructions about how to think aloud. They were asked to consider the following three questions while examining the experimental tasks:

  • What level would this task be appropriate for? Why?

  • How would you modify this task for more advanced learners?

  • How would you modify this task for less advanced learners?

Next, the participants practiced thinking aloud while considering a sample task. In the practice phase, we encouraged the participants to raise any questions they had with regard to the procedures, but very few asked for clarifications. Finally, the participants moved on to the actual experiment and considered, while thinking aloud, the four pedagogic tasks guided by the three questions provided. They completed each task at their own pace. In each setting, the researcher stayed in the same room in case any technical problems arose and, in the very small number of cases when it was needed, reminded participants to think aloud. Otherwise, the researcher sat at a discreet distance and worked on their computer to try to avoid potential distortions in the think-aloud data caused by the researcher's presence.

Data Analyses

Think-Aloud Data

The analysis of the think-aloud data included five phases. First, the data were transcribed by a research assistant. Second, the same research assistant reviewed all the think-aloud comments and identified emergent categories by annotating the data. The first author also coded 20% of the data set following the same procedure. Percentage agreement between the first author and research assistant for category identification was found to be high across all four tasks (Jungle Trip = .85; Facelift = .91; New Zealand = .94; Map = .96). Third, the first author grouped the annotations to form macro-categories through establishing patterns in the data. In the fourth step, the resulting categorization was double-checked by the first author. Finally, a frequency count of all the annotations was computed for each task by summing up the annotations falling into a particular category.
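Two of these steps, the agreement check and the frequency count, are simple enough to sketch in Python. The snippet below assumes that percentage agreement means the proportion of double-coded comments to which both coders assigned the same category; the article does not spell out the formula. The labels in the example are invented for illustration and are not the study's data.

```python
from collections import Counter


def percentage_agreement(coder_a, coder_b):
    """Proportion of items that two coders labeled identically (0 to 1)."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must label the same items")
    if not coder_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def category_frequencies(annotations):
    """Count how many annotations fall into each category."""
    return Counter(annotations)


# HYPOTHETICAL labels for 20 double-coded comments (not taken from the study).
ASSISTANT_LABELS = ["linguistic"] * 10 + ["conceptual"] * 8 + ["procedural"] * 2
AUTHOR_LABELS = (
    ["linguistic"] * 9 + ["conceptual"]       # one disagreement
    + ["conceptual"] * 7 + ["linguistic"]     # one disagreement
    + ["procedural"] + ["conceptual"]         # one disagreement
)

AGREEMENT = percentage_agreement(ASSISTANT_LABELS, AUTHOR_LABELS)
FREQUENCIES = category_frequencies(ASSISTANT_LABELS)

if __name__ == "__main__":
    print(f"agreement = {AGREEMENT:.2f}")
    print(FREQUENCIES.most_common())
```

Simple percentage agreement does not correct for chance agreement; a chance-corrected index such as Cohen's kappa would be a stricter alternative, though the article reports only the uncorrected figure.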

Eye-Tracking Data

The eye-movement data were analyzed utilizing Tobii Studio 3.0.9 (Tobii Technology, n.d.). For each task, the data were segmented into three parts, according to whether the teachers were talking about (a) the proficiency level appropriate for the task, (b) modifications that could make the task more difficult, or (c) modifications that could make the task less difficult. In all cases, the teachers’ think-aloud comments clearly indicated which of the three questions they were considering. In a few cases, the teachers only addressed two out of the three questions, resulting in a smaller number of segments. The areas of interest (AOIs) were specified as those parts of the slide that included the task instructions (AOI instructions) versus those parts that provided students with the pictorial input (AOI pictorial) (see Appendix). Next, raw fixation durations and counts were exported for each AOI. The raw data were then corrected for time-on-segment; in other words, we divided the total duration and number of fixations by the amount of time teachers spent on each segment (i.e., one of the three questions). 2 When the task instructions or pictorial input consisted of more than one area of interest, the data for these were combined for the purposes of further analyses (e.g., data for cafe plan and picture of cafe were merged, as they together constituted the pictorial input for the Facelift task).
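The time-on-segment correction and the merging of AOIs can be sketched as follows. This is an illustrative reconstruction, not the authors' script: the record layout, field names, units (seconds), and the example numbers are all assumptions, and the actual Tobii Studio export format is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class AoiRecord:
    """Raw export for one area of interest within one segment (assumed layout)."""
    aoi_type: str                # "instructions" or "pictorial"
    fixation_duration_s: float   # total fixation duration in seconds
    fixation_count: int          # number of fixations


def normalize_segment(records, segment_duration_s):
    """Merge AOIs of the same type, then divide by time spent on the segment."""
    if segment_duration_s <= 0:
        raise ValueError("segment duration must be positive")
    totals = {}
    for record in records:
        duration, count = totals.get(record.aoi_type, (0.0, 0))
        totals[record.aoi_type] = (
            duration + record.fixation_duration_s,
            count + record.fixation_count,
        )
    return {
        aoi_type: {
            # seconds of fixation per second spent on the segment
            "duration_per_second": duration / segment_duration_s,
            # fixations per second spent on the segment
            "fixations_per_second": count / segment_duration_s,
        }
        for aoi_type, (duration, count) in totals.items()
    }


# HYPOTHETICAL numbers for one teacher answering one question on the Facelift task.
FACELIFT_SEGMENT = [
    AoiRecord("instructions", 12.0, 40),
    AoiRecord("pictorial", 18.0, 55),   # picture of the cafe
    AoiRecord("pictorial", 6.0, 25),    # plan of the cafe, merged with the picture
]
NORMALIZED = normalize_segment(FACELIFT_SEGMENT, segment_duration_s=60.0)

if __name__ == "__main__":
    for aoi_type, measures in NORMALIZED.items():
        print(aoi_type, measures)
```

Dividing by segment length makes segments of different durations comparable, which matters here because the teachers worked through each task at their own pace.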


Think-Aloud Data

This section presents a list of the task factors that emerged from the content analysis of the think-aloud comments. Six macro-categories were identified across the four tasks: conceptual demands, linguistic demands, interactional demands, procedural demands, modality, and task outcome. Some of these were further broken down to subcategories. Table 1 provides examples for each macro-category and some of the subcategories, according to the three questions posed. The rest of this section gives the frequency counts for each macro-category and subcategory by the three questions for the four tasks.

Table 1. Examples for Macro-Categories of Task Dimensions

Jungle Trip Task

For the Jungle Trip task, the content analysis generated 84 annotations altogether. From the comments the teachers made when assessing the proficiency level appropriate for the task, 23 annotations emerged. As shown in Table 2, the teachers most frequently mentioned linguistic demands as determinants of task difficulty (n = 14). Within this category, the teachers listed lexis most often (n = 8), and a smaller number of teachers also referred to grammar (n = 3). Conceptual demands emerged as the second most frequent category from the think-aloud comments (n = 8). In particular, the teachers reflected on the extent of reasoning required by the task (n = 5) and the amount of background knowledge assumed (n = 3). Finally, one teacher also took a procedural factor into account: whether planning time was made available.

Table 2. Factors Mentioned by Teachers When Assessing and Manipulating the Difficulty of the Jungle Trip Task

Note. Value in general categories may be higher than the sum of annotations in the subcategories, as some teachers only mentioned the more general category.

a N and % refer to the number and percentage of annotations.

A total of 33 annotations concerned the modifications that teachers would make to increase task difficulty. The large majority of the annotations considered ways to enhance the conceptual demands of the task (n = 23). More than half of the teachers suggested manipulations involving the items to take on the jungle trip (n = 14), and a considerable number of the teachers proposed increasing conceptual demands by requiring learners to reason (n = 9). The second most frequently suggested type of modification included comments related to the task outcome (n = 5). Two additional categories emerged from the content analysis: interactional (n = 3) and procedural demands (n = 2).

When coding the modifications recommended by the teachers to decrease task difficulty, 28 annotations were made. These were grouped into two main categories: linguistic (n = 17) and conceptual demands (n = 11). Most teachers suggested lowering linguistic demands by providing key lexis to students (n = 15). Two teachers additionally proposed decreasing the linguistic complexity of the instructions. Moving onto conceptual demands, several teachers mentioned provision of more extensive background information (n = 6) as a possible means to decrease task demands. The rest of the think-aloud comments were concerned with how manipulating the items might lower cognitive load (n = 5).
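As a quick consistency check, the macro-category counts reported above for the Jungle Trip task can be tallied in a few lines of Python. The counts are taken from the text; the dictionary layout, the question labels, and the rounding to one decimal place are choices made for this sketch, so the percentages may differ slightly from those in the published table.

```python
# Macro-category counts for the Jungle Trip task, as reported in the text.
JUNGLE_TRIP = {
    "assess level": {"linguistic": 14, "conceptual": 8, "procedural": 1},
    "increase difficulty": {
        "conceptual": 23,
        "task outcome": 5,
        "interactional": 3,
        "procedural": 2,
    },
    "decrease difficulty": {"linguistic": 17, "conceptual": 11},
}


def question_totals(task_counts):
    """Total number of annotations per question."""
    return {question: sum(counts.values()) for question, counts in task_counts.items()}


def category_shares(counts):
    """Percentage of a question's annotations falling into each category."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


TOTALS = question_totals(JUNGLE_TRIP)
GRAND_TOTAL = sum(TOTALS.values())
SHARES = {question: category_shares(counts) for question, counts in JUNGLE_TRIP.items()}

if __name__ == "__main__":
    print(TOTALS, GRAND_TOTAL)
    for question, shares in SHARES.items():
        print(question, shares)
```

The per-question totals (23, 33, and 28) add up to the 84 annotations reported for this task.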

Facelift Task

The coding of the think-aloud comments for the Facelift task resulted in 66 comments. Of these, 17 annotations were concerned with the suitability of the task for a particular proficiency level. As Table 3 demonstrates, teachers most often cited linguistic demands when contemplating the proficiency required to carry out the task (n = 12). The majority of the comments were concerned with lexis (n = 5), followed by grammar (n = 2) and genre type elicited (n = 2). Factors related to the conceptual demands posed by the task also featured in a considerable number of think-aloud comments (n = 5). In particular, teachers listed the extent of reasoning needed to carry out the task (n = 2), the complexity of the pictorial input (n = 1), the number of elements to consider (n = 1), and the amount of background knowledge assumed (n = 1) as factors determining their judgment about task difficulty.

Table 3. Factors Mentioned by Teachers When Assessing and Manipulating the Difficulty of the Facelift Task

Note. Value in general categories may be higher than the sum of comments in the subcategories, as some teachers only mentioned the more general category.

a N and % refer to the number and percentage of annotations.

In analyzing the teachers’ think-aloud comments about how to increase task difficulty, 25 annotations emerged. The majority of the teachers suggested increasing the conceptual complexity of the task through either increasing reasoning demands (n = 12) or altering the pictorial input (n = 5). Comments related to the task outcome constituted the second most frequently cited category (n = 6). Finally, two teachers proposed asking students to work in pairs rather than groups (n = 2).

When the think-aloud comments about decreasing task difficulty were coded for the Facelift task, 24 annotations were created. The majority of the annotations referred to linguistic demands (n = 14). Most teachers suggested that, in order to reduce task difficulty, learners should receive support with lexis (n = 11), possibly as part of a pretask phase. Two teachers also recommended providing students with access to grammatical constructions that are relevant to the task. Conceptual demands-related comments also emerged from the think-aloud data, although less frequently (n = 6). Teachers mentioned manipulating the pictorial input (n = 2) and allowing students more freedom to select what areas they would like to renovate (n = 2). Finally, several teachers noted that the task would probably pose less challenge if learners engaged in pair or group work or worked together as a class (n = 4).

New Zealand Task

The think-aloud data for the New Zealand task yielded 63 annotations (see Table 4); 21 of these were derived from the think-aloud comments recorded while teachers were considering the proficiency level needed to carry out the task. The largest category, including half of the annotations, made reference to linguistic demands, such as the complexity of lexis (n = 6), grammar (n = 3), and sentence structure (n = 1). One teacher also mentioned task genre as a factor determining task difficulty. Eight annotations were concerned with conceptual demands, making this category the second most frequent. When considering cognitive task complexity, most teachers assessed the complexity of the map (n = 6), whereas a smaller number of teachers took into account the level of background knowledge assumed (n = 2). Two teachers also referred to interactional demands as a variable potentially contributing to task difficulty.

Table 4. Factors Mentioned by Teachers When Assessing and Manipulating the Difficulty of the New Zealand Task

Note. Values in general categories may be higher than the sum of annotations in the subcategories, as some teachers only mentioned the more general category.

a N and % refer to the number and percentage of annotations.

The coding of the teachers’ think-aloud comments about ways to increase the difficulty of the New Zealand task yielded 17 annotations. Factors related to conceptual task demands appeared in the teachers’ think-alouds most frequently (n = 8). Several teachers proposed making the pictorial input more complex by including more information to share (n = 5). Besides manipulating the pictorial input, teachers also suggested requiring learners to reason more (n = 2) and presenting them with an unknown map (n = 1). Another category emerging from the comments related to the task outcome. Some teachers thought that the task could be made more complex if learners were additionally asked to create a presentation about New Zealand, prepare an itinerary for travel, or plan a trip. Altering the interactional (n = 2) and procedural (n = 2) demands was also mentioned by a small number of teachers. In particular, they recommended group instead of pair work as well as removing the instructions. Finally, two teachers proposed that the introduction of a writing component could make the task more difficult.

Turning to suggested manipulations to decrease task difficulty, the data set generated 25 annotations. Most think-aloud comments referred to conceptual demands (n = 15), proposing to decrease task difficulty either by increasing learners’ familiarity with the task content (n = 10) or manipulating the pictorial input (n = 5). The category that emerged with the second most annotations was linguistic demands (n = 5). Teachers suggested adding a pretask activity, during which key grammar (n = 4) and lexis (n = 3) would be provided. Several think-aloud comments mentioned procedural factors (n = 4), and one teacher recommended utilizing group instead of pair work to ease interactional task demands.

Map Task

For the Map task, 59 annotations emerged from the analysis of the think-aloud comments. Table 5 shows that 20 annotations came from the stage when teachers were contemplating the proficiency level required for the task. Conceptual complexity was found to be the most frequently mentioned factor, accounting for more than half of the total annotations (n = 11). Among cognitive factors, teachers most often considered the complexity of the map (n = 5). Additional cognitive factors referred to were the complexity of the directions that learners were expected to give (n = 1) and the extent of learners’ familiarity with the task type (n = 1). The second most frequently cited category consisted of linguistic demands (n = 5), more precisely, the complexity of the lexis needed to complete the task. A small number of teachers also took into account procedural (n = 2) and interactional (n = 2) task demands when judging task difficulty.

Table 5. Factors Mentioned by Teachers When Assessing and Manipulating the Difficulty of the Map Task

Note. Values in general categories may be higher than the sum of annotations in the subcategories, as some teachers only mentioned the more general category.

a N and % refer to the number and percentage of annotations.

The analysis of the teachers’ think-aloud comments in response to the question of what modifications they would make to increase task difficulty generated 28 annotations. The majority of the comments suggested enhancing conceptual complexity (n = 17). Most teachers argued that this could be achieved by manipulating the map (n = 12) or increasing reasoning demands (n = 5). Besides enhancing conceptual demands, several teachers thought that task difficulty would rise if the task materials incorporated more complex lexis (n = 4) and required participants to interact on their mobile phones as opposed to face-to-face (n = 4). Finally, two teachers suggested adding another task outcome, and one teacher proposed increasing the social distance among participants.

Based on the teachers’ think-aloud comments about how to decrease the difficulty of the Map task, 11 annotations were created. Lowering the conceptual demands of the task emerged by far as the most frequently mentioned proposal (n = 7). The specific comments related to conceptual complexity were parallel to the teachers’ recommendations about how to increase task difficulty. Four teachers suggested modifications to the map, and two teachers proposed providing learners with the opportunity to practice direction-giving tasks prior to completing this task version. Making the directions less complex was also raised by one teacher as a possible manipulation to ease cognitive demands. The second most often mentioned modification type concerned the task procedures (n = 2). Adding planning time and removing time pressure were each proposed by one teacher. Finally, one teacher contemplated providing assistance with lexis, and another suggested a change to modality in order to lessen the challenge posed by the task.

Eye-Tracking Data

Table 6 summarizes the descriptive statistics for the total duration and number of fixations within our AOI instructions and AOI pictorial for each question across the four tasks. To ease interpretation, we also calculated a ratio of fixation durations and counts for each segment, by dividing the fixation durations and counts for AOI instructions by those for AOI pictorial. The resulting index captures how long and how often teachers gazed on the instructions as compared to the pictorial input. Thus, higher values indicate greater amount and number of eye fixations on the instructions, with indexes higher than 1 associated with longer and more gazes made on the instructions than the pictorial input.
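The index described above can be sketched in a few lines of code. This is an illustrative sketch rather than the analysis script used in the study: the function name `aoi_ratio`, the dictionary keys, and all numeric values are hypothetical and are not taken from Table 6.

```python
def aoi_ratio(instructions, pictorial):
    """Ratio of an eye-movement measure on AOI instructions to AOI pictorial."""
    if pictorial <= 0:
        raise ValueError("pictorial value must be positive to form a ratio")
    return instructions / pictorial


# Hypothetical values for one task segment (NOT taken from Table 6).
segment = {
    "duration_instructions_s": 12.0,  # total fixation duration on the instructions
    "duration_pictorial_s": 8.0,      # total fixation duration on the pictorial input
    "count_instructions": 30,         # number of fixations on the instructions
    "count_pictorial": 40,            # number of fixations on the pictorial input
}

duration_ratio = aoi_ratio(
    segment["duration_instructions_s"], segment["duration_pictorial_s"]
)
count_ratio = aoi_ratio(segment["count_instructions"], segment["count_pictorial"])

print(duration_ratio)  # 1.5: longer gazing on the instructions than on the pictorial input
print(count_ratio)     # 0.75: fewer fixations on the instructions than on the pictorial input
```

In this made-up segment the two indexes point in opposite directions, which shows why durations and counts are reported separately.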

Table 6. Descriptive Statistics for Fixation Durations and Counts for the Four Tasks

As Table 7 shows, for each task, participants fixated proportionately longer and more often on the instructions when assessing the proficiency level required to complete the task, as compared to when they were considering modifications to lessen or increase task difficulty. The AOI instructions to AOI pictorial ratios also demonstrate that, on the Jungle and Map tasks, teachers looked proportionately longer and more frequently at the instructions in the process of contemplating how to increase, as opposed to how to decrease, task difficulty. On the other hand, on the Facelift and New Zealand tasks, similar AOI instructions to AOI pictorial proportions were observed for both fixation durations and counts regardless of whether teachers thought aloud about enhancing or lowering task difficulty.

Table 7. AOI Instructions to AOI Pictorial Ratios for Fixation Durations and Counts

Note. Higher values indicate greater amount and number of eye fixations on task instructions.


To complement previous hypothesis-testing research, the goal of this study was to explore whether L2 teachers’ introspections while assessing and modifying task difficulty reflect current theoretical views and/or generate new insights about criteria for task grading and sequencing. To address this goal, we asked a group of English L2 teachers to think aloud while judging the proficiency level required to carry out a set of pedagogical tasks and to consider possible task modifications for learners with lower and higher proficiency. We also recorded participants’ eye movements while they were examining the tasks to obtain a fuller picture about the extent to which they took into account various components of task input.

The think-aloud data revealed that the large majority of the factors to which teachers referred when gauging and manipulating task difficulty are included in Skehan's (1998) limited capacity model, Robinson's (2001, 2011) triadic componential framework, or Ellis's (2003) task framework. This is a reassuring finding for task researchers, confirming that these theoretical models, often invoked to guide research on task difficulty, do indeed incorporate a considerable number of the variable types that, according to the teachers’ reflections in this study, may influence task difficulty. It is also worth pointing out, however, that not all the task dimensions that the teachers mentioned feature in all three models. A notable example is linguistic demands, which most of the teachers took into account during the think-alouds. This dimension is included in the limited capacity model and Ellis's taxonomy but not in the triadic componential framework, the most researched model of task complexity. Naturally, the teachers’ focus on linguistic demands might have been an artifact of their previous training or prior experience with commercial language teaching materials, which often follow or at least include a linguistic syllabus.

Another intriguing observation concerns the frequency with which linguistic demands were brought up by teachers in response to the three questions they were asked to consider. In the process of assessing the difficulty of the task, the teachers’ think-aloud comments most often made reference to linguistic demands across the tasks; the only exception to this trend was the Map task. Among linguistic features, lexis emerged as the most frequently mentioned subcategory on all tasks, with the majority of teachers referring to this aspect of linguistic complexity. It is interesting to triangulate this finding with the pattern that, across all four tasks, participants gazed proportionately longer and more often on the instructions than the pictorial input at this stage, as compared to when they contemplated ways to decrease or increase task difficulty. A possible explanation for this might be that teachers based their task difficulty judgment, at least partially, on the linguistic complexity of the instructions and the amount of language support inherent in them.

Unlike during the initial stage of task assessment, teachers made no or hardly any reference to linguistic demands when asked to suggest manipulations to increase task difficulty. Most of their think-aloud comments were concerned with ways in which the conceptual demands of the tasks could be enhanced. The majority of teachers thought that this could be achieved via manipulating the pictorial support (e.g., items, maps) included in the task input. The second most often cited proposal was to raise the reasoning demands posed by the tasks, for example, by requiring students to provide explanations for their decisions. These trends are well aligned with the eye-movement data: Teachers fixated proportionately more on the pictorial task input when considering modifications to increase difficulty as compared to when they were judging task difficulty. This pattern only differed for the Facelift task: Here, too, most comments focused on conceptual demands, but increasing reasoning demands was proposed more frequently as a means to enhance task difficulty than altering the pictorial prompts. The eye-movement data, too, reflect this difference for the Facelift task: Although, as on the other tasks, participants fixated proportionately longer and more often on the pictorial input when deliberating about increasing task difficulty than when assessing it, the difference was less pronounced for this task. Possibly, this discrepancy was due to the fact that the pictorial prompt included in the Facelift task was less elaborate than the images in the other tasks.

The think-aloud comments about modifications to decrease task difficulty paint a more diverse picture. In proposing factors to lower task demands, teachers mentioned both conceptual and linguistic factors (among others). The distribution of these two categories, however, differed across tasks. Conceptual demands appeared more often in teachers’ think-aloud comments when studying the New Zealand and Map tasks, whereas linguistic demands were considered with greater frequency by teachers when deciding on how to lower the difficulty of the Jungle Trip and Facelift tasks. This disparity might have resulted from the differential linguistic demands posed by decision-making and information-gap tasks. Teachers might have perceived the decision-making tasks (Jungle Trip and Facelift) as requiring more creative and, thus, more linguistically complex language use, resulting in an increased need for language support at lower levels of proficiency. For this question, the eye-movement data are not entirely aligned with the think-aloud comments. Nevertheless, they capture the fact that teachers considered the pictorial input to the least degree when reflecting on ways to increase the difficulty of the Facelift task.

It is interesting to reflect on the distribution of the linguistic and conceptual demands-related comments with respect to the limited capacity model (Skehan, 1998) and the cognition hypothesis, a model associated with the triadic componential framework (Robinson, 2001, 2011). While the limited capacity model proposes that task sequencing decisions should be based on both linguistic and conceptual task demands, the cognition hypothesis calls for exclusively relying on cognitive task complexity when grading and sequencing tasks. Based on the current data set, it appears that teachers’ think-aloud comments about decreasing task difficulty are closer to the limited capacity model's conceptualization of task difficulty, as teachers made reference to both linguistic and cognitive factors. The cognition hypothesis, however, seems to be better aligned with the think-aloud comments addressing the question of how to increase task difficulty, as they predominantly suggested enhancing task demands through manipulating cognitive factors. It is worth noting that Skehan (2015), drawing on Levelt's (1989) model of speech production, reached a similar conclusion, suggesting that the cognition hypothesis might be a more suitable framework for describing task effects at higher proficiency levels.

Now let us turn to a language-related suggestion for modification that has consistently occurred in the think-aloud comments but has been the object of relatively little empirical research: Teachers often proposed introducing lexis in the pretask phase in order to ease subsequent task demands. Reflecting this idea, Newton (2001) argued that, indeed, targeting key vocabulary in the pretask phase may enable learners to allocate more attention to meaning during the actual task performance since potential problems with encoding and decoding lexis would be dealt with prior to task performance. Ellis (2003), however, warned that preteaching vocabulary might prompt learners to view the task as a platform for practicing vocabulary rather than an act of communication. It would be worthwhile to investigate in future research what the actual impact of preteaching vocabulary would be on the cognitive processes in which learners engage.

Another fruitful avenue for future task-based research would involve examining the effects of altering the interactional set-up of tasks, for example, by changing pair work into group work or vice versa. Although modifying interactional demands was frequently considered by teachers as a way to influence task demands, task-based research on this factor is sparse to date.


Last but not least, let us turn to the limitations of this research. This study included a relatively small number of teachers who had diverse language teaching experience but were familiar with TBLT to some extent. It would be worthwhile to investigate whether the findings here would differ depending on the amount of language teaching and specific TBLT experience teachers have. Another limitation concerns the limited number of task types the study included; future research is needed to explore whether the results found here would transfer to other task types. Finally, an important direction for future research would be to triangulate teachers’ perspectives about task difficulty with those of learners. Although a few studies have begun to explore learner perceptions of task difficulty via introspective methods (e.g., Kim, Payant, & Pearson, 2015), more research of this kind is needed to inform theoretical and empirical work about task grading and sequencing, especially given the potentially important implications of this line of research for practice.


2. Although the eye trackers at the two data collection sites differed in screen size, pixels were not affected on the screen. Thus, no scaling was deemed necessary.



Baralt, M., Gilabert, R., & Robinson, P. (Eds.). (2014). Task sequencing and instructed second language learning. London, UK: Bloomsbury.
Bygate, M. (2000). Introduction to special issue: Tasks in language pedagogy. Language Teaching Research, 4, 185–192.
Cunningham, S., & Moor, P. (2005). New cutting edge pre-intermediate. London, UK: Longman.
Dewey, J. (1975). Interest and effort in education. Carbondale, IL: Southern Illinois University Press, Arcturus Books. (Original work published 1913)
Ellis, R. (2003). Task-based language teaching and learning. Oxford, UK: Oxford University Press.
Kim, Y., Payant, C., & Pearson, P. (2015). The intersection of task-based interaction, task complexity, and working memory: L2 question development through recasts in a laboratory setting. Studies in Second Language Acquisition, 37, 549–581.
Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Long, M. H. (1985). A role for instruction in second language acquisition: Task-based language teaching. In Hyltenstam, K. & Pienemann, M. (Eds.), Modeling and assessing second language development (pp. 77–99). Clevedon, UK: Multilingual Matters.
Long, M. H. (1991). Focus on form: A design feature in language teaching methodology. In de Bot, K., Coste, D., Ginsberg, R., & Kramsch, C. (Eds.), Foreign language research in cross-cultural perspectives (pp. 39–52). Amsterdam, The Netherlands: John Benjamins.
Long, M. H. (Ed.). (2005). Second language needs analysis. Cambridge, UK: Cambridge University Press.
Long, M. H. (2015). Second language acquisition and task-based language teaching. Malden, MA: Wiley-Blackwell.
Long, M. H., & Norris, J. (2015). An international collaborative research network (CRN) on task complexity. Colloquium at the 6th Biennial International Conference on Task-Based Language Teaching, Leuven, Belgium.
Newton, J. (2001). Options for vocabulary learning through communication tasks. ELT Journal, 55, 30–37.
Norris, J. M. (2010, September). Understanding instructed SLA: Constructs, contexts, and consequences. Plenary address delivered at the Annual Conference of the European Second Language Association (EUROSLA), Reggio Emilia, Italy.
Révész, A. (2014). Towards a fuller assessment of cognitive models of task-based learning: Investigating task-generated cognitive demands and processes. Applied Linguistics, 35, 87–92.
Robinson, P. (2001). Task complexity, cognitive resources, and syllabus design: A triadic framework for investigating task influences on SLA. In Robinson, P. (Ed.), Cognition and second language instruction (pp. 287–318). New York, NY: Cambridge University Press.
Robinson, P. (2011). Second language task complexity, the cognition hypothesis, language learning, and performance. In Robinson, P. (Ed.), Second language task complexity: Researching the cognition hypothesis of second language learning and performance (pp. 3–37). Amsterdam, The Netherlands: John Benjamins.
Samuda, V., & Bygate, M. (2008). Tasks in second language learning. Basingstoke, UK: Palgrave Macmillan.
Skehan, P. (1998). A cognitive approach to language learning. Oxford, UK: Oxford University Press.
Skehan, P. (2015). Limited attention capacity and cognition: Two hypotheses regarding second language performance on tasks. In Bygate, M. (Ed.), Domains and directions in the development of TBLT: A decade of plenaries from the international conference (pp. 123–156). Amsterdam, The Netherlands: John Benjamins.
Tobii Technology. (n.d.). Tobii Studio 3.0.9 [Eye-tracking software]. Stockholm, Sweden.
Van den Branden, K. (Ed.). (2006). Task-based language teaching: From theory to practice. Cambridge, UK: Cambridge University Press.