The majority of studies in second-language (L2) speech processing have involved unimodal (i.e., auditory) input; however, in many instances, speech communication involves both visual and auditory sources of information. Some researchers have argued that multimodal speech is the primary mode of speech perception (e.g., Rosenblum 2005). Research on auditory-visual (AV) input has been conducted more extensively in the fields of infant speech development (e.g., Meltzoff & Kuhl 1994), adult monolingual processing (e.g., McGurk & MacDonald 1976; see the reference in this timeline), and the treatment of the hearing impaired (e.g., Owens & Blazek 1985) than in L2 speech processing (Hardison 2007). In these fields, the earliest visual input was a human face whose lip movements contributed linguistic information. Subsequent research expanded the types of visual sources to include computer-animated faces or talking heads (e.g., Massaro 1998), hand-arm gestures (Gullberg 2006), and various types of electronic visual displays, such as those for pitch (Chun, Hardison & Pennington 2008). More recently, neurophysiological research has shed light on the neural processing of language input, providing another direction that researchers have begun to explore in L2 processing (Perani & Abutalebi 2005).