Experiments using the gating paradigm investigated the effects of auditory–visual (AV) and auditory-only perceptual training on second-language spoken-word identification by Japanese and Korean learners of English. Stimuli were familiar bisyllabic words beginning with /p/, /f/, /ɹ/, /l/, and /s, t, k/ combined with high, low, and rounded vowels. Results support the priming role of visual cues in AV speech processing. Identification was earlier with visual cues and following training, especially for words beginning with /ɹ/ and /l/, which also showed significant effects of adjacent vowel. For the Japanese, the AV advantage in identifying /ɹ/- and /l/-initial words was accentuated following training. Findings are discussed within a multimodal episodic model of learning.