The present study investigated the potential benefits of extended exposure to captioned videos for second language pronunciation. We tested 90 adult L2 learners of English on speech processing skills (segmentation, speed of lexical access, and sentence processing) and phonological accuracy in perception (ABX discrimination) and production (accentedness ratings) before and after an 8-week treatment consisting of regular exposure to audiovisual materials. Participants were randomly assigned to one of four experimental conditions crossing two viewing modes (captioned or uncaptioned) with two task focus conditions (focus on phonetic form or focus on meaning). Results showed benefits in speech segmentation and speech processing skills irrespective of viewing mode. No significant benefits were found for phonological accuracy in perception. In production, a focus on phonetic form improved pronunciation only in the absence of captions, whereas captioned viewing led to pronunciation gains only when there was no focus on phonetic form. These findings suggest that pronunciation improvement can take place with the help of captions or, in the absence of captions, when learners’ attention is directed to pronunciation. Cognitive overload might explain why no benefits were obtained when attention was directed to pronunciation in a captioned viewing mode.