This study explores the impact of different captions on second language (L2) learning in a computer-assisted multimedia context. A quasi-experimental design was adopted, and thirty-two eighth graders selected from a junior high school took part. They were systematically assigned to four groups based on their proficiency in English; each group was shown animations with English narration and one of the following caption types: no captions (M1), Chinese captions (M2), English captions (M3), or Chinese plus English captions (M4). A multimedia English learning program was conducted; the learning content consisted of two scientific articles presented on a computer. To track the learning process, data on oral repetition were collected after each sentence or scene was played. A post-test evaluation and a semi-structured interview were conducted immediately after viewing. The results show that the effect of different captions in multimedia L2 learning, with respect to vocabulary acquisition and reading comprehension, depends on students’ L2 proficiency. With English captions and with Chinese plus English captions, learners with low proficiency performed better in learning English than those who did not have such captions. Students relied on graphics and animation as an important tool for understanding English sentences.