We have developed an English pronunciation learning system which estimates the intelligibility of
Japanese learners' speech and ranks their errors from the viewpoint of improving their intelligibility
to native speakers. Error diagnosis is particularly important in self-study since students tend to
spend time on aspects of pronunciation that do not noticeably affect intelligibility. As a preliminary
experiment, the speech of seven Japanese students was scored from 1 (hardly intelligible) to 5 (perfectly
intelligible) by a linguistic expert. We also computed their error rates for each skill. We found
that each intelligibility level is characterized by its distribution of error rates. Thus, we modeled
each intelligibility level by its error rate distribution. Error priority was calculated by comparing
a student's error rate distribution with that of the model for the corresponding intelligibility
level. Since non-native speech exhibits greater acoustic variability than native speech, we developed
an acoustic model to perform automatic error detection using speech data obtained from Japanese
students. For supra-segmental error detection, we categorized the errors frequently made by Japanese
students and developed a separate acoustic model for detecting them. Pronunciation
learning using this system involves two phases. In the first phase, students experience virtual conversation
through video clips. They receive an error profile based on pronunciation errors detected
during the conversation. Using the profile, students can grasp the characteristic tendencies in
their pronunciation errors that lower their intelligibility. In the second phase, students
practise correcting their individual errors using words and short phrases. They then receive information
regarding the errors detected during this round of practice and instructions for correcting
them. We have begun using this system in a CALL class at Kyoto University. We have evaluated
system performance through the use of questionnaires and analysis of speech data logged in the
server, and will present our findings in this paper.
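The error-priority ranking described above might be sketched as follows. The skill names, the per-level model error rates, and the gap-based comparison rule are illustrative assumptions, not the system's actual parameters:

```python
# Hypothetical sketch of error-priority ranking: compare a student's
# per-skill error rates with the model distribution of a target
# intelligibility level and rank skills by how far the student exceeds it.
# All numbers and skill names below are invented for illustration.

# Assumed mean error rates per skill for two intelligibility levels.
MODEL_ERROR_RATES = {
    4: {"th_sound": 0.10, "r_l_contrast": 0.15, "word_stress": 0.05},
    3: {"th_sound": 0.25, "r_l_contrast": 0.30, "word_stress": 0.15},
}

def rank_errors(student_rates, target_level):
    """Rank skills by the gap between the student's error rate and the
    model error rate at the target intelligibility level; the largest
    positive gap is the highest-priority error to correct."""
    model = MODEL_ERROR_RATES[target_level]
    gaps = {skill: student_rates.get(skill, 0.0) - rate
            for skill, rate in model.items()}
    return sorted(gaps, key=gaps.get, reverse=True)

student = {"th_sound": 0.40, "r_l_contrast": 0.20, "word_stress": 0.30}
print(rank_errors(student, 4))
# → ['th_sound', 'word_stress', 'r_l_contrast']
```

Ranking by the gap to the target level's distribution, rather than by raw error rate, reflects the abstract's point that practice time should go to the errors that most separate a student from the next intelligibility level.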