Towards developing general models of usability with PARADISE

MARILYN WALKER; CANDACE KAMM; DIANE LITMAN

doi:10.1017/S1351324900002503

Abstract

The design of methods for performance evaluation is a major open research issue in the area of spoken language dialogue systems. This paper presents the PARADISE methodology for developing predictive models of spoken dialogue performance, and shows how to evaluate the predictive power and generalizability of such models. To illustrate the methodology, we develop a number of models for predicting system usability (as measured by user satisfaction), based on the application of PARADISE to experimental data from three different spoken dialogue systems. We then measure the extent to which the models generalize across different systems, different experimental conditions, and different user populations, by testing models trained on a subset of the corpus against a test set of dialogues. The results show that the models generalize well across the three systems, and are thus a first approximation towards a general performance model of system usability.
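The train-and-test procedure described in the abstract can be illustrated with a minimal sketch. It assumes, as in the PARADISE framework generally, that user satisfaction is modelled as a linear combination of normalized dialogue measures whose weights are estimated by multiple linear regression. The data here are synthetic, and the three feature columns, the "true" weights, and all function names are hypothetical placeholders rather than the measures or results reported in the paper.

```python
import numpy as np


def zscore(x, mean, std):
    """Normalize each column using statistics taken from the training set."""
    return (x - mean) / std


def fit_performance_model(features, satisfaction):
    """Least-squares fit of satisfaction on features; returns [intercept, weights...]."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, satisfaction, rcond=None)
    return coef


def predict(coef, features):
    """Apply a fitted model to a (possibly unseen) set of dialogues."""
    return coef[0] + features @ coef[1:]


def r_squared(actual, predicted):
    """Proportion of variance in satisfaction explained by the predictions."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic corpus (assumption): 200 dialogues, 3 hypothetical measures,
# e.g. a task-success score and two dialogue-cost measures.
rng = np.random.default_rng(0)
n_dialogues = 200
features = rng.normal(size=(n_dialogues, 3))
true_weights = np.array([0.5, -0.3, -0.2])  # invented for illustration only
satisfaction = features @ true_weights + rng.normal(scale=0.1, size=n_dialogues)

# Train on a subset of the corpus, test on the held-out dialogues.
train, test = slice(0, 150), slice(150, n_dialogues)
mean = features[train].mean(axis=0)
std = features[train].std(axis=0)

coef = fit_performance_model(zscore(features[train], mean, std), satisfaction[train])
r2_train = r_squared(satisfaction[train], predict(coef, zscore(features[train], mean, std)))
r2_test = r_squared(satisfaction[test], predict(coef, zscore(features[test], mean, std)))

print(f"R^2 on training subset: {r2_train:.3f}")
print(f"R^2 on held-out test set: {r2_test:.3f}")
```

Comparing the variance explained on the training subset with that on the held-out set is one simple way to gauge generalization. To probe generalization across systems or user populations, the same sketch could be run with the split defined by system or by population instead of by position in the corpus.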

