A technical framework for automatic perceptual evaluation of singing quality

  • Chitralekha Gupta (a1) (a2), Haizhou Li (a3) and Ye Wang (a1) (a2)

Abstract

Human experts evaluate singing quality based on many perceptual parameters such as intonation, rhythm, and vibrato, with reference to music theory. We previously proposed the Perceptual Evaluation of Singing Quality (PESnQ) framework, which combines acoustic features related to these perceptual parameters with the cognitive modeling concept of the telecommunication standard Perceptual Evaluation of Speech Quality (PESQ) to evaluate singing quality. In this study, we further study the PESnQ framework to approximate human judgments. First, we find that a linear combination of the human scores for the individual perceptual parameters can predict the overall human judgment of singing quality. This provides us with a human parametric judgment equation. Next, we show that the individual perceptual parameter scores predicted from the PESnQ acoustic features correlate highly with the respective human scores, which enables more meaningful feedback to learners. Finally, we compare early fusion and late fusion of the acoustic features in predicting the overall human scores, and find that late fusion is superior to early fusion. This work underlines the importance of modeling human perception in automatic singing quality assessment.
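The difference between the two fusion strategies can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one plausible reading of the abstract: early fusion fits a single regressor from all acoustic features to the overall score, while late fusion first predicts each perceptual parameter score from its own features and then combines those predictions with the linear judgment equation. The data are synthetic, and the feature groups, weights in `TRUE_W`, and helper names (`fit_linear`, `predict`, `pearson`) are all hypothetical.

```python
import random

def fit_linear(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(w, X):
    """Apply fitted weights (intercept first) to each row of X."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) for x in X]

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

random.seed(0)
PARAMS = ["intonation", "rhythm", "vibrato"]   # parameters named in the abstract
TRUE_W = [0.5, 0.3, 0.2]                       # hypothetical judgment weights
n = 40                                         # synthetic singers

# Synthetic data: two made-up acoustic features per perceptual parameter.
feats = {p: [[random.random(), random.random()] for _ in range(n)] for p in PARAMS}
human_param = {p: [1 + 2 * f[0] + 2 * f[1] + random.gauss(0, 0.1) for f in feats[p]]
               for p in PARAMS}
overall = [sum(w * human_param[p][i] for w, p in zip(TRUE_W, PARAMS))
           for i in range(n)]

# Human parametric judgment equation: overall score from parameter scores.
param_rows = [[human_param[p][i] for p in PARAMS] for i in range(n)]
judgment = fit_linear(param_rows, overall)

# Early fusion: one model from all concatenated features to the overall score.
all_feats = [sum((feats[p][i] for p in PARAMS), []) for i in range(n)]
early_pred = predict(fit_linear(all_feats, overall), all_feats)

# Late fusion: predict each parameter score, then apply the judgment equation.
pred_param = {p: predict(fit_linear(feats[p], human_param[p]), feats[p])
              for p in PARAMS}
late_rows = [[pred_param[p][i] for p in PARAMS] for i in range(n)]
late_pred = predict(judgment, late_rows)

early_corr = pearson(early_pred, overall)
late_corr = pearson(late_pred, overall)
```

Because the synthetic data are linear by construction and the fit is evaluated in-sample, this sketch only shows how the two pipelines are wired; it does not reproduce the finding that late fusion outperforms early fusion. A side benefit of the late fusion layout is that the intermediate `pred_param` scores are available as per-parameter feedback.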

Copyright

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

Corresponding author

Chitralekha Gupta. Email: chitralekha@u.nus.edu
