Speech and Audio Processing: A MATLAB®</I>-based Approach

Ian Vince McLoughlin

doi:10.1017/CBO9781316084205

References

[1] M., Frigo and S.G., Johnson. The design and implementation of FFTW3.Proc. IEEE, 93(2):216–231, February 2005. doi: 10.1109/JPROC.2004.840301.

[2] S. W, Smith. Digital Signal Processing: A Practical Guide for Engineers and Scientists. Newnes, 2000. www.dspguide.com.

[3] J.W., Gibbs. Fourier series. Nature, 59:606, 1899.

[4] R.W., Schaefer and L. R., Rabiner. Digital representation of speech signals. Proc. IEEE, 63(4):662–677, 1975.

[5] B. P., Bogert, M.J.R., Healy, and J.W, Tukey. The quefrency analysis of time series for echoes: cepstrum, pseudo-autocovarience, cross-cepstrum and saphe cracking. In M., Rosenblatt, editor, Proc. Symposium on Time-Series Analysis, pages 209–243. Wiley, 1963.

[6] D.G., Childers, D. P., Skinner, and R.C., Kemerait. The cepstrum: a guide to processing.Proc. IEEE, 65(10):1428–1443, October 1977.

[7] F., Zheng, G., Zhang, and Z., Song. Comparison of different implementations of MFCC.J. Computer Science and Technology, 16(6):582–589, September 2001.

[8] J., Barkera and M., Cooke. Is the sine-wave speech cocktail party worth attending?Speech Communication, 27(3–4):159–174, April 1999.

[9] M.R., Schroeder, B. S., Atal, and J. L., Hall. Optimizing digital speech coders by exploiting masking properties of the human ear.J. Acoustical Society of America, 66(6):1647–1652, 1979.

[10] I., Witten. Principles of Computer Speech. Academic Press, 1982.

[11] H. R., Sharifzadeh, I.V., McLoughlin, and M. J., Russell. A comprehensive vowel space for whispered speech.Journal of Voice, 26(2):e49–e56, 2012.

[12] B. C. J., Moore. An Introduction to the Psychology of Hearing. Academic Press, 1992.

[13] I. B., Thomas. The influence of first and second formants on the intelligibility of clipped speech.J. Acoustical Society of America, 16(2):182–185, 1968.

[14] J., Pickett. The Sounds of Speech Communication. Allyn and Bacon, 1980.

[15] Z., Li, E.C., Tan, I., McLoughlin, and T. T., Teo. Proposal of standards for intelligibility tests of Chinese speech.IEE Proc. Vision Image and Signal Processing, 147(3):254–260, June 2000.

[16] F. L., Chong, I., McLoughlin, and K., Pawlikowski. A methodology for improving PESQ accuracy for Chinese speech. In Proc. IEEE TENCON, Melbourne, November 2005.

[17] K., Kryter. The Handbook of Hearing and the Effects of Noise. Academic Press, 1994.

[18] L. L., Beranek. The design of speech communications systems.Proc. IRE, 35(9):880–890, September 1947.

[19] W., Tempest, editor. The Noise Handbook. Academic Press, 1985. 370

[20] M., Mourjopoulos, J., Tsoukalas, and D., Paraskevas. Speech enhancement using psychoacoustic criteria. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 359–362, 1991.

[21] F., White. Our Acoustic Environment. John Wiley & Sons, 1976.

[22] P. J., Blamey, R.C., Dowell, and G.M., Clark. Acoustic parameters measured by a formant estimating speech processor for a multiple-channel cochlear implant.J. Acoustical Society of America, 82(1):38–47, 1987.

[23] I.V., McLoughlin, Y., Xu, and Y., Song. Tone confusion in spoken and whispered Mandarin Chinese. In Chinese Spoken Language Processing (ISCSLP), 2014 9th Int. Symp. on, pages 313–316. IEEE, 2014.

[24] I. B., Thomas. Perceived pitch of whispered vowels.J. Acoustical Society of America, 46: 468–470, 1969.

[25] Y., Swerdlin, J., Smith, and J., Wolfe. The effect of whisper and creak vocal mechanisms on vocal tract resonances.J. Acoustical Society of America, 127(4):2590–2598, 2010.

[26] P. C., Loizou. Speech Enhancement: Theory and Practice. CRC Press, 2013.

[27] N., Kitawaki, H., Nagabuchi, and K., Itoh. Objective quality evaluation for lowbit- rate speech coding systems.IEEE J. Selected Areas in Communications, 6(2): 242–248, 1988.

[28] Y., Hu and P.C, Loizou. Evaluation of objective quality measures for speech enhancement.IEEE Trans. Audio, Speech, and Language Processing, 16(1):229–238, 2008.

[29] A. D., Sharpley. Dynastat webpages, 1996 to 2006. www.dynastat.com/SpeechIntelligibility.htm.

[30] S. F., Boll. Suppression of acoustic noise in speech using spectral subtraction.IEEE Trans. Acoustics, Speech and Signal Processing, 27(2):113–120, 1979.

[31] R. E. P., Dowling and L. F., Turner. Modelling the detectability of changes in auditory signals.Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 1, pages 133–136, 1993.

[32] J. I., Alcantera, G. J., Dooley, P. J., Blamey, and P.M., Seligman. Preliminary evaluation of a formant enhancement algorithm on the perception of speech in noise for normally hearing listeners.J. Audiology, 33(1):15–24, 1994.

[33] J. G. van, Velden and G. F., Smoorenburg. Vowel recognition in noise for male, female and child voices.Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pages 796–799, 1995.

[34] G. A., Miller, G. A., Heise, and W., Lichten. The intelligibility of speech as a function of the context of the test materials.Experimental Psychology, 41:329–335, 1951.

[35] W. G., Sears. Anatomy and Physiology for Nurses and Students of Human Biology. Arnold, 4th edition, 1967.

[36] J., Simner, C., Cuskley, and S., Kirby. What sound does that taste? Cross-modal mappings across gustation and audition.Perception, 39(4):553, 2010.

[37] R., Duncan-Luce. Sound and Hearing: A Conceptual Introduction. Lawrence Erlbaum and Associates, 1993.

[38] W. F., Ganong. Review of Medical Physiology. Lange Medical Publications, 9th edition, 1979.

[39] H., Fletcher and W.A., Munson. Loudness, its definition, measurement and calculation. Bell System Technical Journal, 12(4):377–430, 1933.

[40] K., Kryter. The Effects of Noise on Man. Academic Press, 2nd edition, 1985.

[41] R., Plomp. Detectability threshold for combination tones.J. Acoustical Society of America, 37(6):1110–1123, 1965.

[42] K., Ashihara. Combination tone: absent but audible component.Acoustical Science and Technology, 27(6):332, 2006.

[43] J. C. R., Licklider. Auditory Feature Analysis. Academic Press, 1956.

[44] Y. M., Cheng and D., O'shaughnessy. Speech enhancement based conceptually on auditory evidence.Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pages 961–963, 1991.

[45] N., Virag. Speech enhancement based on masking properties of the auditory system.Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pages 796–799, 1995.

[46] Y., Gao, T., Huang, and J. P., Haton. Central auditory model for spectral processing.Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 704–707, 1993.

[47] I.V., McLoughlin and Z.-P., Xie. Speech playback geometry for smart homes. In Consumer Electronics (ISCE 2014), 18th IEEE Int. Symp. on, pages 1–2. IEEE, 2014.

[48] C. R., Darwin and R. B., Gardner. Mistuning a harmonic of a vowel: grouping and phase effects on vowel quality.J. Acoustical Society of America, 79:838–845, 1986.

[49] D., Sen, D. H., Irving, and W.H., Holmes. Use of an auditory model to improve speech coders.Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 2, pages 411–415, 1993.

[50] H., Hermansky. Perceptual linear predictive (PLP) analysis of speech.J. Acoustical Society of America, 87(4):1738–1752, April 1990.

[51] S. S., Stevens, J., Volkmann, and E.B., Newman. A scale for the measurement of the psychological magnitude pitch.J. Acoustical Society of America, 8(3):185–190, 1937. doi: http://dx.doi.org/10.1121/1.1915893. http://scitation.aip.org/content/asa/journal/jasa/8/3/10.1121/1.1915893.

[52] D., O'shaughnessy. Speech Communication: Human and Machine. Addison-Wesley, 1987.

[53] G., Fant. Analysis and synthesis of speech processes. In B., Malmberg, editor, Manual of Phonetics, pages 173–177. North-Holland, 1968.

[54] ISO/MPEG–Audio Standard layers. Editorial pages.Sound Studio Magazine, pages 40–41, July 1992.

[55] A., Azirani, R., Jeannes, and G., Faucon. Optimizing speech enhancement by exploiting masking properties of the human ear.Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pages 800–803, 1995.

[56] A. S., Bregman. Auditory Scene Analysis. MIT Press, 1990.

[57] H., Purwins, B., Blankertz, and K., Obermayer. Computing auditory perception.Organised Sound, 5(3):159–171, 2000.

[58] C. M. M., Tio, I.V., McLoughlin, and R.W., Adi. Perceptual audio data concealment and watermarking scheme using direct frequency domain substitution.IEE Proc. Vision, Image & Signal Processing, 149(6):335–340, 2002.

[59] I.V., McLoughlin and R. J., Chance. Method and apparatus for speech enhancement in a speech communications system.PCT international patent (PCT/GB98/01936), July 1998.

[60] Y. M., Cheng and D., O'shaughnessy. Speech enhancement based conceptually on auditory evidence.IEEE Trans. Signal Processing, 39(9):1943–1954, 1991.

[61] N., Jayant, J., Johnston, and R., Safranek. Signal compression based on models of human perception.Proc. IEEE, 81(10):1383–1421, 1993.

[62] D., Sen and W. H., Holmes. Perceptual enhancement of CELP speech coders. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 105–108, 1993.

[63] J., Markel and A., Grey. Linear Prediction of Speech. Springer-Verlag, 1976.

[64] J., Makhoul. Linear prediction: a tutorial review.Proc. IEEE, 63(4):561–580, April 1975.

[65] S., Saito and K., Nakata. Fundamentals of Speech Signal Processing. Academic Press, 1985.

[66] J. L., Kelly and C.C., Lochbaum. Speech synthesis. Proc. Fourth Int. Congress on Acoustics, pages 1–4, September 1962.

[67] B. H., Story, I. R., Titze, and E.A., Hoffman. Vocal tract area functions from magnetic resonance imaging.J. Acoustical Society of America, 100(1):537–554, 1996.

[68] N., Sugamura and N., Favardin. Quantizer design in LSP speech analysis–synthesis.IEEE J. Selected Areas in Communications, 6(2):432–440, February 1988.

[69] S., Saoudi, J., Boucher, and A., Guyader. A new efficient algorithm to compute the LSP parameters for speech coding.Signal Processing, 28(2):201–212, 1995.

[70] T., I. and M. I., T.TIMIT database. A CD-ROM database of phonetically classified recordings of sentences spoken by a number of different male and female speakers, disc 1-1.1, 1990.

[71] N., Sugamura and F., Itakura. Speech analysis and synthesis methods developed at ECL in NTT – from LPC to LSP.Speech Communications, pages 213–229, 1986.

[72] J. S., Collura and T. E., Tremain. Vector quantizer design for the coding of LSF parameters. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 29–32, 1993.

[73] I.V., McLoughlin. LSP parameter interpretation for speech classification. In Proc. 2nd IEEE Int. Conf. on Information, Communications and Signal Processing, December 1999.

[74] I.V., McLoughlin and F., Hui. Adaptive bit allocation for LSP parameter quantization. In Proc. IEEE Asia–Pacific Conf. on Circuits and Systems, paper number 231, December 2000.

[75] Q., Zhao and J., Suzuki. Efficient quantization of LSF by utilising dynamic interpolation. In IEEE Int. Symp. on Circuits and Systems, pages 2629–2632, June 1997.

[76] European Telecommunications Standards Institute. Trans-European trunked radio system (TETRA) standard. 1994.

[77] K. K., Paliwal and B. S., Atal. Efficient vector quantization of LPC parameters at 24 bits per frame. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 661–664, 1991.

[78] D.-I., Chang, S., Ann, and C.W., Lee. A classified split vector quantization of LSF parameters.Signal Processing, 59(3):267–273, June 1997.

[79] R., Laroia, N., Phamdo, and N., Farvardin. Robust and efficient quantization of speech LSP parameters using structured vector quantizers. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 641–644, 1991.

[80] H., Zarrinkoub and P., Mermelstein. Switched prediction and quantization of LSP frequencies. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 757–760, 1996.

[81] C. S., Xydeas and K.K.M., So. A long history quantization approach to scalar and vector quantization of LSP coefficients. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 1–4, 1993.

[82] J.-H., Chen, R.V., Cox, Y.-C., Lin, N., Jayant, and M. J., Melchner. A low-delay CELP coder for the CCITT 16 kb/s speech coding standard.IEEE J. Selected Areas in Communications, 10(5):830–849, June 1992.

[83] B. S., Atal. Predictive coding of speech at low bitrates.IEEE Trans. Communications, 30(4):600–614, 1982.

[84] M. R., Schroeder and B. S., Atal. Code-excited linear prediction CELP: high-quality speech at very low bit rates. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 937–940, 1985.

[85] L. M., Supplee, R. P., Cohn, J. S., Collura, and A.V., McCree. MELP: the new Federal standard at 2400 bps. In Proc. IEEE Int. Conf. Acoustics Speech and Signal Processing, vol. 2, pages 1591–1594, April 1997.

[86] I. A., Gerson and M.A., Jasiuk. Vector sum excited linear prediction (VSELP) speech coding at 8 kbps. In Proc. IEEE Int. Conf. Acoustics Speech and Signal Processing, vol. 1, pages 461–464, April 1990.

[87] L. R., Rabiner and R.W., Schaefer. Digital Processing of Speech Signals. Prentice-Hall, 1978.

[88] I.V., McLoughlin. LSP parameter interpretation for speech classification. In Proc. 6th IEEE Int. Conf. on Electronics, Circuits and Systems, paper number 113, September 1999.

[89] K. K., Paliwal. A study of LSF representation for speaker-dependent and speakerindependent HMM-based speech recognition systems. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, vol. 2, pages 801–804, 1990.

[90] J., Parry, I., Burnett, and J., Chicharo. Linguistic mapping in LSF space for low-bit rate coding.Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 2, pages 653–656, March 1999.

[91] L. R., Rabiner, M., Cheng, A., Rosenberg, and C., McGonegal. a comparative performance study of several pitch detection algorithms.IEEE Trans. Acoustics, Speech and Signal Processing, 24(5):399–418, October 1976.

[92] L., Cohen. Time–Frequency Analysis. Prentice-Hall, 1995.

[93] Z. Q., Ding, I.V., McLoughlin, and E. C., Tan. How to track pitch pulse in LP residual – joint time-frequency distribution approach. In Proc. IEEE Pacific Rim Conf. on Communications, Computers and Signal Processing, August 2001.

[94] A. G., Krishna and T.V., Sreenivas. Musical instrument recognition: from isolated notes to solo phrases. In Proc. IEEE Int. Conf. on Acoustics Speech and Signal Processing, vol. 4, pages 265–268, 2004.

[95] The British Broadcasting Corporation (BBC). BBC Radio 4: Brett Westwood's guide to garden birdsong, May 2007. www.bbc.co.uk/radio4/science/birdsong.shtml.

[96] A., Harma and P., Somervuo. Classification of the harmonic structure in bird vocalization. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 5, pages 701–704, 2004.

[97] I., McLoughlin, M.-M., Zhang, Z.-P., Xie, Y., Song, and W., Xiao. Robust sound event classification using deep neural networks.IEEE Trans. Audio, Speech, and Language Processing, PP(99), 2015. doi: dx.doi.org/10.1109/TASLP.2015.2389618.

[98] R. F., Lyon. Machine hearing: an emerging field.IEEE Signal Processing Magazine, 42: 1414–1416, 2010.

[99] T. C., Walters. Auditory-based processing of communication sounds. PhD thesis, University of Cambridge, 2011.

[100] A., Kanagasundaram, R., Vogt, D. B., Dean, S., Sridharan, and M.W., Mason. i-vector based speaker recognition on short utterances. In Interspeech 2011, pages 2341–2344, Firenze Fiera, Florence, August 2011. International Speech Communication Association (ISCA). http://eprints.qut.edu.au/46313/.

[101] C., Cortes and V., Vapnik. Support-vector networks. In Machine Learning, 20(3):273–297, 1995.

[102] C.-C., Chang and C.-J., Lin. LIBSVM: a library for support vector machines.ACM Transactions on Intelligent Systems and Technology, 2(3), article 27, software available at www.csie.ntu.edu.tw/∼cjlin/libsvm.

[103] S., Balakrishnama and A., Ganapathiraju. Linear discriminant analysis – a brief tutorial. Institute for Signal and information Processing, 1998. www.isip.piconepress.com/publications/reports/1998/isip/lda

[104] A., Hyvärinen and E., Oja. Independent component analysis: algorithms and applications.Neural Networks, 13(4):411–430, 2000.

[105] C. M., Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[106] G. E., Hinton, S., Osindero, and Y.-W., Teh. A fast learning algorithm for deep belief nets.Neural Computation, 18(7):1527–1554, 2006.

[107] R. B., Palm. Prediction as a candidate for learning deep hierarchical models of data. Master's thesis, Technical University of Denmark, 2012.

[108] Y., LeCun and Y., Bengio. Convolutional networks for images, speech, and time series. In The Handbook of Brain Theory and Neural Networks, page 3361, MIT Press, 1995.

[109] J., Bouvrie. Notes on convolutional neural networks. 2006. http://cogprints.org/5869/.

[110] Y., LeCun, L., Bottou, Y., Bengio, and P., Haffner. Gradient-based learning applied to document recognition.Proc. IEEE, 86(11):2278–2324, 1998.

[111] O., Abdel-Hamid, A.-R., Mohamed, H., Jiang, and G., Penn. Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE Int. Conf. on, pages 4277–4280. IEEE, 2012.

[112] T. N., Sainath, A.-R., Mohamed, B., Kingsbury, and B., Ramabhadran. Deep convolutional neural networks for LVCSR. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE Int. Conf. on, pages 8614–8618. IEEE, 2013.

[113] H.-M., Zhang, I., McLoughlin, and Y., Song. Robust sound event recognition using convolutional neural networks. In Proc. ICASSP, paper number 2635. IEEE, 2015.

[114] R., Cole, J., Mariani, H., Uszkoreit, G. B., Varile, A., Zaenen, A., Zampolli, and V., Zue, editors. Survey of the State of the Art in Human Language Technology. Cambridge University Press, 2007.

[115] C. A., Kamm, K. M., Yang, C. R., Shamieh, and S., Singhal. Speech recognition issues for directory assistance applications. In Proc. 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications IVTTA94, pages 15–19, Kyoto, September 1994.

[116] J. G., Fiscus, J., Ajot, and J. S., Garofolo. The rich transcription. 2007 meeting recognition evaluation. In Multimodal Technologies for Perception of Humans, pages 373–389. Springer, 2008.

[117] H. B., Yu and M.-W., Mak. Comparison of voice activity detectors for interview speech in NIST speaker recognition evaluation. In Interspeech 2011, pages 2353–2356, August 2011.

[118] F., Beritelli, S., Casale, and A., Cavallaro. A robust voice activity detector for wireless communications using soft computing.IEEE J. Selected Areas in Communications,, 16(9):1818–1829, 1998.

[119] M.-Y., Hwang and X., Huang. Subphonetic modeling with Markov states-Senone. In Acoustics, Speech, and Signal Processing, 1992. ICASSP-92, 1992 IEEE Int. Conf. on, vol. 1, pages 33–36. IEEE, 1992.

[120] Y., Song, B., Jiang, Y., Bao, S., Wei, and L.-R., Dai. i-vector representation based on bottleneck features for language identification.Electronics Letters, 49(24):1569–1570, November 2013. doi: 10.1049/el.2013.1721.

[121] B., Jiang, Y., Song, S., Wei, J.-H., Liu, I.V., McLoughlin, and L.-R., Dai. Deep bottleneck features for spoken language identification.PLoS ONE, 9(7):e100795, July 2014. doi: 10. 1371/journal.pone.0100795. http://dx.doi.org/10.1371%2Fjournal.pone.0100795.

[122] B., Jiang, Y., Song, S., Wei, M.-G., Wang, I., McLoughlin, and L.-R., Dai. Performance evaluation of deep bottleneck features for spoken language identification. In Chinese Spoken Language Processing (ISCSLP), 2014 9th Int. Symp. on, pages 143–147, September 2014. doi: 10.1109/ISCSLP.2014.6936580.

[123] S., Xue, O., Abdel-Hamid, H., Jiang, and L., Dai. Direct adaptation of hybrid DNN/HMM model for fast speaker adaptation in LVCSR based on speaker code. In Proc. ICASSP, pages 6339–6343, 2014.

[124] C., Kong, S., Xue, J., Gao, W., Guo, L., Dai, and H., Jiang. Speaker adaptive bottleneck features extraction for LVCSR based on discriminative learning of speaker codes. In Chinese Spoken Language Processing (ISCSLP), 2014 9th Int. Symp. on, pages 83–87. IEEE, 2014.

[125] S., Young, G., Evermann, M., Gales, T., Hain, D., Kershaw, X., Liu, G., Moore, J., Odell, D., Ollason, D., Povey et al. The HTK book, vol. 2. Entropic Cambridge Research Laboratory, Cambridge, 1997.

[126] L. R., Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition.Proc. IEEE, 77(2):257–286, 1989.

[127] M., Gales and S., Young. The application of hidden Markov models in speech recognition.Foundations and Trends in Signal Processing, 1(3):195–304, 2008.

[128] L. F., Uebel and P. C., Woodland. An investigation into vocal tract length normalisation. In Eurospeech, 1999.

[129] D., Povey, A., Ghoshal, G., Boulianne, L., Burget, O., Glembek, N., Goel, M., Hannemann, P., Motlicek, Y., Qian, P., Schwarz et al. The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, December 2011. IEEE Catalog No.: CFP11SRW-USB.

[130] W., Walker, P., Lamere, P., Kwok, B., Raj, R., Singh, E., Gouvea, P., Wolf, and J., Woelfel. Sphinx-4: a flexible open source framework for speech recognition, 2004. cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4Whitepaper.pdf.

[131] A. P. A., Broeders. Forensic speech and audio analysis, forensic linguistics 1998 to 2001 – a review. In Proc. 13th INTERPOL Forensic Science Symposium, pages 51–84, Lyon, October 2001.

[132] A. P. A., Broeders. Forensic speech and audio analysis, forensic linguistics 2001 to 2004 – a review. In Proc. 14th INTERPOL Forensic Science Symposium, pages 171–188, Lyon, 2004.

[133] R., Togneri and D., Pullella. An overview of speaker identification: accuracy and robustness issues. IEEE Circuits and Systems Magazine, 11(2):23–61, 2011. doi: 10.1109/MCAS.2011.941079.

[134] S., Furui. Recent advances in speaker recognition.Pattern Recognition Letters, 18:859–872, 1997.

[135] S., Furui. Speaker-dependent-feature extraction, recognition and processing techniques.Speech Communication, 10: 505–520, 1991.

[136] G., Doddington, W., Liggett, A., Martin, M., Przybocki, and D. A., Reynolds. Sheep, goats, lambs and wolves: a statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation. In Proc. 5th Int. Conf. on Spoken Language Processing, vol. 0608, November 1998.

[137] M. A., Zissman and K.M., Berkling. Automatic language identification.Speech Communication, 35: 115–124, 2001.

[138] K., Wu, Y., Song, W., Guo, and L., Dai. Intra-conversation intra-speaker variability compensation for speaker clustering. In Chinese Spoken Language Processing (ISCSLP), 2012 8th Int. Symp. on, pages 330–334. IEEE, 2012.

[139] S., Meignier and T., Merlin. LIUM SpkDiarization: an open source toolkit for diarization. In CMU SPUD Workshop, 2010.

[140] D.-C., Lyu, T. P., Tan, E., Chang, and H., Li. SEAME: a Mandarin-English code-switching speech corpus in South-East Asia. In INTERSPEECH, volume 10, pages 1986–1989, 2010.

[141] M., Edgington. Investigating the limitations of concatenative synthesis. In EUROSPEECH- 1997, pages 593–596, Rhodes, September 1997.

[142] C. K., Ogden. Basic English: A General Introduction with Rules and Grammar. Number 29. K. Paul, Trench, Trubner, 1944.

[143] T., Dutoit. High quality text-to-speech synthesis: an overview.J. Electrical & Electronics Engineering, Australia: Special Issue on Speech Recognition and Synthesis, 17(1):25–36, March 1997.

[144] T. B., Amin, P., Marziliano, and J. S., German. Glottal and vocal tract characteristics of voice impersonators.IEEE Trans. on Multimedia, 16(3):668–678, 2014.

[145] I.V., McLoughlin. The art of public speaking for engineers.IEEE Potentials, 25(3):18–21, 2006.

[146] The University of Edinburgh, The Centre for Speech Technology Research. The festival speech synthesis system, 2004. www.cstr.ed.ac.uk/projects/festival/.

[147] P., Taylor, A., Black, and R., Caley. The architecture of the Festival speech synthesis system. In Third International Workshop on Speech Synthesis, Sydney, November 1998.

[148] Voice Browser Working Group. Speech synthesis markup language (SSML) version 1.0. W3C Recommendation, September 2004.

[149] K. K., Paliwal. On the use of line spectral frequency parameters for speech recognition.Digital Signal Processing, 2: 80–87, 1992.

[150] I.V., McLoughlin and R. J., Chance. LSP-based speech modification for intelligibility enhancement. In 13th Int. Conf. on DSP, Santorini, July 1997.

[151] I.V., McLoughlin and R. J., Chance. LSP analysis and processing for speech coders.IEE Electronics Letters, 33(99):743–744, 1997.

[152] A., Schaub and P., Straub. Spectral sharpening for speech enhancement/noise reduction. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 993–996, 1991.

[153] H., Valbret, E., Moulines, and J. P., Tubach. Voice transformation using PSOLA technique. In IEEE Int. Conf. Acoustics Speech and Signal Proc., pages 145–148, San Francisco, CA, March 1992.

[154] R.W, Morris and M. A., Clements. Reconstruction of speech from whispers.Medical Engineering & Physics, 24(7):515–520, 2002.

[155] H. R., Sharifzadeh, I.V., McLoughlin, and F., Ahmadi. Reconstruction of normal sounding speech for laryngectomy patients through a modified CELP codec.IEEE Trans. Biomedical Engineering, 57:2448–2458, October 2010.

[156] J., Li, I.V., McLoughlin, and Y., Song. Reconstruction of pitch for whisper-to-speech conversion of Chinese. In Chinese Spoken Language Processing (ISCSLP), 2014 9th Int. Symp. on, pages 206–210. IEEE, 2014.

[157] F., Ahmadi and I.V., McLoughlin. Measuring resonances of the vocal tract using frequency sweeps at the lips. In 2012 5th Int. Symp. on Communications Control and Signal Processing (ISCCSP), 2012.

[158] F., Ahmadi and I., McLoughlin, The use of low-frequency ultrasonics in speech processing. In Signal Proceesing, S., Miron (ed.). InTech, 2010, pp. 503–528.

[159] I.V., McLoughlin. Super-audible voice activity detection.IEEE/ACM Trans. on Audio, Speech, and Language Processing, 22(9):1424–1433, 2014.

[160] F., Ahmadi, I.V., McLoughlin, and H. R., Sharifzadeh. Autoregressive modelling for linear prediction of ultrasonic speech. In INTERSPEECH, pages 1616–1619, 2010.

[161] I.V., McLoughlin and Y., Song. Mouth state detection from low-frequency ultrasonic reflection.Circuits, Systems, and Signal Processing, 34(4):1279–1304, 2015.

[162] R.W., Schafer. What is a Savitzky–Golay filter? [lecture notes].IEEE Signal Processing Magazine, 28(4):111–117, 2011. doi: 10.1109/MSP.2011.941097.

[163] F., Ahmadi, M., Ahmadi, and I.V., McLoughlin. Human mouth state detection using low[163] F., Ahmadi, M., Ahmadi, and I.V., McLoughlin. Human mouth state detection using low frequency ultrasound. In INTERSPEECH, pages 1806–1810, 2013

Speech and Audio Processing

A MATLAB®-based Approach

This Book has been cited by the following publications. This list is generated based on data provided by Crossref.

Book description

Reviews

Refine List

Actions for selected content:

Contents

Frontmatter
pp i-iv

Contents
pp v-viii

Preface
pp ix-xi

Book features
pp xii-xiv

Acknowledgements
pp xv-xvi

1 - Introduction
pp 1-8

2 - Basic audio processing
pp 9-53

3 - The human voice
pp 54-84

4 - The human auditory system
pp 85-108

5 - Psychoacoustics
pp 109-139

6 - Speech communications
pp 140-194

7 - Audio analysis
pp 195-222

8 - Big data
pp 223-266

9 - Speech recognition
pp 267-313

10 - Advanced topics
pp 314-365

11 - Conclusion
pp 366-369

References
pp 370-378

Index
pp 379-386

Metrics

Full text views

Book summary page views