2 - Basic audio processing
Published online by Cambridge University Press: 05 June 2016
Summary
Most speech and audio researchers use MATLAB as their preferred tool for audio processing, although many of us make use of other specialised tools from time to time. Examples are sox for command line audio processing (particularly when there are a large number of files to convert or process, something it can do with a single command line option), and the sound capture and editing tool audacity, which can record, edit, manipulate, convert and play back numerous types of audio file. Both of these programs are extremely capable open source tools, with far more options than could be described here. However, while very useful, neither tool can replace MATLAB, in which it is easy to develop scripts that draw on hundreds of built-in functions and operators, and that plot or visualise speech and other sounds in a multitude of ways.
Recorded speech or other sounds are stored within MATLAB (as well as in many other computer-based tools) as a vector of samples, with each individual value being a double precision floating point number. A sampled sound can be completely specified by the vector of these numbers as long as one other item of information is known: the sample rate at which the data was recorded. To replay the sampled sound, it is only necessary to sequentially output a voltage proportional to the stored vector information, with a gap between samples equivalent to the inverse of the sample rate.
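The relationship between the sample vector and the sample rate can be sketched in a few lines. The example below is written in Python rather than MATLAB, purely as an illustration; the sample rate, tone frequency and variable names are assumptions and do not come from the chapter.

```python
import math

fs = 8000          # sample rate in Hz (assumed value for illustration)
duration = 0.5     # length of the test sound in seconds (assumed)
n = int(fs * duration)

# The sound itself: one double precision float per sample (a 440 Hz tone).
samples = [math.sin(2 * math.pi * 440 * k / fs) for k in range(n)]

# The gap between consecutive output samples is the inverse of the sample rate.
sample_gap = 1.0 / fs

# The vector alone does not fix the duration; the sample rate is also needed.
playback_time = len(samples) / fs            # 0.5 s at the correct rate
playback_time_wrong = len(samples) / 16000   # 0.25 s if 16 kHz is wrongly assumed
```

Replaying the same vector at the wrong rate halves its duration here, which is why the sample rate must accompany the stored samples.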
General audio programs and tools store audio information similarly, except that they tend to use fixed point numbers rather than floating point. This can reduce the storage requirement by a factor of four (for example, 16-bit samples in place of 64-bit double precision values) at the expense of very little degradation, assuming the system is correctly designed. In particular, overflow and underflow effects usually need to be considered when designing a system that uses fixed point storage for audio, whereas in floating point-based tools such as MATLAB this is rarely a concern in practice.
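The storage saving and the overflow concern can both be shown with a small sketch. It is written in Python as an illustration only and assumes 16-bit signed samples as the fixed point format, which the text does not specify; the helper names `saturate_int16`, `wrap_int16` and `to_int16` are hypothetical.

```python
from array import array
import math

fs = 8000  # assumed sample rate in Hz
# One second of a 440 Hz tone stored as double precision floats.
x = array('d', (0.8 * math.sin(2 * math.pi * 440 * k / fs) for k in range(fs)))

def saturate_int16(q):
    """Clamp an integer to the 16-bit signed range (saturating arithmetic)."""
    return max(-32768, min(32767, q))

def wrap_int16(q):
    """Wrap an integer into the 16-bit signed range (unchecked overflow)."""
    return (q + 32768) % 65536 - 32768

def to_int16(v):
    """Scale a float in [-1, 1) to a 16-bit integer, saturating if needed."""
    return saturate_int16(round(v * 32768))

x16 = array('h', (to_int16(v) for v in x))

# Storage: 8 bytes per double against 2 bytes per 16-bit sample
# (item sizes as found on typical platforms).
storage_ratio = (len(x) * x.itemsize) / (len(x16) * x16.itemsize)

# Degradation: the rounding error is at most half of one quantisation step.
max_err = max(abs(v - q / 32768) for v, q in zip(x, x16))

# Overflow: 0.8 + 0.8 = 1.6 is harmless in floating point, but the
# equivalent fixed point sum no longer fits in 16 bits.
a = to_int16(0.8)                # 26214
wrapped = wrap_int16(a + a)      # -13108, a large and audible error
clipped = saturate_int16(a + a)  # 32767, still distorted but far less severe
```

Saturation is only one possible design choice; scaling the inputs before adding them is another way to keep the sum in range.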
Any operation that MATLAB can perform on a general vector can, in theory, be performed on stored audio. In fact, this is how we typically perform audio processing within MATLAB, and the audio vector can be loaded and saved in much the same way as any other MATLAB variable. Likewise it can be processed, added, plotted, inverted, transformed and so on.
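As a rough illustration of treating audio as an ordinary vector, the sketch below scales, inverts, adds and reverses a short tone, then saves and reloads it. It uses Python with only the standard library rather than MATLAB, and the sample rate, tone and variable names are assumptions. The save step writes 16-bit WAV data to an in-memory buffer, which is one possible storage choice and is not taken from the chapter.

```python
import io
import math
import struct
import wave

fs = 8000  # assumed sample rate in Hz
tone = [0.5 * math.sin(2 * math.pi * 440 * k / fs) for k in range(fs // 10)]

# Ordinary vector operations applied to audio.
quieter = [0.5 * v for v in tone]                    # scale (apply a gain)
inverted = [-v for v in tone]                        # invert the polarity
cancelled = [a + b for a, b in zip(tone, inverted)]  # add sample by sample
backwards = tone[::-1]                               # reverse in time

# Save: convert to 16-bit integers and write a mono WAV file in memory.
pcm = [max(-32768, min(32767, round(v * 32768))) for v in tone]
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)      # 2 bytes per sample
    w.setframerate(fs)     # the sample rate travels with the data
    w.writeframes(struct.pack("<%dh" % len(pcm), *pcm))

# Load: read the sample rate and the samples back.
buf.seek(0)
with wave.open(buf, "rb") as r:
    fs_loaded = r.getframerate()
    raw = r.readframes(r.getnframes())
loaded = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
```

Adding a signal to its inverted copy gives silence, and the reloaded samples match the saved ones, with the sample rate recovered alongside them.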
- Type: Chapter
- Information: Speech and Audio Processing: A MATLAB-based Approach, pp. 9-53. Publisher: Cambridge University Press. Print publication year: 2016