The time–frequency representation (TFR) is the initial stage of
analysis in sound/music analysis–resynthesis (A–R)
systems. Given a time-domain waveform, the TFR makes temporal and spectral
detail available to the remainder of the analysis, so that the component
features may be extracted. The resulting ‘feature set’ must
represent the sound as completely as the original time-domain signal, if the
A–R system is to be capable of effective transformation and good
synthesis sound quality. The system as a whole therefore relies on
the TFR to make the sound components detectable, separable and
measurable. Yet the standard TFR to date is the short-time Fourier transform
(STFT), whose shortcomings in resolution are well
recognised. The purpose of this paper is to demonstrate the importance of
the TFR to system function and system design. Poor feature extraction is
shown to result from the use of inappropriate TFRs, whose underlying
assumptions and expectations do not match those of the system. Existing
models are used as case studies, with examples of performance for different
sound types. A philosophy for A–R system design that includes TFR
design is presented, and a methodology for implementing it is proposed.