It is common in audio (and other kinds of signal processing) to
convert a time-domain signal to a spectrogram via the short-time Fourier transform (STFT)
modify the spectrogram to get
convert the modified spectrogram back to a time-domain signal via the inverse short-time Fourier transform (ISTFT)
An example of such processing is spectral subtraction for noise reduction.
When creating the spectrogram in Step 1, a window is usually applied to the samples in each frame, and the frames overlap. Because of this, it is not completely obvious if and how it is possible to construct a time domain signal from the modified spectrogram.
In fact, depending on the processing that is done, there may not be a time-domain signal whose STFT is exactly the output from Step 2. In this note, we are going to ignore this issue and consider only the simple case where we don’t do anything in Step 2: . That is, we want to recover the original signal and understand under what conditions it is possible to do so. The good news in this case is that, given conditions that are easily achieved in practice, it is possible to invert the STFT and exactly reproduce .
It is less common, but sometimes done, to carry out similar processing in the time domain. The steps are
convert a time-domain signal to a series of overlapping, windowed frames
modify the frames to get
convert the modified frames back to a time-domain signal
An example of such processing is to remove segments of an audio signal whose energy is below a threshold and are regarded as silent.
Again, we will focus on the case where no modification is done and ask when is it possible to recover from the It is expected that if we can do that, similar techniques will be valuable for the case where modifications are done.
The first condition for invertibility is that there is no gap between the frames. This makes intuitive sense because otherwise we are leaving out some samples when doing the windowing, so there is no way to reconstruct them.
The second condition is related to the specific technique used, which is called overlap-add and is explained below. It is common in the literature to find mention of the constant overlap-add (COLA) constraint. As we will see, this is a sufficient condition to achieve invertibility, but it is stronger than necessary and somewhat difficult to satisfy. We will show that the necessary condition, nonzero overlap-add (NOLA) constraint, is much weaker and is easily achieved.
Recent versions of SciPy include functions for the STFT and ISTFT. This note was prompted by the implementation of the istft function in scipy.signal (v1.0.0) which will check if COLA is satisfied and will abort if not. We want to make the case for removing that check.
There are lots of equations in what follows, but the mathematics is elementary and we have taken pains to be clear, perhaps to the point of being pedantic. A short Python script which illustrates the points made in this note is available here.
Let’s suppose we have a sequence of length that we want to analyse in successive frames of length with a hop size of from one frame to the next. We will only consider the case where there are no gaps between frames, which means that In other words, successive frames overlap by samples.
We’ll use (for time) as the index of the frames and as the index of the frequencies in a given frame. The samples in frame are
Let be a window function So far we have regarded and as finite sequences of length and respectively. To simplify the notation in the following, we will assume that they have both been extended to be defined for all by zero padding. The samples in the windowed frame can then be conveniently described as
Again, is defined for all but only a finite number of samples are nonzero.
We start with the easier case, which is time-domain processing.
We will arrive at the overlap-add technique by starting with the following sum. Note that we use infinite limits in the following, but only a finite number of terms are non-zero. For any (unless some of the , in which case we must have consider
We can then recover from
as long as the denominator is not zero. We call this the nonzero overlap-add (NOLA) constraint
A stronger constraint is constant overlap-add (COLA), defined by
for some constant An even more restrictive condition is that
If and the window satisfies the strong COLA constraint then we have a particularly simple inversion formula
However, while the COLA constraint is sufficient, it is not necessary for exact invertibility to be possible.
Although COLA is often introduced for the value the most common choice is for reasons that we will explain below.
where all sequences are extended to by zero padding.
The Fourier transform of frame is
We define a sequence which is the inverse Fourier transform of in frame and zero-padded for other
Because we did not modify we have But we’ll keep the notation to look forward to the situation where we do have a modified
By analogy with overlap-add in the time domain we consider the following sum. The equations look horrendous, but the calculation is straightforward.
In the line marked (*), we used Lemma 1 (below). In the next line, we used the fact that and are nonzero for the same values of
We conclude that
which is valid when the denominator is not zero. The inversion formula is valid under any of the following conditions, listed in order of increasing strictness
NOLA: for all
for all (see Lemma 3 below)
COLA: for all and some constant
Strong COLA: for all
As mentioned above, it is most common to choose The reason is that in the case where we did modify the STFT, there may not be a time-domain signal whose STFT matches our modified version. Choosing gives the Griffin-Lim optimal estimate (optimal in a least-squares sense) for a time-domain signal from a modified STFT. SciPy implements for the signal reconstruction in the istft routine.
We have shown that the commonly-used COLA constraint is sufficient but not necessary for the windowed time domain or STFT to be invertible. The weaker NOLA condition is necessary and sufficient. Almost all commonly-used window/overlap combinations satisfy NOLA.
We were motivated do this by the implementation of the SciPy library function istft. We recommend that the COLA constraint imposed by that function should be removed or at least made optional. Update: There is now a pull request for this in the SciPy repo.
Appendix
Lemma 1:
where
Proof: Using the formula for the sum of a geometric series, we have for the case
The case is obvious because in this case each term in the sum is equal to . QED
Lemma 2 (inverse Fourier transform):
Given a sequence its sampled discrete Fourier transform is
We can recover from by the equation
Proof: Substitute equation (3) into the right-hand side of equation (4) and use Lemma 1:
QED
Lemma 3:
Let be a window with strictly positive values: for all Extend to infinity at both ends by padding with zeros. Then if
for any
Proof: Since for we just need to show that for some
We’ll pick a that works. Let
By definition of the floor function,
Flipping this around and using the fact that , we have