$\newcommand{\lyxlock}{}$

Warning: MathJax requires JavaScript to correctly process the mathematics on this page. Please enable JavaScript on your browser.

Invertibility of overlap-add processing

Bruce Sharpe

1 Introduction

It is common in audio (and other kinds of signal processing) to

convert a time-domain signal to a spectrogram via the short-time Fourier transform (STFT)
modify the spectrogram to get
convert the modified spectrogram back to a time-domain signal via the inverse short-time Fourier transform (ISTFT)

An example of such processing is spectral subtraction for noise reduction.

When creating the spectrogram in Step 1, a window

is usually applied to the samples in each frame, and the frames overlap. Because of this, it is not completely obvious if and how it is possible to construct a time domain signal from the modified spectrogram.

In fact, depending on the processing that is done, there may not be a time-domain signal whose STFT is exactly the output from Step 2. In this note, we are going to ignore this issue and consider only the simple case where we don’t do anything in Step 2:

. That is, we want to recover the original signal

and understand under what conditions it is possible to do so. The good news in this case is that, given conditions that are easily achieved in practice, it is possible to invert the STFT and exactly reproduce

It is less common, but sometimes done, to carry out similar processing in the time domain. The steps are

convert a time-domain signal to a series of overlapping, windowed frames
modify the frames to get
convert the modified frames back to a time-domain signal

An example of such processing is to remove segments of an audio signal whose energy is below a threshold and are regarded as silent.

Again, we will focus on the case where no modification is done

and ask when is it possible to recover

from the

It is expected that if we can do that, similar techniques will be valuable for the case where modifications are done.

The first condition for invertibility is that there is no gap between the frames. This makes intuitive sense because otherwise we are leaving out some samples when doing the windowing, so there is no way to reconstruct them.

The second condition is related to the specific technique used, which is called overlap-add and is explained below. It is common in the literature to find mention of the constant overlap-add (COLA) constraint. As we will see, this is a sufficient condition to achieve invertibility, but it is stronger than necessary and somewhat difficult to satisfy. We will show that the necessary condition, nonzero overlap-add (NOLA) constraint, is much weaker and is easily achieved.

Recent versions of SciPy include functions for the STFT and ISTFT. This note was prompted by the implementation of the istft function in scipy.signal (v1.0.0) which will check if COLA is satisfied and will abort if not. We want to make the case for removing that check.

There are lots of equations in what follows, but the mathematics is elementary and we have taken pains to be clear, perhaps to the point of being pedantic. A short Python script which illustrates the points made in this note is available here.

2 Windowed frames

Let’s suppose we have a sequence

of length

that we want to analyse in successive frames of length

with a hop size of

from one frame to the next. We will only consider the case where there are no gaps between frames, which means that

In other words, successive frames overlap by

samples.

We’ll use

(for time) as the index of the frames and

as the index of the frequencies in a given frame. The samples in frame

are

Let

be a window function

So far we have regarded

and

as finite sequences of length

and

respectively. To simplify the notation in the following, we will assume that they have both been extended to be defined for all

by zero padding. The samples in the windowed frame

can then be conveniently described as

Again,

is defined for all

but only a finite number of samples are nonzero.

3 Inverting in the time domain

We start with the easier case, which is time-domain processing.

We will arrive at the overlap-add technique by starting with the following sum. Note that we use infinite limits in the following, but only a finite number of terms are non-zero. For any

(unless some of the

, in which case we must have

consider

We can then recover

from

as long as the denominator is not zero. We call this the nonzero overlap-add (NOLA) constraint

A stronger constraint is constant overlap-add (COLA), defined by

for some constant

An even more restrictive condition is that

and the window

satisfies the strong COLA constraint then we have a particularly simple inversion formula

However, while the COLA constraint is sufficient, it is not necessary for exact invertibility to be possible.

Although COLA is often introduced for the value

the most common choice is

for reasons that we will explain below.

4 Forward and inverse DFT for a finite sequence

To get ready to talk about inverting the STFT, we review the definitions of the forward and inverse Fourier transform for finite sampled sequences.

Given a sequence

the discrete Fourier transform of

is given by

We can exactly recover

from

via the inverse Fourier transform

See Lemma 2 in the Appendix if you want a refresher on why this is true.

5 Inverting the STFT

Again we start with windowed frames

where all sequences are extended to

by zero padding.

The Fourier transform of frame

We define a sequence

which is the inverse Fourier transform of

in frame

and zero-padded for other

Because we did not modify

we have

But we’ll keep the notation

to look forward to the situation where we do have a modified

By analogy with overlap-add in the time domain we consider the following sum. The equations look horrendous, but the calculation is straightforward.

In the line marked (*), we used Lemma 1 (below). In the next line, we used the fact that

and

are nonzero for the same values of

We conclude that

which is valid when the denominator is not zero. The inversion formula is valid under any of the following conditions, listed in order of increasing strictness

NOLA: for all
for all (see Lemma 3 below)
COLA: for all and some constant
Strong COLA: for all

As mentioned above, it is most common to choose

The reason is that in the case where we did modify the STFT, there may not be a time-domain signal whose STFT matches our modified version. Choosing

gives the Griffin-Lim optimal estimate (optimal in a least-squares sense) for a time-domain signal from a modified STFT. SciPy implements

for the signal reconstruction in the istft routine.

6 Discussion and conclusions

We have shown that the commonly-used COLA constraint is sufficient but not necessary for the windowed time domain or STFT to be invertible. The weaker NOLA condition is necessary and sufficient. Almost all commonly-used window/overlap combinations satisfy NOLA.

We were motivated do this by the implementation of the SciPy library function istft. We recommend that the COLA constraint imposed by that function should be removed or at least made optional. Update: There is now a pull request for this in the SciPy repo.

Appendix

Lemma 1:

where

Proof: Using the formula for the sum of a geometric series, we have for the case

The case

is obvious because in this case each term in the sum is equal to

. QED

Lemma 2 (inverse Fourier transform):

Given a sequence

its sampled discrete Fourier transform is

We can recover

from

by the equation

Proof: Substitute equation (3) into the right-hand side of equation (4) and use Lemma 1:

QED

Lemma 3:

Let

be a window with strictly positive values:

for all

Extend

to infinity at both ends by padding with zeros. Then if

for any

Proof: Since

for

we just need to show that for some

We’ll pick a

that works. Let

By definition of the floor function,

Flipping this around and using the fact that

, we have

which is what we needed to show. QED