Variable speed correction

Here is an outline of the technique I used to correct variable speed problems in songs transferred from tape sources. The tools I used are Linux tools, but no doubt there are tools to do similar things on your favorite OS.

I'll be walking through the steps using as an example a correction I did of Pink Floyd's Atom Heart Mother from the Pictures of Pink Floyd vinyl ROIO/bootleg. All copies of this ROIO have a terrible speedup in this song, presumably due to a bad recording/tape transfer. It's really the only blemish in this great concert.

  1. Obtain a high-resolution image of the FFT frequency spectrogram of the wav file. I used a program called baudline and just stitched together a bunch of screenshots. I ended up with a very large image that was about 7000x1000 pixels. Here's a small section of it:

    You can see how the pitch really increases (and then even decreases slightly).

    It's important that the frequency scale be linear (i.e., 1 kHz, 2 kHz, 3 kHz, etc. are all equally spaced) and that the scale starts at 0. This makes the calculations very simple. A linear frequency scale is probably the default in most frequency analysis programs.

  2. Try to find prolonged pitches (long lines in the spectrogram) that follow the contours of the speed changes. For each pitch (frequency) you need to use, find the same pitch in a part of the song that does not have pitch problems. You'll need to use its location as a reference to calculate the correction.

    Try to find the highest possible pitches, as this will allow you to be more accurate. A speed shift changes all frequencies by the same relative amount (e.g., by 4%), which means that in our image the higher frequencies move by more pixels than the lower frequencies.

    In my example, I took one of the top pitch "lines" that you can see in the image above. Since the pitch was quite steady before this sudden increase, and the song doesn't change tonalities much, it was easy to find the same pitch at a reference point earlier in the song (where the tape speed was steady).
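    To make the point about higher pitches concrete, here is a minimal sketch of how far a spectrogram line moves for a given speed change. It assumes a linear frequency axis starting at 0; the function name and the sample pixel heights are only illustrative.

```python
def pixel_shift(freq_px: float, speed_change: float) -> float:
    """Vertical movement (in pixels) of a line that sits freq_px above
    the 0 Hz row when the speed changes by speed_change (0.04 = 4%)."""
    return freq_px * speed_change


# The same 4% speedup moves a high line much further than a low one,
# so the high line can be measured more precisely.
for freq_px in (30, 120, 480):
    print(freq_px, round(pixel_shift(freq_px, 0.04), 2))
```

    A line only 30 pixels up moves by about one pixel, which is hard to measure, while a line 480 pixels up moves by almost 20.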

  3. Using an image editing program, zoom in and collect the locations of some pixels along the path as it changes pitch. I found it convenient to use horizontal guides in Gimp to know when the pitch was actually changing. Here is what it looks like zoomed in:

    While I was working, I marked my sample points in red to keep track of them better. You should have more sample points the faster the pitch changes. But don't use too many -- just as your pixels will only jaggedly follow the smooth curve, so will the results jaggedly waver in pitch instead of matching the smooth pitch change. I tried to only sample pixels that looked like they were in the exact center of the smooth curve.

  4. Now we have to do some math. It's not hard. For each pixel, you should already have its distance (in pixels) along the time axis and along the frequency axis (measured from the bottom of the image -- this may not follow the conventions of your image editing program).
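    Most image editors report the vertical coordinate counted from the top of the image, so it may need to be flipped first. A small sketch, assuming you have noted the row where 0 Hz sits (`zero_hz_row` is a hypothetical name, and the numbers in the example are made up):

```python
def freq_px_from_top(y_from_top: int, zero_hz_row: int) -> int:
    """Convert an editor-style y coordinate (0 = top row) into a
    height in pixels above the row that represents 0 Hz."""
    return zero_hz_row - y_from_top


# Illustrative values only: if 0 Hz is at row 999 and the line is at
# row 873, the line sits 126 pixels above 0 Hz.
print(freq_px_from_top(873, zero_hz_row=999))
```

    If the screenshot has a margin or axis labels under the spectrogram, the 0 Hz row is not the last row of the image, so it is worth checking where it really is.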

    What you need to compute depends on the program you're using. I was using a program called ReZound, and to do speed correction on a curve, it wanted the correction values and time offsets both as fractions (between 0 and 1). Your app may want the time offsets in a different form (e.g., in seconds).

    Here are 3 of my data points:

    time (pixels)   frequency (pixels)   reference frequency (pixels)
    4124            126                  120
    4133            127                  120
    4142            129                  120

    To get a time offset in the range [0,1], I just needed to divide the pixel location by the width of the image (7413 pixels). If you needed the time offset in seconds, you could multiply the [0,1] time offset by the length of the audio file (in seconds):

    time (px)   freq (px)   ref freq (px)   time offset
    4124        126         120             0.5563
    4133        127         120             0.5575
    4142        129         120             0.5587
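    The time conversion can be sketched in a few lines. The image width is the one from my example; the duration passed to `time_seconds` is a placeholder, not the real length of the track.

```python
IMAGE_WIDTH_PX = 7413  # width of the stitched spectrogram image


def time_fraction(time_px: float, image_width_px: float = IMAGE_WIDTH_PX) -> float:
    """Position along the time axis as a fraction in [0, 1]."""
    return time_px / image_width_px


def time_seconds(time_px: float, duration_s: float,
                 image_width_px: float = IMAGE_WIDTH_PX) -> float:
    """Same position in seconds, for programs that want seconds."""
    return time_fraction(time_px, image_width_px) * duration_s


for time_px in (4124, 4133, 4142):
    print(time_px, round(time_fraction(time_px), 4))

# Placeholder duration of 600 s, only to show the seconds variant.
print(round(time_seconds(4124, duration_s=600.0), 1))
```

    This only works if the image covers the whole audio file from its first sample to its last, with no margins on the left or right.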

    Now to calculate the correction amount, we simply divide the reference frequency by the observed frequency. It doesn't matter that they are in "pixel" units (not Hz): because the frequency scale is linear and starts at 0, pixel heights are proportional to Hz and the ratio comes out the same.

    time (px)   freq (px)   ref freq (px)   time offset   correction
    4124        126         120             0.5563        0.9524
    4133        127         120             0.5575        0.9449
    4142        129         120             0.5587        0.9302

    You may want to set up a spreadsheet to automate this process.
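    If you would rather script it than use a spreadsheet, something like the following would do. It is only a sketch of the arithmetic described in this step, using my three data points; the function and variable names are my own, and the output is not in any particular program's import format.

```python
IMAGE_WIDTH_PX = 7413  # width of the stitched spectrogram image

# (time px, observed frequency px, reference frequency px)
POINTS = [
    (4124, 126, 120),
    (4133, 127, 120),
    (4142, 129, 120),
]


def correction_rows(points, image_width_px):
    """Return (time offset in [0, 1], speed correction factor) pairs."""
    rows = []
    for time_px, freq_px, ref_px in points:
        offset = time_px / image_width_px   # position along the file
        correction = ref_px / freq_px       # reference / observed
        rows.append((offset, correction))
    return rows


for offset, correction in correction_rows(POINTS, IMAGE_WIDTH_PX):
    print(f"{offset:.4f}  {correction:.4f}")
# prints:
# 0.5563  0.9524
# 0.5575  0.9449
# 0.5587  0.9302
```

    A correction below 1 slows the audio down, which is what we want where the tape was running fast.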

  5. Finally, we have to import the data points into the audio-editing program. I used ReZound, and after poking around a bit, I discovered that it stored speed-correction presets in ~/.rezound/presets.dat.

    After I got the data points into the appropriate format, I was able to simply go to the curved speed correction screen and select the preset:

All in all, I was very happy with the results. Here's the before and after of the spectrogram of the trouble area:

The best part is that this terrific concert is finally 100% listenable now!