Variable speed correction
Here is an outline of the technique I used to correct variable speed problems in songs transferred from tape sources. The tools I used are Linux tools, but no doubt there are tools to do similar things on your favorite OS.
I'll be walking through the steps using as an example a correction I did of Pink Floyd's Atom Heart Mother from the Pictures of Pink Floyd vinyl ROIO/bootleg. All copies of this ROIO have a terrible speedup in this song, presumably due to a bad recording/tape transfer. It's really the only blemish in this great concert.
- Obtain a high-resolution image of the FFT frequency spectrogram of the wav
file. I used a program called baudline
and just stitched together a bunch of screenshots. I ended up with a very
large image that was about 7000x1000 pixels. Here's a small section of
it:
You can see how the pitch really increases (and then even decreases slightly).
It's important that the frequency scale be linear -- i.e., 1 kHz, 2 kHz, 3 kHz, etc. are all equally spaced, and the scale starts at 0. This makes the calculations very simple, because a ratio of two heights in pixels is then the same as the ratio of the two frequencies in Hz. A linear frequency scale is probably the default in most frequency analysis programs.
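To see why the linear scale starting at 0 matters, here is a small sketch. The image height and frequency range are hypothetical values I picked for illustration (they are not taken from my actual spectrogram); the two pixel heights are from the data points further down.

```python
import math

# Hypothetical spectrogram geometry (assumed for illustration only):
# 1000 px tall, covering 0 Hz at the bottom to 22050 Hz at the top.
IMAGE_HEIGHT_PX = 1000
MAX_FREQ_HZ = 22050.0


def px_to_hz(freq_px):
    """Convert a height in pixels (measured from the bottom of the image)
    to Hz, assuming a linear frequency scale that starts at 0."""
    return freq_px * MAX_FREQ_HZ / IMAGE_HEIGHT_PX


reference_px = 120  # pitch line where the tape speed is steady
observed_px = 126   # same pitch line where the tape runs fast

ratio_px = reference_px / observed_px
ratio_hz = px_to_hz(reference_px) / px_to_hz(observed_px)

# On a linear scale starting at 0 the scale factor cancels out,
# so both ratios agree (up to floating-point rounding).
print(ratio_px, ratio_hz, math.isclose(ratio_px, ratio_hz))
```

With a logarithmic scale, or a scale that doesn't start at 0, the two ratios would no longer match and you'd have to convert every pixel back to Hz first.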
- Try to find prolonged pitches (long lines in the spectrogram) that
follow the contours of the speed changes. For each pitch (frequency) you
need to use, find the same pitch in a part of the song that does not have
pitch problems. You'll need to use its location as a reference to
calculate the correction.
Try to find the highest possible pitches, as this will allow you to be more accurate. Speed shifts change all frequencies by the same relative amount (e.g., by 4%), which means that in our image the higher frequencies move by more pixels than the lower frequencies, so a one-pixel measuring error matters less.
In my example, I took one of the top pitch "lines" that you can see in the image above. Since the pitch was quite steady before this sudden increase, and the song doesn't change tonalities much, it was easy to find the same pitch at a reference point earlier in the song (where the tape speed was steady).
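The reason for picking the highest pitch lines can be sketched with a few numbers. The line heights below (100, 400 and 800 pixels) are made up for illustration; only the 4% figure comes from the text above.

```python
def pixel_shift(line_px, speed_factor):
    """How many pixels a pitch line moves when the tape speed is
    multiplied by speed_factor (linear frequency scale starting at 0)."""
    return line_px * (speed_factor - 1.0)


def one_pixel_error(line_px):
    """Relative error in the measured pitch if you misread the
    line's height by a single pixel."""
    return 1.0 / line_px


SPEED_FACTOR = 1.04  # a 4% speed-up

# Hypothetical pitch-line heights, in pixels from the bottom of the image.
for line_px in (100, 400, 800):
    print(
        f"line at {line_px:>3} px: moves {pixel_shift(line_px, SPEED_FACTOR):5.1f} px, "
        f"1 px misread = {one_pixel_error(line_px) * 100:.2f}% pitch error"
    )
```

The higher the line, the more pixels the same speed change covers, and the smaller the pitch error caused by being off by one pixel.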
- Using an image editing program, zoom in and collect the locations of
some pixels along the path as it changes pitch. I found it convenient
to use horizontal guides in Gimp to know
when the pitch was actually changing. Here is what it looks like zoomed
in:
While I was working, I marked my sample points in red to keep track of them better. You should have more sample points the faster the pitch changes. But don't use too many -- just as your pixels will only jaggedly follow the smooth curve, so will the results jaggedly waver in pitch instead of matching the smooth pitch change. I tried to only sample pixels that looked like they were in the exact center of the smooth curve.
- Now we have to do some math. It's not hard. For each pixel, you should
already have its distance (in pixels) along the time axis and along the
frequency axis (measured from the bottom of the image -- this may not
follow the conventions of your image editing program).
What you need to compute depends on the program you're using. I was using a program called ReZound, and to do speed correction along a curve, it wanted the correction values and time offsets both as fractions (between 0 and 1). Your app may want the time offsets in a different form (e.g., in seconds).
Here are 3 of my data points:
    time (px)   freq (px)   ref freq (px)
    4124        126         120
    4133        127         120
    4142        129         120
To get a time offset in the range [0,1], I just needed to divide the pixel location by the width of the image (7413 pixels). If you needed the time offset in seconds, you could multiply the [0,1] time offset by the length of the audio file (in seconds):
    time (px)   freq (px)   ref freq (px)   time offset
    4124        126         120             0.5563
    4133        127         120             0.5575
    4142        129         120             0.5587
Now to calculate the correction amount, we simply divide the reference frequency by the observed frequency. It doesn't matter that they are in "pixel" units (not Hz), since the Hz scale was linear.
    time (px)   freq (px)   ref freq (px)   time offset   correction
    4124        126         120             0.5563        0.9524
    4133        127         120             0.5575        0.9449
    4142        129         120             0.5587        0.9302
You may want to set up a spreadsheet to automate this process.
- Finally, we have to import the data points into the audio-editing
program. I used ReZound, and after poking around a bit, I discovered
that it stored speed-correction presets in ~/.rezound/presets.dat.
After I got the data points into the appropriate format, I was able to simply go to the curved speed correction screen and select the preset:
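If you'd rather script the arithmetic from the math step than use a spreadsheet, something like the following sketch would do it. It only reproduces the time offset and correction columns for my three data points; it does not write ReZound's preset file, and the function and variable names are just ones I made up for this example.

```python
IMAGE_WIDTH_PX = 7413  # width of the stitched spectrogram image

# (time in px, observed frequency in px, reference frequency in px),
# with frequency measured from the bottom of the image.
points = [
    (4124, 126, 120),
    (4133, 127, 120),
    (4142, 129, 120),
]


def correction_row(time_px, freq_px, ref_freq_px, image_width_px=IMAGE_WIDTH_PX):
    """Return (time offset in [0, 1], speed correction factor) for one sample."""
    time_offset = time_px / image_width_px  # fraction of the way through the file
    correction = ref_freq_px / freq_px      # < 1 slows the audio back down
    return time_offset, correction


rows = [correction_row(*point) for point in points]

print("time (px)  freq (px)  ref freq (px)  time offset  correction")
for (time_px, freq_px, ref_px), (offset, correction) in zip(points, rows):
    print(f"{time_px:<10} {freq_px:<10} {ref_px:<14} {offset:<12.4f} {correction:.4f}")
```

If your program wants seconds instead of a [0,1] offset, multiply each time offset by the length of the audio file in seconds.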

