Looking back: Protracker player in Kotlin

Hi everyone - it's been a while since I've updated this blog. In September, I was hard at work on my Protracker player in Kotlin. It's basically as done as it's going to get now, as I have achieved my goal (here's the repository if you'd like to see it). Now, it's time to talk a bit about what went into it and how I approached the problem.

The Protracker Format

First, a few words about the Protracker format. It emerged in the early 1990s, but its lineage goes back to 1987 and an application called the Ultimate Soundtracker, by Karsten Obarski. It allowed people to write music using 8-bit audio samples, with as many as four playing simultaneously in channels or "tracks." This format was reverse-engineered by hobbyists, resulting in trackers such as Noisetracker and Protracker - the latter of which was the one that really took off.

[Screenshot: The ProTracker interface]

Protracker was the first successful music tracker in the demoscene, and dozens of other trackers and formats were eventually created. Of those, three stood out, each more advanced than the last: Scream Tracker, Fast Tracker, and Impulse Tracker.

The Protracker format supported four channels: two on the left speaker, two on the right. It supported a variety of effects, including pitch slides, volume slides, arpeggios, and global effects to change the tempo and song position. Musicians would write their instructions in "patterns" and then arrange those patterns in an order list, which determined the sequence in which they were played. Instruments are just 8-bit PCM data, played back at varying rates to represent pitch.

My goal: write a player for a specific song, "space debris" by the Finnish composer Captain. It's widely regarded as a classic of the early days of the demoscene. It is to the demoscene as "Johnny B. Goode" is to rock and roll.

The first challenge: Resampling

Before I could go anywhere, I needed to figure out how to play the instruments at different pitches. There's no simple way in the JVM to take PCM data and resample it - I was going to need to do it myself. At a sampling rate of 44.1 kHz, playing the 8-bit data as-is results in a very high pitch. My first experiment was to take the PCM data - a byte array - and double its length, repeating each element of the array. This did what I expected: it played the same note, one octave lower. Doing the same thing again dropped it another octave.
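
In Kotlin, that first experiment looks something like this (a simplified sketch, not the exact code from the project):

// Naive "one octave down": repeat each sample once, doubling the length.
// This halves the playback frequency without any interpolation.
fun octaveDown(audioData: ByteArray): ByteArray {
    val result = ByteArray(audioData.size * 2)
    for (i in audioData.indices) {
        result[i * 2] = audioData[i]
        result[i * 2 + 1] = audioData[i]
    }
    return result
}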

But I was going to need to play at other pitches, and that meant interpolating - basically, creating a smooth line between values in the array rather than repeating the same value (which creates a "staircase" effect that sounds distorted). In other words, if I had these values:

10 20

And I needed to resample so that there were four new values in between them. I didn't want this:

10 10 10 10 10 20

But rather, I wanted this:

10 12 14 16 18 20

So, the resampling algorithm essentially had two tasks: first, find out what the new frequency is, and second, interpolate the values to get a smooth line. In the Protracker format, you are given a "period" for each note, which determines how quickly you step through the array. I figured out the math on this. First, remember that a "sample" in this context is a single value in a PCM stream. At 44.1 kHz, we will have 44,100 samples every second. So let's call our sample position X, which begins at zero and is essentially a counter representing time.

The period (which I call "pitch" in my code) is a float value that is used to determine how we should "step" through our byte array of audio data. To calculate this step, we need to know two constants: the sampling rate (44,100 Hz) and the Amiga clock rate (7,093,789.2 Hz in the PAL region). Then we use this formula to calculate our step:

step = (7093789.2 / (period * 2)) / 44100.0
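
In Kotlin, that calculation might look something like this (constant names are mine, not necessarily the project's):

// Output sampling rate and the Amiga PAL clock rate, both in Hz.
const val SAMPLING_RATE = 44100.0
const val PAL_CLOCK_RATE = 7093789.2

// How far to advance through the instrument's audio data per output sample.
fun stepForPeriod(period: Float): Double =
    (PAL_CLOCK_RATE / (period * 2)) / SAMPLING_RATE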

This step value represents how much we increase our index as we advance through the instrument's audio data. When we begin playback, we maintain a reference into the audio data, starting at 0. As X increases, we continually add the step to this reference, rounding down to the nearest integer whenever we read from the array. So, if our step is 0.2 and we start at index zero, we advance through the audio data like this:

step = 0.2
audioDataReference = 0.0
sample1 = audioData[0.0]
sample2 = audioData[0.2] (step added)
sample3 = audioData[0.4] (step added again)
sample4 = audioData[0.6] (step added again, and so on)
sample5 = audioData[0.8]
sample6 = audioData[1.0]

Since floats aren't actually valid array index values, we need to floor them and convert them to integers. This means that sample1 through sample5 will all have the same value from the audio array; sample6 will be a different value. So, if audioData[0] is equal to 10, and audioData[1] is equal to 20, we'll end up with this sequence again:

10 10 10 10 10 20
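
Here's a sketch of that stepping loop before any interpolation (again, my names, not necessarily the project's):

// The "staircase" version: pick each output sample by flooring the reference.
// For audioData = [10, 20] and step = 0.2, the first six outputs are
// 10 10 10 10 10 20 - the sequence above.
fun resampleNearest(audioData: ByteArray, step: Double): ByteArray {
    val out = mutableListOf<Byte>()
    var reference = 0.0
    while (reference < audioData.size) {
        out.add(audioData[reference.toInt()])  // toInt() truncates, i.e. floors non-negative values
        reference += step
    }
    return out.toByteArray()
}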

That's where the interpolation comes in. We can use the step to calculate how many iterations until we advance to the next value in the audio data - that's our "run." We find the difference between the two values in the audio data - that's our "rise." We can now calculate a slope, and apply that slope to the values we are interpolating to get a smooth line rather than a "staircase."
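
The interpolated version only changes how each sample is read: the fractional part of the reference tells us how far along the run we are, and we apply the slope accordingly. A sketch under the same assumptions:

// Linear interpolation: blend between the value at floor(reference) and the
// next value, weighted by the fractional part - the rise/run slope applied at
// each position. For [10, 20] with step = 0.2, the first six outputs are
// 10 12 14 16 18 20 - the smooth line from above.
fun resampleLinear(audioData: ByteArray, step: Double): ByteArray {
    val out = mutableListOf<Byte>()
    var reference = 0.0
    while (reference < audioData.size) {
        val index = reference.toInt()
        val fraction = reference - index
        val current = audioData[index].toDouble()
        // Clamp at the end so the final sample interpolates toward itself.
        val next = audioData[minOf(index + 1, audioData.lastIndex)].toDouble()
        out.add((current + (next - current) * fraction).toInt().toByte())
        reference += step
    }
    return out.toByteArray()
}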

Sound complicated? It is a bit, but I was able to figure it out, and I'm not even an audio programmer. The resampling algorithm is pretty simple: it doesn't factor in things like anti-aliasing, and a sinc interpolator would probably sound better - but this is what I was able to do.

The second challenge: Song position

The next challenge was keeping track of our song position. This wasn't quite as hard as the resampler, but it still presented challenges. Protracker songs are organized like this: a song is a collection of patterns. Each pattern has four channels, or columns - one for each note that can be playing simultaneously.

Each channel has 64 rows in it. Each row can contain a note to play, an instrument number to play it with, or an effect number with its parameters. It can also contain no data, which usually means "just keep playing whatever is already playing in this channel."
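
As a rough sketch, that structure might be modeled with types like these (my own names, not necessarily how the project defines them):

// One cell in a pattern. Every field is optional, since a row can be empty.
data class Row(
    val period: Int? = null,          // the note to play, as a Protracker period
    val instrument: Int? = null,      // which instrument to play it on
    val effect: Int? = null,          // effect number
    val effectArgument: Int? = null   // effect parameters
)

// A pattern: four channels of 64 rows each.
data class Pattern(val channels: List<List<Row>>)

// A song: instruments (raw 8-bit PCM), patterns, and the order to play them in.
data class Song(
    val instruments: List<ByteArray>,
    val patterns: List<Pattern>,
    val orderList: List<Int>          // indices into patterns
)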

A row is the smallest division in the user interface, but there are finer divisions internally: each row is divided into a number of "ticks," six by default (though this can be changed via effects). Ticks are an important factor in applying effects, many of which are applied per-tick.

Finally, the smallest unit is the sample, representing a single point in the audio data that we will send to our output device. I decided to calculate how many samples would be generated per tick, and maintained a counter in my main audio generator. When I reached the maximum number of samples per tick, I would advance to the next tick and reset my sample counter. Likewise, when I reached the maximum number of ticks per row, I would reset the tick counter and advance to the next row. And likewise for patterns - when I reached the end of a pattern, I would advance to the next pattern in the order list, and then continue until I reached the end of the song.
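
A simplified sketch of that counter cascade (real code would also read each new row's data, apply effects, and handle tempo changes):

// Advances the song position one output sample at a time; defaults are Protracker's.
class PositionTracker(
    private val samplesPerTick: Int,
    private val orderListSize: Int,
    private val ticksPerRow: Int = 6,
    private val rowsPerPattern: Int = 64
) {
    var sampleCounter = 0
    var tick = 0
    var row = 0
    var orderListPosition = 0

    // Call once per generated sample; returns false once the song is over.
    fun advance(): Boolean {
        sampleCounter++
        if (sampleCounter < samplesPerTick) return true
        sampleCounter = 0
        tick++
        if (tick < ticksPerRow) return true
        tick = 0
        row++
        if (row < rowsPerPattern) return true
        row = 0
        orderListPosition++
        return orderListPosition < orderListSize
    }
}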

All the while, I would generate samples for each channel, mix them together, and send them to the output device. I initially used a test song, which contained only a non-looped instrument and a single channel. For fun, I tried playing space debris on it - it did not play correctly at all, and really didn't sound much like the song. But it didn't throw any exceptions, so that was something.

I now had a basic framework for playing a song. Next, I was going to need to implement all of the features of Protracker - in my next post, I'll talk about how I approached those features, and how my code changed over time.
