Synchronising music playback on networked computers

Now and again I get questions about SyncBoss, software for synchronizing music playback across computers on a LAN, which I wrote during my Uni daze. Sharing with three housemates at the time, an eclectic range of music playing all at once was the norm. While Bon Jovi, Mos Def and Tiesto can sound great in isolation, no-one was particularly keen to hear them all at the same time. And since none of us was willing to surrender our divine right to listen to our own music, we ended up in a situation of “sound escalation”, where each person attempted to drown out the sound of the others.

The solution to this problem is not so hard, I was told. Just buy some speakers and cable, and run it around your house. Well, that would be great. Except we were eating 2 minute noodles on a shoestring budget. So buying things was out the window. What we did have was computers, and our speakers were connected to our computers. And I also had an abundance of free time. So I wrote SyncBoss.

At the time I was using WinAmp to listen to music (these days I use Foobar2000 for the music on my PC), so I wrote a WinAmp plugin to stream music to a Java app on the same computer. This Java app acts as a server to some number of clients.

Millisecond-precise synch

The difficult part of achieving millisecond-precise synch is accounting for the diverse hardware characteristics of your typical PC. (In practice I found the tolerance of the human ear to be around 50ms: under this, the music sounded as if it came from a single source.) Not only does each PC on a LAN typically have a wildly different clock time (yes, even computers synched with NTP, the Network Time Protocol), but those clocks drift at a rate which is noticeable even after about a minute of playback. The other problem is network latency, which makes it difficult to send time-sensitive signals between participating clients.

Getting Synchronized

The first step is to get a synchronized clock for all the machines in your network. I initially thought NTP might fit the bill here, but the level of synchronization a typical NTP implementation gave me was pretty useless for this purpose. Luckily, implementing my own synchronization algorithm was very straightforward, using a smoothed round-trip time. I sent synchronization packets to each client every second. The basic steps I used to do this:

1. Server creates a packet, appending the local time-stamp
2. Client receives packet, appends local time-stamp, returns packet to server
3. Server processes packet to create an offset for the client, which is used in all future messages.

The code for step 3 looks like:

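// Time-stamps parsed from the returned packet (args[0] carries a one-character prefix)
long timeServer = Long.parseLong(args[0].substring(1)); // server clock when the packet was sent
long timeClient = Long.parseLong(args[1]);              // client clock when the packet arrived
long currentTime = System.currentTimeMillis();          // server clock now that the reply is back
long delay = (currentTime - timeServer) / 2;            // one-way latency, assuming a symmetric link

// The client stamped the packet when the server clock read timeServer + delay
long offset = timeClient - (timeServer + delay);
if (polls == 0) {
    this.offset = offset;
} else {
    // running average over the samples seen so far
    this.offset = (this.offset * polls + offset) / (polls + 1);
}
if (polls < 5) polls++; // cap the weight of history so new samples keep counting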
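
The calculation can also be tried in isolation. What follows is a minimal sketch rather than SyncBoss code: the class name OffsetEstimator and its methods are made up for this example, the time-stamps are passed in directly instead of being parsed from a packet, and it assumes the network delay is the same in both directions.

```java
/** Hypothetical helper, not part of SyncBoss: estimates the client-minus-server clock offset. */
final class OffsetEstimator {
    private static final int MAX_POLLS = 5; // cap so each new sample keeps at least 1/6 weight
    private double offset;
    private int polls;

    /**
     * @param sentAt     server clock when the packet left (ms)
     * @param clientTime client clock when the packet arrived there (ms)
     * @param receivedAt server clock when the reply came back (ms)
     * @return the smoothed offset after taking this sample into account
     */
    double update(long sentAt, long clientTime, long receivedAt) {
        // Assumption: symmetric latency, so the client stamped the packet halfway through the trip.
        double oneWayDelay = (receivedAt - sentAt) / 2.0;
        double sample = clientTime - (sentAt + oneWayDelay);
        offset = (polls == 0) ? sample : (offset * polls + sample) / (polls + 1);
        if (polls < MAX_POLLS) polls++;
        return offset;
    }

    double offset() {
        return offset;
    }
}
```

For a client whose clock runs 5000ms ahead and a 20ms round trip, update(1000, 6010, 1020) returns 5000. A noisier second sample of 5006 would pull the smoothed value to 5003.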

Keeping in Synch

I mentioned earlier that different hardware has different clock rates. I found the music on clients was often skipping, due to drifting clocks. My solution to this was to re-speed the music, making it run at a -1% to 1% speed offset from the server.

Changing the speed of a PCM stream is actually quite simple. To speed the sound up by roughly 10%, you remove every 10th frame. To slow the music down by roughly 10%, you duplicate every 10th frame. It will distort the pitch of the music, but in the 1% range this won’t be noticeable.
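
As a rough illustration of the drop and duplicate idea, here is a minimal sketch that works on whole frames of a raw PCM byte array. The class and method names are invented for this example and are not SyncBoss code. The frame size is passed in as a plain number of bytes (for instance 4 for 16-bit stereo).

```java
/** Hypothetical helper, not SyncBoss code: naive PCM re-speeding by whole frames. */
final class FrameRespeed {
    /** Speeds playback up by dropping every nth frame (n = 100 is roughly +1%). */
    static byte[] dropEveryNth(byte[] pcm, int frameSize, int n) {
        int frames = pcm.length / frameSize;
        int kept = frames - frames / n;
        byte[] out = new byte[kept * frameSize];
        int dst = 0;
        for (int f = 0; f < frames; f++) {
            if ((f + 1) % n == 0) continue; // skip frames n, 2n, 3n, ...
            System.arraycopy(pcm, f * frameSize, out, dst, frameSize);
            dst += frameSize;
        }
        return out;
    }

    /** Slows playback down by playing every nth frame twice (n = 100 is roughly -1%). */
    static byte[] duplicateEveryNth(byte[] pcm, int frameSize, int n) {
        int frames = pcm.length / frameSize;
        byte[] out = new byte[(frames + frames / n) * frameSize];
        int dst = 0;
        for (int f = 0; f < frames; f++) {
            int copies = ((f + 1) % n == 0) ? 2 : 1;
            for (int c = 0; c < copies; c++) {
                System.arraycopy(pcm, f * frameSize, out, dst, frameSize);
                dst += frameSize;
            }
        }
        return out;
    }
}
```

Copying whole frames keeps the left and right channel samples together, which is why the loops step by frame rather than by byte.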

The following code snippet shows the implementation. The effective and historical drift multipliers are there because the packet size may not be large enough for precision re-speeding. For example, if the packet only contains 100 frames of sound, and you want to speed the music up by 0.1%, you would round that down to 0% and remove no frames, which wouldn’t be very useful.


public byte[] respeed(byte[] buf, int off, int len, double targetSpeed) {
    if (effectiveDriftMultiplier == -1) { // -1 marks "not initialised yet"
        effectiveDriftMultiplier = driftMultiplier;
    }
    int frameSize = format.getFrameSize();
    // A high multiplier means we want fewer frames, and vice versa
    double invFactor = 2 - effectiveDriftMultiplier;
    int frames = (int) (invFactor * len) / frameSize; // whole frames only, no partial trailing frame
    int size = frames * frameSize;
    byte[] outbuf = new byte[size];

    // Output frame i is taken from input frame (multiplier * i), which drops or repeats frames
    for (int i = 0; i < frames; i++) {
        int srcFrame = (int) (effectiveDriftMultiplier * i);
        System.arraycopy(buf, off + srcFrame * frameSize, outbuf, i * frameSize, frameSize);
    }

    // Smallest speed change one packet can express: one frame out of a full packet
    double minunit = 1.0 / ((double) MediaTransmitter.getPacketSize() / frameSize);
    double actualDriftMultiplier = 2.0 - (double) size / len;

    // Running average of the speed actually achieved so far
    historicalDriftMultiplier = (historicalDriftMultiplier * respeedCount + actualDriftMultiplier) / (respeedCount + 1);

    // Nudge the next packet's multiplier towards the target
    if (historicalDriftMultiplier < targetSpeed) {
        effectiveDriftMultiplier += minunit;
    } else if (historicalDriftMultiplier > targetSpeed) {
        effectiveDriftMultiplier -= minunit;
    }

    respeedCount++;
    return outbuf;
}
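
The effect of spreading a small speed change over many packets can be seen with a much simpler scheme than the one above. The sketch below is not how SyncBoss does it: it uses a plain error carry, where the fractional frame lost to rounding in one packet is added to the next. The class name FrameBudget and its method are invented for this example.

```java
/** Hypothetical sketch, not SyncBoss code: carries fractional frames between packets. */
final class FrameBudget {
    private double carry; // fraction of a frame still owed from earlier packets

    /** Returns how many frames to emit for a packet of inputFrames at the given speed. */
    int outputFrames(int inputFrames, double speed) {
        double ideal = inputFrames / speed + carry;
        int whole = (int) Math.round(ideal);
        carry = ideal - whole; // remember the rounding error for the next packet
        return whole;
    }
}
```

With 100-frame packets and a speed of 1.001, rounding each packet on its own would always give 100 frames and no speed-up at all. With the carry, most packets still come out at 100 frames but roughly one in ten comes out at 99, so the long-run average lands on the target.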

Wrangling the Sound Card

The sound card can be a bit of a pain for synchronisation. This is exacerbated by the Sun Java JRE, which has a fairly terrible implementation of the sound API. Here are some of the tricks I used to manage this:

Code for an average synch:

if (line.isRunning()) {
    line.stop();
    Thread.sleep(10); // let buffer fizzle
    line.flush();
}

// wait till it's time to play it
while ((double) curTime() + (double) playerOffset.getOffset() + seekOffset < (double) fragmentTimes[fragment % BUFSIZE]) {
    Thread.sleep(1);
}
line.start();

Feedback loop for synching:

Sound cards are varying levels of terrible when it comes to seeking. So each time SyncBoss performs a seek, it waits a while for the sound card output to stabilize, then assesses the accuracy of the seek. This is used to determine a ‘seek offset’ value to adjust further seeks.
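
One way such a feedback loop could look is sketched below. This is an illustration only, not the SyncBoss implementation: the class name SeekCalibrator, the gain of 0.5 and the sign convention (the offset is added to the position requested from the card) are all assumptions made for this example.

```java
/** Hypothetical sketch, not SyncBoss code: learns a correction for sound card seeks. */
final class SeekCalibrator {
    private static final double GAIN = 0.5; // assumed smoothing factor, tune per setup
    private double seekOffsetMs;

    /** Position to request from the sound card so that playback lands on targetMs. */
    double adjustedTarget(double targetMs) {
        return targetMs + seekOffsetMs;
    }

    /** Call once output has settled, with where we wanted to be and where we ended up. */
    void recordResult(double targetMs, double actualMs) {
        double error = targetMs - actualMs; // positive means the card landed early
        seekOffsetMs += GAIN * error;       // move part of the way towards the correction
    }

    double seekOffsetMs() {
        return seekOffsetMs;
    }
}
```

If a card consistently lands 30ms early, the learned offset goes 15, 22.5, 26.25 and so on, closing half of the remaining gap with each seek. A gain below 1 keeps a single bad measurement from throwing the next seek off by the full amount.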

And finally…

Each sound card has different properties when it comes to reporting what it is currently playing. This is not something which can be measured in the software. Knowing what the offset is requires manual measurement, or a knowledge of the characteristics of the particular sound card in use.

The full source code is available on-line here: http://code.google.com/p/syncboss/