This is a post about getting good-quality VoIP on an imperfect broadband connection. The first section is the nerdy part, which describes why the problem happens; the second section details a fix that works; the third section discusses some business implications. So you may want to skip some sections.
In Vermont, we live in a place where DSL doesn’t reach and cable is a costly construction job to get up the hill. We’ve worked with a local ISP on an alternative broadband service which I may blog about at some point. For the moment, the broadband we have is not great for VoIP although it is perfectly usable for email, web browsing, and other applications.
The problem is jitter. The time it takes for a packet to get from any particular location to my house is highly variable. Most packets arrive in 70ms but typically there are clusters of packets that take over 100ms to arrive. This difference was enough to cause fits for the Motorola VT1000 adapters which Vonage sent to me when I signed up for their service last year. These adapters worked fine in a couple of other locations where jitter was not a problem and most people who have them report good quality.
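For the curious, jitter is easy to estimate from a series of delay samples like the ones ping reports. Here's a minimal Python sketch using the smoothed interarrival-jitter formula from RTP (RFC 3550); the delay numbers are invented to resemble the pattern described above, not actual measurements from my connection:

```python
# Estimate jitter from a list of delay samples, in ms.
# The smoothed formula follows RTP's interarrival-jitter estimator (RFC 3550):
# J = J + (|D| - J) / 16, where D is the delay difference between
# consecutive packets. The samples below are hypothetical.

delays_ms = [70, 71, 69, 70, 105, 110, 70, 72, 108, 70]

jitter = 0.0
for prev, cur in zip(delays_ms, delays_ms[1:]):
    d = abs(cur - prev)
    jitter += (d - jitter) / 16.0

print(f"mean delay: {sum(delays_ms) / len(delays_ms):.1f} ms")
print(f"smoothed jitter estimate: {jitter:.1f} ms")
```

A steady 70ms delay would give a jitter estimate near zero; it's the clusters of 100ms-plus stragglers that drive the estimate up.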
I believe that the problem with the VT1000s is in jitter buffer management.
Most engineers were sure, way back in 1996, that decent-quality voice could never be implemented on the Internet. Because the human ear is a very sensitive instrument, we get annoyed quickly if the speech we hear is not the almost-continuous, even-paced sound we know is coming from the speaker’s mouth. The problem is that the packets which contain data – including voice – are free to travel different paths on the Internet. This is both the Internet’s strength (because it makes it robust) and its weakness (because it makes it unpredictable). Packet two may well arrive before packet one; packet three may not arrive at all. This is not a problem for applications like email, which rely on TCP to reassemble the packets of data in the correct order and to request retransmission of any packets which get lost along the way.
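A toy example makes the difference concrete. A tiny Python sketch (the sequence numbers and arrival order are invented) of what a TCP-style receiver does with an out-of-order, lossy arrival stream – reorder everything and ask for the missing piece, which is exactly what a real-time voice stream can't afford to wait for:

```python
# Packets carry sequence numbers so the receiver can detect reordering
# and loss. A TCP-like receiver reassembles in order and stalls until
# missing packets are retransmitted; a voice receiver cannot wait.
# The sequence numbers below are purely illustrative.

sent = [1, 2, 3, 4, 5]
arrived = [2, 1, 4, 5]     # packet 2 beat packet 1; packet 3 was lost

in_order = sorted(arrived)                 # what reassembly recovers
missing = sorted(set(sent) - set(arrived)) # what must be retransmitted

print("received in order:", in_order)      # [1, 2, 4, 5]
print("needs retransmission:", missing)    # [3]
```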
The first implementations of VoIP included a large jitter buffer. The jitter buffer holds incoming packets until a bunch of them have been received and sequenced, then starts to play out the sound continuously. Meanwhile, new packets are being received. The idea is that buffering increases the odds that, even if some subsequent packets are late to arrive, there will still be enough packets available to play continuously to the listener. Clever algorithms detect silence and stretch that out if needed rather than pausing in the middle of a sound. Requesting retransmission of a missing packet, or waiting a long period of time for it to arrive, is simply not an acceptable strategy for VoIP. If a packet fails to arrive by the time its place in the buffer is being turned into sound, smart algorithms interpolate the missing sound – usually with good success.
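Here's a minimal Python sketch of a fixed jitter buffer along those lines. The frame length is typical for VoIP, but the buffer depth and the packet trace are hypothetical; a frame that misses its playout deadline is the one the interpolation algorithms would have to conceal:

```python
# A toy fixed jitter buffer. Each packet is (sequence_number, arrival_ms);
# playout of packet n is scheduled at n * FRAME_MS + BUFFER_MS. A packet
# that arrives after its slot counts as lost and must be concealed.

FRAME_MS = 20    # one packet carries 20 ms of audio (common for VoIP)
BUFFER_MS = 60   # hypothetical jitter-buffer depth

def playout(packets, buffer_ms):
    """Return (played, concealed) lists of sequence numbers."""
    arrival = dict(packets)
    played, concealed = [], []
    for seq in range(max(arrival) + 1):
        deadline = seq * FRAME_MS + buffer_ms
        if seq in arrival and arrival[seq] <= deadline:
            played.append(seq)
        else:
            concealed.append(seq)  # late or lost: interpolate this frame
    return played, concealed

# Made-up trace: packet 2 arrives after its deadline; packet 4 never arrives.
trace = [(0, 5), (1, 30), (2, 130), (3, 65), (5, 110)]
print(playout(trace, BUFFER_MS))  # ([0, 1, 3, 5], [2, 4])
```

Rerun the same trace with a deeper buffer (say 120ms) and packet 2 makes its deadline – which is exactly the latency-versus-completeness tradeoff the next paragraph describes.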
The problem with a large jitter buffer, however, is that it increases latency – the time lag between when someone says something into a phone and when it actually reaches the listener’s ear. The larger the buffer, the larger the window for packets to arrive; but also the longer the average delay between when a packet arrives and when it is heard. Early VoIP calls had detectable latency roughly equivalent to that in calls made through geo-stationary satellites, whose orbits are so high, roughly 36,000 km up, that the hop up and back down adds about a quarter second of delay even at the speed of light.
The human ear can detect latencies of over 250ms (200ms for really sensitive people). As VoIP went mainstream, this latency was less and less acceptable. Latency can be reduced by shrinking the jitter buffer, and high-quality, predictable IP makes smaller jitter buffers practical. Jitter buffer size was originally either fixed for a particular piece of equipment or was a variable that the user set. But most routes are good most of the time, and IP quality varies even on the best of links; so adaptive jitter buffer management was born. The VoIP device, whether it is a gateway at a carrier or an adapter in the home, dynamically adjusts the jitter buffer to the smallest size it calculates is adequate for the variability it is experiencing in packet arrival time. A continuous tradeoff is made between latency and the percentage of packets that will have actually arrived by the time the sound is generated at the receiving end.
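An adaptive buffer along these lines can be sketched in a few lines of Python. The window size, target percentile, and bounds here are invented parameters, not anything I know about real adapters; the point is just that the buffer deepens when recent arrivals get bursty and can shrink again as old samples age out of the window:

```python
# A sketch of adaptive jitter-buffer sizing: keep a sliding window of
# recent packet delays and size the buffer so a target fraction of
# packets (here 99%) arrive in time. All parameters are hypothetical.

from collections import deque

class AdaptiveJitterBuffer:
    def __init__(self, window=100, target=0.99, min_ms=20, max_ms=200):
        self.delays = deque(maxlen=window)  # old samples age out
        self.target = target
        self.min_ms, self.max_ms = min_ms, max_ms

    def observe(self, delay_ms):
        self.delays.append(delay_ms)

    def depth_ms(self):
        """Buffer depth = headroom above the fastest recent packet,
        taken at the target percentile of recent delays."""
        if not self.delays:
            return self.min_ms
        ordered = sorted(self.delays)
        idx = min(len(ordered) - 1, int(self.target * len(ordered)))
        headroom = ordered[idx] - ordered[0]
        return max(self.min_ms, min(self.max_ms, headroom + self.min_ms))

buf = AdaptiveJitterBuffer()
for d in [70, 71, 70, 69, 72, 70]:      # quiet network: low jitter
    buf.observe(d)
quiet_depth = buf.depth_ms()
for d in [70, 105, 70, 110, 72, 108]:   # bursty network: high jitter
    buf.observe(d)
bursty_depth = buf.depth_ms()
print(quiet_depth, bursty_depth)        # prints: 23 61
```

The latency/completeness tradeoff lives in the `target` parameter: raise it and fewer packets miss their slot but every packet waits longer.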
The problem we were experiencing with our Motorola adapters and Vonage service in rural Vermont was that, during the daytime, lots of syllables were disappearing from what we heard. People at the other end could hear us fine, but it was sometimes literally impossible for us to understand what they were saying. There was a high correlation between jitter, which I could see both by ping-tracing and by using tools like www.testmyvoip.com, and gaps in the sound. My theory is that the VT1000, in an attempt to reduce latency, set up too small a jitter buffer, with the consequence that lots of late-arriving packets got dropped. Even the best interpolation can’t cover up the loss of many consecutive packets. I tried to find some way to adjust the jitter buffer settings in the VT1000 but apparently there aren’t any.
We knew the ultimate solution was less jitter in our broadband connection; we are confident that this will happen eventually. However, we found ourselves calling people back on our Verizon line to the extent that a Verizon CSR suggested we go on a different calling plan for high-volume callers. We didn’t want to do that; we use VoIP! We were also planning a trip to install VoIP at a hospital in the developing world where I wasn’t confident of finding high-quality broadband connections.
If I had known how easy the fix was, I wouldn’t have waited so long. I went to Circuit City and bought a Linksys PAP2 adapter ($60, which you get back in rebates from Vonage and Circuit City with new service activation) to test for my trip to the developing world. The Linksys box was about an eighth the size of the Motorola adapters (not surprising; Moore’s law is still at work). Better than that, though, it solved the quality problem. My very strong theory, although I have not attempted to disassemble any code, is that the Linksys box simply has better jitter buffer management than the Motorola box.
The Linksys adapter worked fine when we took it abroad and I hope soon to blog about it being in everyday use.
When we came home, we bought two more Linksys adapters and used them to replace the old Motorola adapters. The activation of these required a call to Vonage tech support as part of the normal procedure; but, as I blogged yesterday, that went smoothly. The whole process not counting the trip to Circuit City (always expensive) took half an hour.
This week we are not using Verizon!
I’ve long been convinced that, unless something better comes along, all landline voice communication will be VoIP by 2010. WiFi handsets and other developments may convince me to extend that prediction to wireless voice communication as well.
However, there has been concern that VoIP deployment may be delayed by either inherent quality problems or reluctance (I’m putting it politely) by access providers to provide IP quality suitable for any VoIP offering but their own (see Cringely). I’ve blogged previously that the business dynamic won’t allow VoIP to be delayed by dirty tricks.
My experience with the adapters convinces me that, within almost any conceivable range of broadband quality, voice quality is simply a software problem. And software problems get solved, just as Cisco apparently solved my jitter problem with the software in its Linksys PAP2 adapter. Broadband in the developed world is already good enough that most people won’t have to do what I did to get good VoIP quality. As the new generation of VoIP adapters like the PAP2 fill the distribution channel, even people with relatively high jitter will never know they have a problem – in fact, won’t have a problem.