A little more than a year ago, my team at Voxeet set out on a quest to create a mobile app that reimagined communication and the conference call for the 21st century. To do so, we needed to banish the familiar flaws of conference calls today (walkie-talkie static, an opaque interface that obscures who’s talking when, an inability to move from one device to the next) and introduce modern, mobile-optimized features and immersive 3D sound in their place. To do that well, we needed WebRTC.
WebRTC is an open-source technology built into browsers like Chrome and Firefox that lets browser-to-browser applications communicate in real time via voice, video and P2P file sharing, with no plugin required. Imagine pinning a recipe on Pinterest and being able to connect straight out of your browser with the five other people who have tagged that recipe, whether by voice, chat or video, as you choose. That’s the sort of game-changing real-time communication that WebRTC makes possible, and it promises to usher in a whole new era of Internet voice solutions, as the blogosphere has already started to note (See: Why voice is the next big internet wave).
Simply put, WebRTC is the emerging standard. It is where the entire market is headed and all apps will need to have this at their core moving forward. From a developer standpoint, we were especially interested in implementing WebRTC because it pulls together some of the best open-source components available for improved audio capture and playback, network transport, jitter control and adaptive quality management — all important features if you want to reinvent the conference call for the mobile era.
There was just one problem: WebRTC is made for browsers working in peering mode, whereas we have native apps working in client-to-server mode. We thus had a major challenge on our hands if we wanted to integrate WebRTC into our app. It turned out to be even tougher than we expected, but many months later we’ve survived to tell the tale, and we’ve got one of the first WebRTC-powered mobile apps to show for it.
One of the biggest hurdles for us to overcome was making WebRTC symbiotic with the Voxeet architecture. Imagine you want to record an orchestra with dozens of musicians, each of whom has his or her own microphone, and play it on your stereo sound system. You have to mix all those channels to produce a two-channel output containing all the sounds coming from all the sources. The only mixing mode used by conferencing systems today is monophonic mixing: All channels are mixed into a one-channel output, which results in crummy quality.
Voxeet’s innovation is to give every speaker on one of our conference calls his or her own audio stream, which results in the kind of natural, immersive sound you hear when you fire up a recording of a Beethoven sonata on your home stereo system. Pretty cool stuff, but it was a pain to code WebRTC into this, and it took a number of nifty work-arounds on the development front.
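To make the difference between the two mixing modes concrete, here is a minimal sketch in Python. It is not Voxeet's actual audio pipeline: the function names are hypothetical, the streams are plain lists of float samples, and constant-power panning is assumed as one simple way to give each speaker a position in a two-channel output.

```python
import math

def mix_mono(streams):
    """Average all speakers, sample by sample, into one channel."""
    n = len(streams)
    return [sum(samples) / n for samples in zip(*streams)]

def mix_stereo(streams, pans):
    """Place each speaker in the stereo field (assumed constant-power panning).

    pans[i] is in [-1.0, 1.0]: -1 is hard left, 0 is center, +1 is hard right.
    Returns a (left, right) pair of sample lists.
    """
    n = len(streams)
    length = len(streams[0])
    left = [0.0] * length
    right = [0.0] * length
    for samples, pan in zip(streams, pans):
        angle = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
        gain_l, gain_r = math.cos(angle), math.sin(angle)
        for i, s in enumerate(samples):
            left[i] += gain_l * s / n
            right[i] += gain_r * s / n
    return left, right

# Two hypothetical speakers, two samples each.
alice = [1.0, 0.0]
bob = [0.0, 1.0]
mono = mix_mono([alice, bob])
left, right = mix_stereo([alice, bob], pans=[-1.0, 1.0])
```

In the mono mix both voices land in the same channel and cannot be told apart by position. In the stereo mix, the speaker panned hard left is heard only on the left and the other only on the right, which is only possible because each speaker's stream was kept separate up to this point.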
Another challenge was that most conferencing systems today use one of two modes. The first is peer-to-peer, in which each participant establishes a direct connection to all other participants in the call (think Skype). On the plus side, this is easy to build and has low operating costs. On the other hand, direct connections between clients are hard to establish because of firewall limitations, security is tough to ensure with so many data streams in play, and the operator has absolutely no control over the quality of the streams. Big companies therefore avoid solutions like this.
The second mode is client-to-server mode, in which all participants are connected to a common server that mixes all the audio streams into a single monophonic stream (think VoIP conferencing platforms). On the plus side, it’s secure, but it costs a fortune to maintain, the sound is subpar and it’s nearly impossible to add advanced signal processing to increase quality. Big companies go for this solution, sacrificing quality in the process (see “A Conference Call in Real Life” for how employees feel about this).
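The scaling difference between the two modes comes down to simple arithmetic. The sketch below uses hypothetical function names and assumes a full mesh for peer-to-peer (every participant connected directly to every other) and exactly one connection per participant for client-to-server.

```python
def mesh_links(participants):
    """Direct connections in a full mesh: one per pair of participants."""
    return participants * (participants - 1) // 2

def mesh_streams_sent_per_client(participants):
    """In a full mesh, each client sends its voice to every other client."""
    return max(participants - 1, 0)

def server_links(participants):
    """In client-to-server mode, each client holds one connection to the server."""
    return participants

# A ten-person call under each assumption.
call_size = 10
summary = (
    mesh_links(call_size),
    mesh_streams_sent_per_client(call_size),
    server_links(call_size),
)
```

For a ten-person call this gives 45 direct links and 9 outgoing streams per client in the mesh, against 10 links in total through a server. Every one of those direct links also has to get through a firewall, which is part of why the peer-to-peer approach becomes harder to manage as calls grow.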
Our solution at Voxeet was to blend the two. Clients are only connected to a server à la client-to-server, but the streams coming from each participant are carried without modification to the others à la peer-to-peer. This allows us to get the best of both worlds, with low operating costs and great security and control.
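The blended approach can be sketched as a server that relays rather than mixes. This is an illustration only, not Voxeet's server code: the class and method names are hypothetical, and in-memory lists stand in for network connections.

```python
class ForwardingServer:
    """Toy relay: every client connects only to the server, and each
    incoming stream is passed on untouched to all other participants."""

    def __init__(self):
        # participant id -> list of (sender id, payload) delivered to them
        self.inboxes = {}

    def join(self, participant_id):
        self.inboxes[participant_id] = []

    def receive(self, sender_id, payload):
        # Forward the payload as-is; no decoding, mixing or re-encoding.
        for participant_id, inbox in self.inboxes.items():
            if participant_id != sender_id:
                inbox.append((sender_id, payload))

server = ForwardingServer()
for name in ("alice", "bob", "carol"):
    server.join(name)
server.receive("alice", b"frame-1")
```

After this runs, bob and carol each hold alice's frame, still tagged with her identity, while alice receives nothing back. Because the streams arrive separately and unmodified, each client remains free to position every speaker on its own when it plays them.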
Here, too, WebRTC presented challenges, since the WebRTC core is designed to work in peering mode and thus encodes and sends a separate voice stream for each participant, which becomes untenable as the number of people on the line increases. Nonetheless, work-arounds are possible, and they left us with a robust WebRTC foundation.
Hardly a week goes by without our hearing from someone new who wants to understand how we created our own architecture category that fuses WebRTC with mobile and the Cloud. We’re happy to share. The more people who embrace immersive communication the better, as far as we’re concerned. It’s the way of the future and unlocks whole new possibilities for collaboration and innovation. We’d rather see that arrive sooner rather than later, wouldn’t you?