Multipoint for VoIP has never been more varied than it is today.
Earlier this year, Gustavo Garcia Bernardo wrote a great post about multipoint in WebRTC for webrtcHacks. What got me to write this post is the work I am doing on the WebRTC Glossary, where I had to give some quick explanations on mixing, mesh and routing of media. While I like Gustavo’s post, I had my own things to share about this topic.
The first thing to say here is that WebRTC is no different than other VoIP protocols when it comes to multipoint support. The thing with other VoIP protocols is that they almost always fail to standardize multipoint properly, leaving the implementer with two options:
- Select a mixer solution (because it requires no standardization for the naive implementation)
- Do something proprietary
With VoIP, standardization came first and foremost, so you ended up with a mixer. The adventurous ones went for a proprietary solution, but most didn’t really succeed.
The difference with WebRTC is that standards for signaling are no longer that important, and interoperability across services even less so. This is what made non-mixer media architectures for multipoint possible.
The 3 options available when trying to handle media in a multipoint call are:
- Mesh, where all participants send their media to all the rest of the participants
- Router, where a central server receives media from all participants and routes each stream to the rest, with little to no processing of that media
- Mixing, where all participants send their media to an MCU, which decodes and then generates a single media stream that is sent to all participants
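To make the trade-off between the three options concrete, here is a minimal Python sketch that counts streams per participant and per server. It rests on simplifying assumptions that are not in the post: every participant publishes exactly one stream, there is no simulcast or SVC, and the router forwards every stream to everyone. The names `StreamLoad` and `stream_load` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamLoad:
    """Stream counts for one call (hypothetical helper type)."""
    uplink_per_participant: int    # streams each participant sends
    downlink_per_participant: int  # streams each participant receives
    server_inbound: int            # streams arriving at the central server
    server_outbound: int           # streams leaving the central server


def stream_load(topology: str, participants: int) -> StreamLoad:
    """Count streams, assuming one published stream per participant."""
    if participants < 2:
        raise ValueError("a multipoint call needs at least 2 participants")
    others = participants - 1

    if topology == "mesh":
        # Everyone sends to, and receives from, everyone else. No server.
        return StreamLoad(others, others, 0, 0)
    if topology == "router":
        # One uplink each; the server fans every stream out to the others.
        return StreamLoad(1, others, participants, participants * others)
    if topology == "mixer":
        # One uplink each; the MCU sends a single mixed stream to each.
        return StreamLoad(1, 1, participants, participants)
    raise ValueError(f"unknown topology: {topology!r}")


if __name__ == "__main__":
    for name in ("mesh", "router", "mixer"):
        print(name, stream_load(name, 6))
```

The counts say nothing about CPU: a mixer keeps the participant's bandwidth flat but has to decode and re-encode on the server, while a router does the opposite.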
Things to remember
- Each option has its nuances and varieties. Here are a few examples:
  - A router can also use SVC to provide better media quality and user experience
  - A mixer can encode the same media stream for all participants, or use an “encoder per participant”, where each participant gets its own special view
  - A mixer/router can show a single media stream based on the “active speaker” (= whoever shouts the loudest)
  - A mixer/router can drop participants from view past a given threshold (usually 10 or so), showing only the loudest or most recent streams
- Hybrid options are also acceptable:
  - Using mesh up to 4 participants and then switching to a router or mixer mode
  - Having a mesh for the active participants in a webinar and then using a mixer or router to stream media to a larger group of passive viewers
- There is no one-size-fits-all for multipoint, which makes WebRTC the perfect piece of technology for multipoint (even if most will complain that it isn’t really supported in WebRTC)
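Two of the ideas in the list above, switching away from mesh past a participant count and showing only the loudest streams, can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the thresholds are the rough figures quoted in the list (4 and 10), and the audio levels are assumed to be plain numbers that some other part of the system already measured.

```python
from typing import Dict, List, Optional

MESH_LIMIT = 4   # assumption: mesh up to 4 participants, as in the list
VIEW_LIMIT = 10  # assumption: "usually 10 or so" visible streams


def pick_topology(participants: int, mesh_limit: int = MESH_LIMIT) -> str:
    """Hybrid rule: stay on mesh for small calls, then move to a router."""
    return "mesh" if participants <= mesh_limit else "router"


def active_speaker(audio_levels: Dict[str, float]) -> Optional[str]:
    """Return the participant with the highest audio level, if any."""
    if not audio_levels:
        return None
    return max(audio_levels, key=lambda pid: audio_levels[pid])


def visible_streams(audio_levels: Dict[str, float],
                    view_limit: int = VIEW_LIMIT) -> List[str]:
    """Keep only the loudest `view_limit` participants, loudest first."""
    ranked = sorted(audio_levels,
                    key=lambda pid: audio_levels[pid],
                    reverse=True)
    return ranked[:view_limit]


if __name__ == "__main__":
    levels = {"ana": 0.2, "bo": 0.9, "cy": 0.5}
    print(pick_topology(len(levels)))   # small call, so mesh
    print(active_speaker(levels))
    print(visible_streams(levels, view_limit=2))
```

A real service would smooth the audio levels over time before ranking, otherwise the view would flip on every cough.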
Nice post. Thx for the reference 🙂
You are most welcome.
I think SVC is no longer hip. Note how Google moved Hangouts to VP8, which doesn’t do SVC and uses simulcast instead.