I don’t really know, but there’s a lot in this innocent “WebRTC JS library” question that isn’t clear without digging a lot further.
Every now and again (roughly once every week or two) I get a question asking me to help with the selection of this or that open source component, pick a CPaaS vendor for a project, find someone to outsource WebRTC work to or hire a stellar WebRTC developer.
Many of these emails are about shortcuts. Give us that silver bullet. Shortcuts seldom work with WebRTC.
Last week, I had a question come in. A startup is looking for a “WebRTC JS library” to use. Something that does 1:1 voice chat rooms, stores user profiles, etc. It also needed to be inexpensive – Twilio is too expensive for them. And a free alternative was their main preference.
The problem I had with it is that this simple question of which WebRTC JS library to use didn’t align well with the rest of the requirements that came with it.
This article is about what components are needed for WebRTC deployments. If you’re looking to dig deeper into the media paths in WebRTC, then join my free webinar: Mesh, MCU or SFU
Let’s break WebRTC down into its main components, as seen from a network architecture perspective:
- Signaling
- NAT traversal
- Media
- Other
Here’s a slide I’ve been using to explain what a device connects to in a typical WebRTC session:
[Slide: what a device connects to in a typical WebRTC session]
Signaling
Signaling is how the devices reach out to one another. They can’t do it directly, since they don’t have each other’s IP addresses, and even if they could, we would need some kind of “protocol” for them to do that.
Signaling in WebRTC is… non-existent. You need to bring your own signaling. This approach confuses some developers, and it is probably why there is no single signaling solution that fits everyone.
Today, you can use SIP, XMPP, MQTT or just proprietary protocols as your signaling for WebRTC traffic. Each such protocol will have its own set of frameworks, services and SDKs that you can use. Some will be free (open source) while others will be licensable software or SaaS based.
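Since WebRTC leaves signaling to you, here is a minimal sketch of what “bring your own signaling” can mean in the simplest case: a hypothetical proprietary protocol that relays offers, answers and ICE candidates between the two sides of a 1:1 room. The `SignalMessage` shapes and the `SignalingRoom` class are made up for illustration; they are not any framework’s API. In a real deployment, each `Deliver` callback would write to a WebSocket or a similar transport.

```typescript
// Hypothetical proprietary signaling messages. WebRTC does not define these;
// it only needs the SDP and ICE candidates to reach the other side somehow.
type SignalMessage =
  | { kind: "join"; from: string }
  | { kind: "offer"; from: string; sdp: string }
  | { kind: "answer"; from: string; sdp: string }
  | { kind: "candidate"; from: string; candidate: string };

// How the server pushes a message to one connected peer (e.g. over a WebSocket).
type Deliver = (msg: SignalMessage) => void;

// A 1:1 "room" that forwards every message to the other participant.
class SignalingRoom {
  private peers = new Map<string, Deliver>();

  join(peerId: string, deliver: Deliver): void {
    if (this.peers.size >= 2 && !this.peers.has(peerId)) {
      throw new Error("room is full (1:1 only)");
    }
    this.peers.set(peerId, deliver);
    // Let the peer already in the room know someone arrived.
    this.relay({ kind: "join", from: peerId });
  }

  // Forward a message to every peer except the one that sent it.
  relay(msg: SignalMessage): void {
    this.peers.forEach((deliver, id) => {
      if (id !== msg.from) deliver(msg);
    });
  }
}

// Usage: the SDP strings are placeholders for what createOffer()/createAnswer() produce.
const room = new SignalingRoom();
const aliceInbox: SignalMessage[] = [];
const bobInbox: SignalMessage[] = [];
room.join("alice", (m) => aliceInbox.push(m));
room.join("bob", (m) => bobInbox.push(m)); // alice is told that bob joined
room.relay({ kind: "offer", from: "alice", sdp: "<offer SDP>" }); // delivered to bob
room.relay({ kind: "answer", from: "bob", sdp: "<answer SDP>" }); // delivered to alice
```

Swapping this for SIP, XMPP or MQTT changes the message format and the transport, not the job itself: get the session description and candidates from one device to the other.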
NAT traversal
NAT traversal is about being able to actually get media flowing.
WebRTC is P2P (peer to peer), meaning that in some cases you can send media directly between devices. This is something web browsers can’t do otherwise. WebRTC also prefers UDP, since it offers better real-time, low-latency characteristics. It is also the only web browser traffic that makes use of UDP, which means it is sometimes blocked as well.
NAT traversal is how WebRTC gets past these pesky issues, and it requires additional servers to do so. Some of these servers (TURN) may end up relaying all of the traffic through themselves…
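To make the role of those servers a bit more concrete: during NAT traversal the browser gathers ICE candidates, and each candidate’s type tells you how media would flow. The sketch below is a simplified parser for the candidate line format; it reads only the transport, address, port and `typ` fields and ignores the rest of the grammar. `parseCandidate` and `describeCandidate` are hypothetical helpers, not part of any browser API.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface ParsedCandidate {
  transport: string; // "udp" or "tcp"
  address: string;
  port: number;
  type: CandidateType;
}

// Parse a line such as "candidate:1 1 udp 2122252543 192.168.1.10 54321 typ host".
// Field order: foundation, component, transport, priority, address, port, "typ", type, ...
function parseCandidate(line: string): ParsedCandidate {
  const fields = line.trim().replace(/^a=/, "").replace(/^candidate:/, "").split(/\s+/);
  const typIndex = fields.indexOf("typ");
  if (fields.length < 8 || typIndex === -1 || typIndex + 1 >= fields.length) {
    throw new Error(`not an ICE candidate: ${line}`);
  }
  return {
    transport: fields[2].toLowerCase(),
    address: fields[4],
    port: Number(fields[5]),
    type: fields[typIndex + 1] as CandidateType,
  };
}

// What each candidate type implies about the media path.
function describeCandidate(c: ParsedCandidate): string {
  switch (c.type) {
    case "host":
      return "local interface address: works only if the peer can reach it directly";
    case "srflx":
      return "public address learned from a STUN server: direct P2P through the NAT";
    case "prflx":
      return "address discovered from the peer during connectivity checks";
    case "relay":
      return "address on a TURN server: all media is relayed through it";
    default:
      return "unknown candidate type";
  }
}
```

A session that ends up on a `relay` candidate is exactly the case where the TURN server carries all of the traffic, which is where the bandwidth costs come from.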
At the end of the day, you will need to deploy these servers or pay for someone to do it for you (no free meals here).
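If you do deploy these servers yourself, one common pattern (assumed here, not the only option) is to run coturn with its shared-secret mode (`use-auth-secret` plus `static-auth-secret`) and have your backend hand out short-lived TURN credentials: the username is an expiry timestamp plus a user ID, and the password is a base64 HMAC-SHA1 of that username under the shared secret. The sketch assumes Node.js; the hostname `turn.example.com` and the helper names are placeholders. The returned array is what you would pass as `iceServers` to `new RTCPeerConnection({ iceServers })` in the browser.

```typescript
import { createHmac } from "node:crypto";

// Mirrors the shape of the browser's RTCIceServer dictionary.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// Time-limited TURN credentials. The secret is shared only between this
// backend and the TURN server; clients never see it.
function turnCredentials(secret: string, userId: string, ttlSeconds: number, nowMs: number = Date.now()) {
  const expiry = Math.floor(nowMs / 1000) + ttlSeconds; // Unix time when the credential expires
  const username = `${expiry}:${userId}`;
  const credential = createHmac("sha1", secret).update(username).digest("base64");
  return { username, credential };
}

// Placeholder hostname: STUN for discovering public addresses, TURN (UDP and TLS) for relaying.
function buildIceServers(secret: string, userId: string): IceServer[] {
  const { username, credential } = turnCredentials(secret, userId, 3600);
  return [
    { urls: "stun:turn.example.com:3478" },
    {
      urls: ["turn:turn.example.com:3478?transport=udp", "turns:turn.example.com:5349?transport=tcp"],
      username,
      credential,
    },
  ];
}
```

Handing out expiring credentials matters because a TURN server with static, leaked credentials becomes a free relay for anyone, and you pay for the bandwidth.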
Media
Recording. Group calling. The need to control media paths. Broadcasting. All these end up requiring media servers in the backend. Ones that can process media in one way or another.
The most common approach today is to use SFUs and solve most media problems with them. These also offer some signaling protocol of their own. My preference is usually to short-circuit that and route all of this traffic through a different signaling/messaging path, especially for the more complex applications.
Again, they come in different shapes, sizes and types – open source ones and commercial ones. You usually won’t be able to pay for them separately as a hosted service and will need to go to a CPaaS vendor to get the whole set of solutions – if you’re looking for the hosted/managed path.
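To see why group calling pushes you toward media servers, it helps to count the streams each client has to handle under the Mesh, SFU and MCU approaches. This is a back-of-the-envelope sketch under simplifying assumptions (one stream per participant, no simulcast, audio and video counted together), not a capacity model:

```typescript
type Architecture = "mesh" | "sfu" | "mcu";

interface StreamCounts {
  sent: number; // media streams each client uploads
  received: number; // media streams each client downloads
}

// `others` is the number of other participants in the call.
const rules: Record<Architecture, (others: number) => StreamCounts> = {
  // Mesh: each client sends its media to every other client and receives theirs.
  mesh: (others) => ({ sent: others, received: others }),
  // SFU: each client sends once; the SFU forwards the other participants' streams.
  sfu: (others) => ({ sent: 1, received: others }),
  // MCU: each client sends once; the MCU mixes everything into a single stream.
  mcu: () => ({ sent: 1, received: 1 }),
};

function streamsPerClient(arch: Architecture, participants: number): StreamCounts {
  if (!Number.isInteger(participants) || participants < 2) {
    throw new Error("a call needs at least two participants");
  }
  return rules[arch](participants - 1);
}
```

In a 1:1 call all three come out the same from the client’s point of view, which is one reason a 1:1 service can often skip the media server. The gap opens up as participants are added: with mesh, each client’s uploads grow with every new participant.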
Other
Payments, user authentication and identity, the website itself and a large number of other things you might need.
These are really out of scope of WebRTC, but sometimes are provided by the various vendors and frameworks out there.
Back to that question
What were we dealing with to begin with here?
looking for a “WebRTC JS library” to use. Something that does 1:1 voice chat rooms, stores user profiles, etc. It also needed to be inexpensive – Twilio is too expensive for them. And a free alternative was their main preference.
Here’s how I’d break this one down to try and understand what was asked:
- That “WebRTC JS library” hints at someone searching for a signaling framework, which is great
- 1:1 voice chats strengthen the feeling that we’re dealing with signaling only
- The word rooms… that feels more like an SFU media server. In this case, though, I’ll assume there’s no need for a media server, given the price point asked for (free), the fact that there’s no requirement for recording and that this is a 1:1 scenario
- Stores user profiles. Hmm. This usually has nothing to do with WebRTC. So much so that most CPaaS vendors don’t offer such a capability either
- Twilio is about the full shebang: getting a hosted, SaaS, CPaaS, managed (pick the term you like best) solution that gives you signaling, NAT traversal, media and some other knickknacks. That doesn’t quite fit with the rest of the ask here
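The same breakdown can be written down as a rough checklist that maps requirements to the four component groups from earlier. This is a deliberate simplification (for example, small group calls can run in a mesh without a media server), and the `Requirements` fields and `componentsFor` function are hypothetical, chosen to match the ask above:

```typescript
interface Requirements {
  maxParticipants: number;
  recording: boolean;
  broadcasting: boolean;
  userProfiles: boolean;
}

type Component = "signaling" | "nat-traversal" | "media-server" | "other";

function componentsFor(req: Requirements): Set<Component> {
  // Every deployment needs a way to exchange session details and a way to get media flowing.
  const needed = new Set<Component>(["signaling", "nat-traversal"]);
  // Group calling, recording and broadcasting are what usually bring in media servers.
  if (req.maxParticipants > 2 || req.recording || req.broadcasting) {
    needed.add("media-server");
  }
  // User profiles (like payments or auth) sit outside WebRTC altogether.
  if (req.userProfiles) needed.add("other");
  return needed;
}

// The startup's ask: 1:1 voice, no recording, no broadcast, user profiles.
const ask = componentsFor({ maxParticipants: 2, recording: false, broadcasting: false, userProfiles: true });
```

For this ask the result is signaling, NAT traversal and “other”, with no media server, which is why a single “WebRTC JS library” can’t cover it on its own.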
When I get such jumbled questions, it feels like there’s a bit of a misunderstanding of what WebRTC is and about how the ecosystem of vendors and services has evolved around it.
Want to learn more about WebRTC?
There are several things to do at this point if you need to grok WebRTC:
- Read this article on learning WebRTC for more suggestions
- Read my WebRTC for Business People report (it is free)
- Learn how I think about WebRTC requirements
- Take the first module of my WebRTC training (it’s free)
- Join me for the webinar tomorrow – I’ll talk about Mesh, MCU and SFU media architectures
So what did you answer?
Me? That I can’t help without a lot more information and context…
My answer for SIP is JsSIP, or my JsSIP wrapper webrtcdemo.audiocodes.com/sdk 😉
If you’re using SIP infrastructure, then sure (and I know you guys do at Audiocodes).
If you’re looking to make this a starting point for tackling the problem of what technology stack you need, then you’re doing it wrong.
JsSIP for me, not SIP.js. Proper open source!
I stumbled on http://www.kurento.org while searching for a WebRTC framework
Ochui – thanks for sharing.
I think this is where a lot of the confusion lies – Kurento is a media server framework. It does have some rudimentary signaling of its own, but I wouldn’t pick it for signaling.
What you need in your feature set greatly affects which frameworks and projects you should use.
I think you might have missed what they are looking for. They are looking for a solution that they can just drop into their application and that provides video or voice chat without them having to do any deep development work and without needing to understand WebRTC.
There are two solutions: get a company like WebRTC Ventures to develop and maintain that component for you, or use something like our Coviu API to embed video rooms into your app.
Coviu is an embedding-based solution but, unlike YouTube, all the hard server work associated with WebRTC is done for you. Yes, we’re a telehealth company, but our video interface can be used without medical tools, custom branded and embedded in other apps through an API.
Silvia,
Thanks – some people do want such a thing, but in many cases, they end up unhappy as they have some specific requirements/needs that aren’t met. There’s a lot of variety out there in what people want and mean when they say something like “WebRTC JS library”.
Guys,
I think you’re missing the point. There is a fundamental flaw in how most people look at WebRTC. Let’s start with a very basic premise – WebRTC is NOT a signalling technology, it is a media processing and direct media transmission technology. As such, you can use whatever signalling you want. As long as the SDP gets to your endpoint correctly, it will be processed.
Now, this created an amazing opportunity for the various vendors around the world. Each one created their own version of “WebRTC Signalling”, and they are not compatible with each other (in 99% of cases). Even the various cloud providers aren’t compatible with one another; if you want compatibility, build it yourself.
Now, the various JS libraries that use SIP over WebSocket for WebRTC are AWESOME! Why is that? Simple: they make bridging between the legacy and the new fairly easy. For example, at Cloudonix we provide a set of wrappers that let companies integrate with our WebRTC (SIP over WebSocket) endpoints with ease. They are based on SIPml5, which we found to be complete and easy to use and, most importantly, flexible enough to let us do what we needed. We had some “philosophical” issues with JsSIP and SIP.js, but again, that’s something totally different.
In addition, sorry to say, very much like Asterisk 10 years ago, WebRTC has become the “magical” solution to various communication problems, when it’s absolutely an incomplete solution.
I think the cost of Twilio and the various other providers is so high that it doesn’t make any sense. They mainly take advantage of the fact that building and maintaining a global WebRTC signalling and media processing infrastructure is complex and requires attention. We will be launching our WebRTC endpoint processing offering in a few days, so you can visit https://cloudonix.io to learn more about it, but I totally understand your confusion.
If you asked me how I would prefer to build something like this, this would be my answer:
1. Kamailio or OpenSIPS for your WebSocket processing and Signalling
2. RTPengine for NAT Media Relaying
3. coturn for STUN and TURN processing
4. SIPml5 as the JS library
5. Asterisk for media application development (or FreeSWITCH, depending on your preference)
The various “WebRTC native” servers, such as Kurento and EasyRTC, provide a “signalling” layer. However, if you need to bridge back to the native PSTN world, you will need additional tools like Janus, which add another layer of complexity.
Nir, thanks for sharing.
You might want to add your upcoming service to https://webrtcindex.com