When you build your next project with WebRTC – how should you select what signaling protocol to use?
In my monthly email from a few weeks ago, I gave a quick answer to this question: what are the alternative signaling protocols for WebRTC?
As I am currently looking closely at various API platforms for WebRTC, and dealing with that question myself with several clients, I decided it would be beneficial to share my answer here as well, in a bit of a longer form.
There are essentially 5 different options to choose from.
1. COMET / XHR / SSE
Consider this the classic approach to web signaling. If you don’t know enough about it, then read about it on Wikipedia. In essence, this is a hack that enables a web server to send messages to clients – something you need to be able to do when dealing with something like a session across two users/browsers that runs via a server.
These techniques are widely available on web browsers, which makes them commonplace and relatively easy to set up and use.
The only problem? Scaling them. Because they are hacks in nature, they tend to take up more resources on the server side, which means fewer browsers connected to each server, and that translates to a higher cost of operation.
On small scales, that might not be an issue, but if you plan on millions of users, you might want to think this one through.
UPDATE: As someone smart pointed out – on its own, this technique still requires you to define your own proprietary signaling messages.
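Since COMET-style transports only carry opaque text, the message format is yours to define. Here is a minimal sketch, assuming a hypothetical JSON envelope (the `type`, `from`, `to` and `payload` field names are illustrative and not part of any standard) framed as a Server-Sent Events message:

```typescript
// Hypothetical signaling envelope; the field names are illustrative only.
interface SignalEnvelope {
  type: "offer" | "answer" | "candidate" | "bye";
  from: string;
  to: string;
  payload: string; // e.g. an SDP blob or a serialized ICE candidate
}

// Frame one message in the text/event-stream format used by SSE:
// an "event:" line, a "data:" line, then a blank line ending the event.
// JSON.stringify escapes newlines, so the JSON always fits on one data line.
function toSseFrame(msg: SignalEnvelope): string {
  return "event: " + msg.type + "\n" + "data: " + JSON.stringify(msg) + "\n\n";
}
```

On the browser side, an `EventSource` listener registered for the `offer` event would receive the JSON string in `event.data`.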
2. WebSocket
WebSockets are a relatively new addition to browsers. They enable opening up a session from the client to the web server, and then leaving it open for messages from both directions.
These messages can be textual or binary, they can be as rich as you wish, and they run really fast. You end up with nice scaling capabilities when using WebSockets. You can read more about it here.
The downside of WebSockets? They might not be available in the browser you plan on using (a non-issue for browsers supporting WebRTC, but it may become an issue once you wrap WebRTC in a plugin for IE, as an example). Oh, and not all web servers and proxies support them, so depending on your architecture and network deployment – you might not even be able to make use of WebSockets.
If you do plan on using WebSockets, I suggest you do two additional things:
- Run them over a secured TLS connection, which in general is what you should do for any WebRTC signaling anyway
- Think of using a hybrid solution like socket.io or SockJS, which can automatically “downgrade” to COMET mechanisms if WebSockets aren’t available
I’d also use WebSockets whenever. As in whenever I don’t feel that options 3-5 below make sense to me.
UPDATE: As someone smart pointed out – on its own, this technique still requires you to define your own proprietary signaling messages.
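Defining your own messages mostly means agreeing on a schema and rejecting anything that does not match it. The following is a sketch only, with hypothetical message shapes (`join`, `offer`, `answer`, `candidate`) that are assumptions for illustration; it validates a text frame without depending on any particular WebSocket library:

```typescript
// Hypothetical message shapes for a home-grown signaling protocol.
type Signal =
  | { kind: "join"; room: string }
  | { kind: "offer" | "answer"; to: string; sdp: string }
  | { kind: "candidate"; to: string; candidate: string };

// Parse one text frame; returns null for anything malformed so that
// a misbehaving client cannot crash the handler.
function parseSignal(raw: string): Signal | null {
  let data: any;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  switch (data.kind) {
    case "join":
      return typeof data.room === "string"
        ? { kind: "join", room: data.room }
        : null;
    case "offer":
    case "answer":
      return typeof data.to === "string" && typeof data.sdp === "string"
        ? { kind: data.kind, to: data.to, sdp: data.sdp }
        : null;
    case "candidate":
      return typeof data.to === "string" && typeof data.candidate === "string"
        ? { kind: "candidate", to: data.to, candidate: data.candidate }
        : null;
    default:
      return null;
  }
}
```

In a browser this would typically be called from a WebSocket `onmessage` handler with `event.data`.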
3. SIP over WebSocket
This is like WebSockets, only instead of placing proprietary messages inside, you end up putting SIP messages in there.
Ugly as hell, but gets the job done – especially if what you are looking for is connecting to an existing telephony backend. Who does this? Asterisk. Those that try to fuse WebRTC to IMS or RCS. People who need to “gateway” their way into SIP.
Unless you already have a SIP investment in place, and unless a major part of your use case includes calling to PSTN – don’t use this. Even if your origins are in VoIP and SIP is your mother tongue.
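To make "SIP messages inside a WebSocket" concrete, here is a rough sketch of the text a client might place in a single WebSocket frame. It assumes the conventions of RFC 7118 (the `sip` WebSocket subprotocol, a `WSS` transport token in the Via header, and a placeholder `.invalid` host for the browser). The helper name and parameter names are hypothetical, and the sketch omits authentication and everything else a real SIP stack would handle:

```typescript
// Caller-supplied placeholders; none of these values come from a real deployment.
interface RegisterParams {
  user: string;
  domain: string;
  localHost: string; // e.g. "abc123.invalid": a browser has no routable SIP address
  callId: string;
  branchId: string;
  fromTag: string;
}

// Build a bare-bones SIP REGISTER request as one text frame.
function buildSipRegister(p: RegisterParams): string {
  const aor = "sip:" + p.user + "@" + p.domain;
  const lines = [
    "REGISTER sip:" + p.domain + " SIP/2.0",
    // "z9hG4bK" is the magic cookie every RFC 3261 branch parameter starts with.
    "Via: SIP/2.0/WSS " + p.localHost + ";branch=z9hG4bK" + p.branchId,
    "Max-Forwards: 70",
    "From: <" + aor + ">;tag=" + p.fromTag,
    "To: <" + aor + ">",
    "Call-ID: " + p.callId,
    "CSeq: 1 REGISTER",
    "Contact: <sip:" + p.user + "@" + p.localHost + ";transport=ws>",
    "Content-Length: 0",
  ];
  // SIP separates lines with CRLF and ends the header section with an empty line.
  return lines.join("\r\n") + "\r\n\r\n";
}
```

Compared with a proprietary JSON message, the amount of mandatory header machinery is visible even in this stripped-down request.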
4. XMPP/Jingle
Similar to SIP, but this time using another standard signaling protocol called XMPP.
If you take this route, it is probably either because you have an existing XMPP installation or you need the presence capabilities that XMPP comes with out of the box (and with server side implementations readily available).
I am not a fan of XMPP to say the least, but I don’t really have anything bad to say about this approach. If you know and like XMPP – go for it.
5. Data Channel
WebRTC has a data channel. Once an initial connection is made between the two “endpoints”, you can use the data channel to communicate and drive your signaling instead of going via a server.
There are few I’ve seen that use this approach, and it does have merit. It has 3 main benefits:
- Latency of signaling messages is lower, as there’s no server in-between that needs to parse and understand them
- Since a server isn’t involved, server scalability improves – it handles fewer messages from each connected browser
- Improved privacy, simply because tapping into the server gives you less information
UPDATE: As with Comet and WebSockets, you still need to define your messages when using the data channel.
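One way to picture this is a sender that prefers the data channel once it is open and falls back to the server otherwise. This is a sketch under assumptions: `ChannelLike` and `ServerLink` are hypothetical stand-ins for `RTCDataChannel` and whatever server connection is in use, and the message content is left entirely to the application.

```typescript
// Minimal structural types so the sketch type-checks outside a browser.
// A real RTCDataChannel exposes `readyState` and `send` in the same way.
interface ChannelLike {
  readyState: string;
  send(data: string): void;
}
interface ServerLink {
  send(data: string): void;
}

// Send over the peer-to-peer data channel when it is open; otherwise fall
// back to the signaling server, which is still needed for the first
// offer/answer exchange that sets up the connection.
function sendSignal(
  msg: object,
  channel: ChannelLike | null,
  server: ServerLink
): "p2p" | "server" {
  const frame = JSON.stringify(msg);
  if (channel !== null && channel.readyState === "open") {
    channel.send(frame);
    return "p2p";
  }
  server.send(frame);
  return "server";
}
```

The return value only reports which path was taken, which is handy for measuring how much traffic the server is actually spared.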
Why is it important?
Selection of the signaling protocol will decide the development effort required for certain features as well as the cost you pay for it – in setup time of sessions, server performance, etc.
It is a decision that shouldn’t be taken lightly.
Tsahi, this is a nice balanced review of WebRTC signaling options. Scalability is indeed the primary drawback of do-it-yourself approaches regardless of where Comet, WebSockets, etc. are used.
For this reason, we’ve published the WebRTC SDK on GitHub back in June for app developers (http://www.pubnub.com/blog/webrtc-sdk-now-available-on-pubnub/) not only to provide a highly-scalable signaling solution but also to address the equally-important reliability and QoS aspects.
The approach is similar to what we’ve done with socket.io which is a very popular WebSockets open-source library but does not scale by itself. Hence, we injected PubNub underneath in order to make it suitable for mass-scale commercial deployments.
Similarly, if you need to build a mass-scale WebRTC video chat messenger with presence: http://www.pubnub.com/blog/building-video-calling-with-pubnub-and-webrtc/
Doron,
Correct – I didn’t deal with who’s hosting the solution or managing it – just the technology. Now that you mention it, I need to add it to my posting schedule as another topic to touch when it comes to signaling.
Tsahi
This is a nice overview of the options for signaling in WebRTC projects. When I’m talking with our clients I usually bring the focus back to the use cases particularly around interoperability.
If your WebRTC app is going to be self-contained, as in you provide all of the control features in your application, then you can choose based on pure technical merit. When there is a need to interop with existing infrastructure, we normally recommend SIP over WebSocket to avoid most of the need for a signaling gateway.
In reality you will still need to have some level of signaling gateway in place due to the nuances of SIP implementations, especially if you are trying to interop with less standard implementations like MS Lync.
This said I do encourage software companies to really think about what level of signaling support they want in their WebRTC app. Not every app is going to need all the features/functions that SIP offers and if you go that route you will invariably get drawn into extended testing cycles.
I’m with you here.
I was mulling over commenting here, because I’m not sure I really like being branded a fanatic. But that horse has probably taken a train, so it’s too late to bolt the station door. Or something.
So I thought I’d point out that “XMPP” is just an option for putting inside the Comet/Websocket/Data Channel. The bindings for each are well-defined (though to be fair supporting XMPP in SCTP is pretty experimental). For XMPP in a long polling session, there’s BOSH. There’s also more web-like gateway libraries such as xmpp-ftw.
A typical server (like open-source Prosody, or commercial Isode M-Link), ships with support for BOSH. As the XMPP/Websocket draft stabilizes, we’ll see that support for Websocket too (Prosody already supports Websocket). Data Channel won’t be very far behind; the work on Websocket support for XMPP is carefully considering SCTP as well.
XMPP based services have been proven at very high scalability levels, too. Google are not the only ones to have run multi-million client systems, though they’re the most well-known – and for VOIP, that’s still on XMPP (as are their cloud services, too). WhatsApp also use something that’s mostly XMPP. They deviated from the standard partly sensibly (they were in uncharted waters when they started) and stupidly (hence their security problems).
I’m thankfully not actually as religious as I may seem to be here – I’m just a bit confused when you say things like, “I am not a fan of XMPP to say the least, but I don’t really have anything bad to say about this approach.” – statements like that seem at odds with claiming to give a balanced view, and I feel like I have to balance things a bit.
I actually think, given the current deployed landscape, that if you want interop with SIP, then it should be easy on paper, but is likely harder because of trickle ICE, security model mismatch, and codecs. In addition, most SIP servers don’t speak any web binding (Asterisk is an outlier here), and as far as I know, none are standardized.
Leaving out federation for the moment, the question becomes one of least work, highest scaling, and minimum lock-in. That means you’ll be up and running faster, you’ll be able to keep to the same technology as you grow, and yet you’ll still be pretty adaptive. With XMPP you’ve a choice of three libraries and a slew of servers, all interchangeable and most of the servers scale impressively.
As I say, if you’ve a simple set of needs then the overhead of XMPP won’t be useful. XMPP provides you with a rich set of operations, and if you’re not going to benefit from them there’s little point. But that’s really the main argument against XMPP.
If your main reason for using XMPP is because you’ve a burning need to tell people you’ve an XML-based backbone, then you’re using XMPP for the wrong reasons, and furthermore, you’re probably a relic from the ’90s.
On the other hand, if your application is going to need signalling, other peer-to-peer messaging, presence, persistent addressing, user authentication, security, reliable messaging, pub-sub, and/or federation, you’re going to save yourself a lot of effort by using XMPP.
I think we are in agreement.
The only place where we might differ is in how many of the use cases will end up needing that much, or how many won’t need to divert in ways that will be hard to achieve with something like XMPP or SIP.
I’m not implying that you need to use *every* feature, you know. I’d have thought that just signalling, security, and reliability would be enough to make most people find it an attractive option. Basic signalling is relatively easy, but security and reliability turn out to be quite hard, and personally I prefer leaning on the work of real experts.
Now let’s see…
Just signaling – websites have been doing this for years already. Security – ask PayPal and your bank – they do that over the web quite well. Reliability – got that covered.
Where does XMPP or SIP help me here? I’d use SIP only if I must, and XMPP if my use case relies heavily on presence (and the way presence is modeled by XMPP).
I have yet to see all those innovative use cases you talk about where Jingle or SIP are not capable of expressing the signalling semantics.
I hit (and report) bugs in Chrome whenever I try something “innovative”, so the lack of bug reports suggests to me that nobody is innovating here.
I have yet to see a vendor that selected SIP or XMPP for his WebRTC service if he didn’t have an existing deployment with one of these protocols already – and even then, the decision isn’t always SIP or XMPP.
These services and their developers? They made that decision without consulting me.
Can you please tell me how I can use XMPP/Jingle as the signaling server for my WebRTC app on Android, given that WebRTC works over a P2P connection?
Look at SimpleWebRTC framework – I believe they make use of XMPP.
That is such a good explanation, thank you so much.
Thank you for the kind words 🙂
I’m a bit confused by your listing of the WebRTC dataChannel as a valid option for signalling. Since you first have to have an established connection between the peers in order to exchange data via the dataChannel (which, obviously, requires signalling to establish that connection) – how can the dataChannel really be used for signalling?
Nick,
Using the data channel for signaling means building a distributed layer where browsers communicate among themselves assisting each other to find their destination.
You can use a signaling server using WebSocket to connect to this distributed layer and continue all negotiations from there. I’ve seen this discussed more than once in the past.
There’s also the option of having the initial connection handled by a signaling server, with any further communication going over the data channel.
How do you feel about this two years later? Would you add anything new? Would you take anything off the table?
Andrew,
That’s a good question. I now tend to split it into two parts, based on something Justin said about this some time ago – I have mixed here both Signaling and Transport protocols.
On the Transport side, it is either HTTPS, HTTP/2 or Websocket now (where HTTPS further splits off into COMET types)
On the Signaling side, it is either SIP, XMPP or proprietary
I’ll be writing something about it soon – just because you asked this question 🙂
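The transport/signaling split described in this reply can be sketched as two independent interfaces. This is only an illustration: `Transport`, `SignalingCodec`, `JsonSignal` and `makeSender` are hypothetical names, and the JSON codec stands in for the "proprietary" option.

```typescript
// Transport layer: only moves opaque text frames (WebSocket, HTTPS, ...).
interface Transport {
  send(frame: string): void;
}

// Signaling layer: decides what the frames mean (SIP, XMPP or proprietary).
interface SignalingCodec<M> {
  encode(msg: M): string;
}

// A hypothetical proprietary signaling format: plain JSON.
interface JsonSignal {
  kind: string;
  body: string;
}
const jsonCodec: SignalingCodec<JsonSignal> = {
  encode: (msg) => JSON.stringify(msg),
};

// Combine any codec with any transport; either side can be swapped alone.
function makeSender<M>(
  codec: SignalingCodec<M>,
  transport: Transport
): (msg: M) => void {
  return (msg: M): void => {
    transport.send(codec.encode(msg));
  };
}
```

Keeping the two layers apart means that moving from, say, long polling to WebSocket does not touch the message definitions, and vice versa.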
Hello Mr Levi.
How is it going with you?
I have a problem and need immediate help.
I have an MVC website about education. There are students and advisers on this website.
I want them to be able to communicate with each other by video conference using WebRTC.
How can I do that?
Lio,
If you have no development experience, I suggest you hire a team that does.
Check the WebRTC Index for some suggestions.
If you need consulting on who and how to build it, then this is what I do. Alternatively, you can check out my WebRTC API platforms report (requires payment). It has the various alternatives for both development tracks and vendor selection.
In most cases a SIP signalling gateway will not be enough, since WebRTC implements DTLS-based SRTP for media encryption and this is not commonly supported in legacy SIP systems. You need to consider a signalling gateway with media proxy functions, like TekSIP.
Hello ,
Very nice article. I have a question: is there any signalling server that I can use to connect just two Android mobile phones over Wi-Fi (no internet or router included)? I have implemented WebRTC for Android web-to-web video broadcast, and am now working on the signalling server.
Muhammad,
There are several alternatives out there. You can use Pusher, PubNub and even Firebase to do this. You can install your own SimpleWebRTC or AppRTC signaling. You can use Matrix.org. …
Is it mandatory to send SIP over WebSocket through TLS for the WebRTC signalling part? Is it mandatory to send SIP signalling over WebSocket at all? Are there any other options as well?
Web browsers don’t allow signaling for WebRTC to run over non-secure transports. That means you can only use HTTPS or secure websocket. So… if you want to use SIP as your signaling towards the browser, you’ll need to do it as SIP over a secure websocket.
The other option is to translate SIP to something else in a gateway and then use HTTPS or secure websocket.
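A small guard can catch a misconfigured endpoint before the browser rejects it. This is a minimal sketch, assuming the signaling endpoint is configured as a URL string; the function name is hypothetical.

```typescript
// Returns true only for TLS-protected signaling URLs
// (secure WebSocket or HTTPS); the scheme match is case-insensitive.
function isSecureSignalingUrl(url: string): boolean {
  return /^(wss|https):\/\//i.test(url);
}
```

For example, `isSecureSignalingUrl("ws://sip.example.com")` returns `false`, which would be the cue to switch the endpoint to `wss://` or to put a gateway in front of it.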