How to Select a Signaling Protocol for Your Next WebRTC Project?

February 17, 2014
When you build your next project with WebRTC – how should you select what signaling protocol to use? In my monthly email from a few weeks ago, I gave a quick answer to this question. What are the alternative signaling protocols for WebRTC? As I am currently looking closely at various API platforms for WebRTC, and dealing with that question myself with several clients, I decided it would be beneficial to share my answer here as well, in a bit of a longer form. There are essentially 5 different options to choose from.

1. COMET / XHR / SSE

Consider this the classic approach to web signaling. If you don't know enough about it, then read about it on Wikipedia. In essence, this is a hack that enables a web server to send messages to clients – something you need to be able to do when dealing with something like a session across two users/browsers that runs via a server. These techniques are widely available on web browsers, which makes them commonplace and relatively easy to set up and use. The only problem? Scaling them. Because they are hacks in nature, they tend to take up more resources on the server side, which means fewer browsers connected to each server, and that translates to a higher cost of operation. On a small scale, that might not be an issue, but if you plan on millions of users, you might want to think this one through. UPDATE: As someone smart pointed out - on its own, this technique still requires you to define your own proprietary signaling messages.
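To make the server-side cost concrete, here is a minimal sketch of the bookkeeping behind long polling, one of the COMET techniques. It is an illustration only: `LongPollHub`, `PolledMessage` and `PollResponder` are hypothetical names, the HTTP layer is left out, and a real server would also need timeouts and handling for a client that polls twice.

```typescript
// Hypothetical message shape; the post does not define one.
interface PolledMessage {
  from: string;
  to: string;
  body: string;
}

// Completes a client's pending HTTP poll with a batch of messages.
type PollResponder = (messages: PolledMessage[]) => void;

class LongPollHub {
  // Messages waiting for clients that have no poll open right now.
  private queued: { [clientId: string]: PolledMessage[] } = {};
  // One parked (unanswered) poll per client. This per-client state,
  // plus the open HTTP request behind it, is what costs server resources.
  private parked: { [clientId: string]: PollResponder } = {};

  // Called when a client's poll request reaches the server.
  poll(clientId: string, respond: PollResponder): void {
    const pending = this.queued[clientId];
    if (pending !== undefined && pending.length > 0) {
      // Something is already waiting: answer immediately.
      delete this.queued[clientId];
      respond(pending);
      return;
    }
    // Nothing to deliver yet: hold the request open.
    this.parked[clientId] = respond;
  }

  // Called when one client wants to signal another.
  send(message: PolledMessage): void {
    const respond = this.parked[message.to];
    if (respond !== undefined) {
      // The recipient has a poll open: complete it now.
      delete this.parked[message.to];
      respond([message]);
      return;
    }
    // No open poll: keep the message until the recipient polls again.
    const queue = this.queued[message.to] || [];
    queue.push(message);
    this.queued[message.to] = queue;
  }

  // Number of requests currently held open.
  parkedCount(): number {
    return Object.keys(this.parked).length;
  }
}
```

Every connected browser keeps one request parked almost all the time, and has to open a new one after each delivery, which is where the extra server load comes from.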

2. WebSocket

WebSockets are a relatively new addition to browsers. They enable opening up a session from the client to the web server, and then leaving it open for messages in both directions. These messages can be textual or binary, they can be as rich as you wish, and they run really fast. You end up with nice scaling capabilities when using WebSockets. You can read more about it here. The downside of WebSockets? They might not be available in the browser you plan on using (a non-issue for browsers supporting WebRTC, but it may become an issue once you wrap WebRTC in a plugin for IE, for example). Oh, and not all web servers and proxies support them, so depending on your architecture and network deployment – you might not be able to even make use of WebSockets. If you do plan on using WebSockets, I suggest you do two additional things:
  1. Run them over a secured TLS connection, which in general is what you should do for any WebRTC signaling anyway
  2. Think of using a hybrid solution like socket.io or SockJS, which can automatically "downgrade" to COMET mechanisms if WebSockets aren't available
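The "downgrade" idea in the second suggestion can be sketched as a simple preference list. This is not how socket.io or SockJS are implemented; the transport names and `pickTransport` are made up for illustration.

```typescript
// Transport names are labels for this sketch, not identifiers from any library.
type TransportName = "websocket" | "sse" | "long-polling";

// Preferred order: the transport that is cheapest for the server comes first.
const TRANSPORT_PREFERENCE: TransportName[] = ["websocket", "sse", "long-polling"];

// Returns the most preferred transport that is available, or null if none is.
function pickTransport(available: TransportName[]): TransportName | null {
  for (const name of TRANSPORT_PREFERENCE) {
    if (available.indexOf(name) !== -1) {
      return name;
    }
  }
  return null;
}
```

In a browser, the `available` list could come from feature checks such as `typeof WebSocket !== "undefined"`. A proxy can still block the connection after such a check passes, so a real library also has to fall back when a connection attempt fails.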
I'd also use WebSockets whenever. As in, whenever I don't feel that options 3-5 below make sense to me. UPDATE: As someone smart pointed out - on its own, this technique still requires you to define your own proprietary signaling messages.
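As a rough idea of what "defining your own proprietary signaling messages" involves, here is a minimal sketch of a JSON message format sent as text frames. The message types and field names are assumptions for illustration, not part of WebRTC or of any standard.

```typescript
// A made-up message vocabulary: one variant per signaling step.
type SignalingMessage =
  | { type: "offer"; sdp: string }
  | { type: "answer"; sdp: string }
  | { type: "candidate"; candidate: string; sdpMid: string }
  | { type: "bye" };

// Serializes a message into a text frame.
function encodeSignal(message: SignalingMessage): string {
  return JSON.stringify(message);
}

// Parses and validates a received frame; returns null for anything malformed.
function decodeSignal(frame: string): SignalingMessage | null {
  let raw: unknown;
  try {
    raw = JSON.parse(frame);
  } catch (_err) {
    return null;
  }
  if (typeof raw !== "object" || raw === null) {
    return null;
  }
  const fields = raw as { [key: string]: unknown };
  const sdp = fields["sdp"];
  const candidate = fields["candidate"];
  const sdpMid = fields["sdpMid"];
  switch (fields["type"]) {
    case "offer":
      return typeof sdp === "string" ? { type: "offer", sdp: sdp } : null;
    case "answer":
      return typeof sdp === "string" ? { type: "answer", sdp: sdp } : null;
    case "candidate":
      return typeof candidate === "string" && typeof sdpMid === "string"
        ? { type: "candidate", candidate: candidate, sdpMid: sdpMid }
        : null;
    case "bye":
      return { type: "bye" };
    default:
      return null;
  }
}
```

The same encode/decode pair works unchanged over COMET, since both ends only exchange strings; a real format would also need addressing (who the message is for) and some notion of a session.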

3. SIP over WebSocket

This is like WebSockets, only instead of placing proprietary messages inside, you end up putting SIP messages in there. Ugly as hell, but gets the job done – especially if what you are looking for is connecting to an existing telephony backend. Who does this? Asterisk. Those that try to fuse WebRTC to IMS or RCS. People who need to "gateway" their way into SIP. Unless you already have a SIP investment in place, and unless a major part of your use case includes calling the PSTN – don’t use this. Even if your origins are in VoIP and SIP is your mother tongue.
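To show what ends up inside the WebSocket frame, here is a minimal sketch that assembles a SIP INVITE carrying an SDP offer. `InviteParams` and `buildSipInvite` are hypothetical names, all header values are supplied by the caller, and the `WSS` Via transport follows my reading of RFC 7118. A real SIP stack also handles authentication, retransmission and dialog state, which this leaves out entirely.

```typescript
// All values are caller-supplied; nothing here is tied to a real deployment.
interface InviteParams {
  from: string;     // e.g. "sip:alice@example.com"
  to: string;       // e.g. "sip:bob@example.com"
  contact: string;  // URI the peer can use to reach this client
  viaHost: string;  // placeholder host, since a browser has no stable address
  branch: string;   // unique per transaction
  fromTag: string;
  callId: string;
  sdp: string;      // the SDP offer produced by WebRTC
}

function buildSipInvite(p: InviteParams): string {
  const headers = [
    "INVITE " + p.to + " SIP/2.0",
    // "z9hG4bK" is the fixed prefix SIP requires at the start of a branch value.
    "Via: SIP/2.0/WSS " + p.viaHost + ";branch=z9hG4bK" + p.branch,
    "Max-Forwards: 70",
    "From: <" + p.from + ">;tag=" + p.fromTag,
    "To: <" + p.to + ">",
    "Call-ID: " + p.callId,
    "CSeq: 1 INVITE",
    "Contact: <" + p.contact + ">",
    "Content-Type: application/sdp",
    // Assumes the SDP is pure ASCII, so string length equals byte length.
    "Content-Length: " + p.sdp.length,
  ];
  // Headers and body are separated by an empty line.
  return headers.join("\r\n") + "\r\n\r\n" + p.sdp;
}
```

The resulting string would be sent as one text message on a WebSocket opened with the `sip` subprotocol, for example `new WebSocket(url, "sip")`. Compare its size and complexity with a small JSON offer and the "ugly" verdict is easy to understand.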

4. XMPP/Jingle

Similar to SIP, but this time using another standard signaling protocol called XMPP. If you take this route, it is probably either because you have an existing XMPP installation or you need the presence capabilities that XMPP comes with out of the box (and with server side implementations readily available). I am not a fan of XMPP to say the least, but I don't really have anything bad to say about this approach. If you know and like XMPP – go for it.
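For comparison with the SIP case, here is a minimal sketch of the stanza that starts a Jingle session. The namespaces and attribute names follow my reading of XEP-0166, XEP-0167 and XEP-0176; `JingleInitiate` and `buildSessionInitiate` are hypothetical names. Payload types, ICE credentials and candidates are omitted, so this is a skeleton rather than a complete offer.

```typescript
// Escapes a value for use inside a double-quoted XML attribute.
function escapeXmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Caller-supplied values for one session-initiate request.
interface JingleInitiate {
  from: string;         // full JID of the caller
  to: string;           // full JID of the callee
  iqId: string;         // id used to match the IQ result
  sid: string;          // Jingle session id
  contentName: string;  // label for this media stream, e.g. "voice"
  media: "audio" | "video";
}

function buildSessionInitiate(j: JingleInitiate): string {
  const a = escapeXmlAttr;
  return (
    '<iq from="' + a(j.from) + '" to="' + a(j.to) +
    '" id="' + a(j.iqId) + '" type="set">' +
    '<jingle xmlns="urn:xmpp:jingle:1" action="session-initiate"' +
    ' initiator="' + a(j.from) + '" sid="' + a(j.sid) + '">' +
    '<content creator="initiator" name="' + a(j.contentName) + '">' +
    // Codec (payload-type) elements would go inside <description>.
    '<description xmlns="urn:xmpp:jingle:apps:rtp:1" media="' + j.media + '"/>' +
    // ICE credentials and candidates would go inside <transport>.
    '<transport xmlns="urn:xmpp:jingle:transports:ice-udp:1"/>' +
    "</content></jingle></iq>"
  );
}
```

Note that Jingle describes media in XML rather than SDP, so a WebRTC client going this route has to translate between the two.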

5. Data Channel

WebRTC has a data channel. Once an initial connection is made between the two "endpoints", you can use the data channel to communicate and drive your signaling instead of going via a server. There are few I've seen that use this approach, and it does have merit. It has 3 main benefits:
  1. Latency of signaling messages is lower, as there's no server in-between that needs to parse and understand them
  2. Since a server isn't involved, server scalability improves – it handles fewer messages from each connected browser
  3. Improved privacy, simply because tapping into the server gives you less information
UPDATE: As with Comet and WebSockets, you still need to define your messages when using the data channel.  
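A minimal sketch of the switch-over: signaling starts on the server connection and moves to the data channel once it opens. `TextChannel` and `SignalingRouter` are hypothetical names; `TextChannel` is deliberately narrow so that both a `WebSocket` and an `RTCDataChannel` would satisfy it.

```typescript
// Anything that can send a text message.
interface TextChannel {
  send(data: string): void;
}

class SignalingRouter {
  private serverChannel: TextChannel;
  private peerChannel: TextChannel | null = null;

  constructor(serverChannel: TextChannel) {
    this.serverChannel = serverChannel;
  }

  // Call this from the data channel's "open" event.
  attachPeerChannel(channel: TextChannel): void {
    this.peerChannel = channel;
  }

  // Call this from the data channel's "close" event.
  detachPeerChannel(): void {
    this.peerChannel = null;
  }

  // Sends over the data channel when it is up, otherwise via the server.
  // Returns the path taken, which is handy for logging.
  send(text: string): "peer" | "server" {
    if (this.peerChannel !== null) {
      this.peerChannel.send(text);
      return "peer";
    }
    this.serverChannel.send(text);
    return "server";
  }
}
```

The first offer/answer exchange still has to go through the server, since the data channel does not exist until the peers are connected; only later messages, such as renegotiation, can bypass it.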

Why is it important?

Selection of the signaling protocol will decide the development effort required for certain features as well as the cost you pay for it – in setup time of sessions, server performance, etc. It is a decision that shouldn't be taken lightly.
