WebRTC vs WebSockets: They. Are. Not. The. Same.
Sometimes, there are things that seem obvious once you’re “in the know” but just aren’t when you’re new to the topic. It seems that the difference between WebRTC and WebSockets is one such thing. Philipp Hancke pinged me the other day, asking if I had an article about WebRTC vs WebSockets, and I didn’t - it made no sense to me. That is, at least, until I asked Google about it:
It seems like Google believes the most pressing (and popular) comparison search for WebRTC is WebRTC vs WebSockets. I should probably also write about the other comparisons there, but for now, let's focus on that first one.
Need to learn WebRTC? Check out my online course - the first module is free.
WebSockets are a bidirectional mechanism for browser communication.
There are two types of transport channels for communication in browsers: HTTP and WebSockets.
HTTP is what gets used to fetch web pages, images, stylesheets and JavaScript files, as well as other resources. In essence, HTTP is a client-server protocol, where the browser is the client and the web server is the server:
My WebRTC course covers this in detail, but suffice to say here that with HTTP, your browser connects to a web server and requests *something* of it. The server then sends a response to that request and that’s the end of it.
The challenge starts when you want to send an unsolicited message from the server to the client. You can’t do it if you don’t send a request from the web browser to the web server, and while you can use different schemes such as XHR and SSE to do that, they end up feeling like hacks or workarounds more than solutions.
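To make the "workaround" feel concrete, here is a minimal long-polling sketch - the browser has to keep re-issuing HTTP requests just to receive server-initiated messages. `fetchMessage` is a hypothetical stand-in for an HTTP request that resolves with a message, or with null on a timeout; none of these names come from any real API:

```javascript
// Long-polling sketch (illustrative only): every server-to-client message
// costs a full HTTP round trip, and the client must immediately re-poll.
async function longPoll(fetchMessage, onMessage, maxRequests) {
  for (let i = 0; i < maxRequests; i += 1) {
    const msg = await fetchMessage(); // one HTTP round trip per message
    if (msg !== null) {
      onMessage(msg); // deliver it, then loop back and poll again
    }
  }
}
```

Compare that busy loop with the WebSocket model below, where the server simply pushes a message whenever it has one.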
Enter WebSockets, which are meant to solve exactly that - the web browser connects to the web server by establishing a WebSocket connection. Over that connection, both the browser and the server can send each other unsolicited messages.
Because WebSockets are built for this purpose, and not hacks like the XHR/SSE alternatives, they perform better in terms of both speed and the resources they eat up on browsers and servers.
WebSockets are rather simple to use as a web developer - you’ve got a straightforward WebSocket API for them, which is nicely illustrated by HPBN:
var ws = new WebSocket('wss://example.com/socket');

ws.onerror = function (error) { ... }
ws.onclose = function () { ... }

ws.onopen = function () {
  ws.send("Connection established. Hello server!");
}

ws.onmessage = function (msg) {
  if (msg.data instanceof Blob) {
    processBlob(msg.data);
  } else {
    processText(msg.data);
  }
}
You’ve got calls for send and close and callbacks for onopen, onerror, onclose, and onmessage. Of course there’s more to it than that, but this holds the essence of WebSockets.
It leads us to what we usually use WebSockets for, and I’d like to explain it this time not by actual scenarios and use cases but rather by the keywords I’ve seen associated with WebSockets:
Funnily, a lot of these keywords sometimes get associated with WebRTC as well, which might be the cause of the comparisons made between the two.
There are numerous articles here about WebRTC, including a What is WebRTC one.
In the context of WebRTC vs WebSockets, WebRTC enables sending arbitrary data across browsers without the need to relay that data through a server (most of the time). That data can be voice, video or just data.
Here’s where things get interesting -
When starting a WebRTC session, you need to negotiate the capabilities for the session and the connection itself. That is done out of the scope of WebRTC, in whatever means you deem fit. And in a browser, this can either be HTTP or… WebSocket.
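Since WebRTC leaves the signaling format entirely up to you, applications typically invent a small message envelope and relay it over whatever transport they chose. The JSON shape below is an assumption for illustration - nothing in the WebRTC standard defines it:

```javascript
// WebRTC doesn't mandate a signaling format, so this JSON envelope
// (field names included) is purely an application-level convention.
function makeSignal(type, from, to, payload) {
  return JSON.stringify({ type, from, to, payload });
}

// Parse and minimally validate an incoming signaling message.
function parseSignal(raw) {
  const msg = JSON.parse(raw);
  if (!["offer", "answer", "candidate"].includes(msg.type)) {
    throw new Error("unknown signal type: " + msg.type);
  }
  return msg;
}
```

In a browser you would then do something like `signalingSocket.send(makeSignal("offer", myId, peerId, pc.localDescription))`, where `signalingSocket` is your WebSocket and `pc` your RTCPeerConnection (both hypothetical names).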
So from this point of view, WebSocket isn’t a replacement for WebRTC but rather complementary - an enabler.
Sort of.
I’ll start with an example. If you want to connect to a cloud based speech to text API and you happen to use IBM Watson, then you can use its WebSocket interface. The first sentence in the first paragraph of its documentation?
The WebSocket interface of the Speech to Text service is the most natural way for a client to interact with the service.
So you stream the speech (=voice) over a WebSocket to connect to the cloud API service.
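Streaming speech that way usually means slicing the captured audio into small binary frames and sending each over the socket in order. Here is a generic sketch - the helper name and frame size are my assumptions, not IBM Watson's actual protocol, and real services document their own expected chunk sizes and audio formats:

```javascript
// Split a captured audio buffer into fixed-size frames for streaming.
// frameBytes is an assumed tuning knob, not a value from any real API.
function chunkAudio(samples, frameBytes) {
  const frames = [];
  for (let offset = 0; offset < samples.length; offset += frameBytes) {
    frames.push(samples.slice(offset, offset + frameBytes));
  }
  return frames;
}
```

Each frame would then be pushed over the socket, e.g. `chunkAudio(pcm, 3200).forEach((frame) => ws.send(frame));`.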
That said, WebSockets are highly unlikely to be used for streaming media in other scenarios.
In most cases, real time media will get sent over WebRTC or other protocols such as RTSP, RTMP, HLS, etc.
WebRTC has a data channel. It has many different uses. In some cases, it is used in place of a WebSocket connection:
The illustration above shows how a message would pass from one browser to another over a WebSocket versus doing the same over a WebRTC data channel. Each has its advantages and challenges.
Funnily, the data channel in WebRTC shares a similar set of APIs to the WebSocket ones:
const peerConnection = new RTCPeerConnection();
const dataChannel =
  peerConnection.createDataChannel("myLabel", dataChannelOptions);

dataChannel.onerror = (error) => { … };
dataChannel.onclose = () => { … };

dataChannel.onopen = () => {
  dataChannel.send("Hello World!");
};

dataChannel.onmessage = (event) => { … };
Again, we’ve got calls for send and close and callbacks for onopen, onerror, and onmessage.
This makes an awful lot of sense, but can be a bit confusing.
There’s this one tiny detail - to get the data channel working, you first need to negotiate the connection. And that you do either over HTTP or over a WebSocket.
Almost never. That’s the truth.
If you’re deciding between the two and you don’t know much about WebRTC, then you probably need WebSockets, or will be better off using WebSockets.
I’d think of data channels either when there are things you want to pass directly across browsers without any server intervention in the message itself (and these use cases are quite scarce), or you are in need of a low latency messaging solution across browsers where a relay via a WebSocket will be too time consuming.
While both are web browser technologies, WebSockets are meant to enable bidirectional communication between a browser and a web server, while WebRTC is meant to offer real time communication between browsers (predominantly voice and video communications).
There are a few areas where WebRTC can be said to replace WebSockets, but these aren't too common.
Yes and no.
WebRTC doesn't use WebSockets. It has its own set of protocols including SRTP, TURN, STUN, DTLS, SCTP, ...
The thing is that WebRTC has no signaling of its own and this is necessary in order to open a WebRTC peer connection. This is achieved by using other transport protocols such as HTTPS or secure WebSockets. In that regard, WebSockets are widely used in WebRTC applications.
No.
To connect a WebRTC data channel you first need to signal the connection between the two browsers. To do that, you need them to communicate through a web server in some way. This is achieved by using a secure WebSocket or HTTPS. So WebRTC can't really replace WebSockets.
Now, once the connection is established between the two peers over WebRTC, you can start sending your messages directly over the WebRTC data channel instead of routing these messages through a server. In a way, this replaces the need for WebSockets at this stage of the communications. It enables lower latency and higher privacy since the web server is no longer involved in the communication.
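The two phases above can be sketched as a toy simulation - signaling messages hop through a server object, while post-setup messages go peer to peer. Everything here is in-memory and illustrative; no real networking or WebRTC APIs are involved:

```javascript
// Toy model of the two delivery paths: relayed (WebSocket-style signaling)
// versus direct (data-channel-style messaging after setup).
function createServer() {
  const peers = new Map();
  return {
    register(peer) { peers.set(peer.id, peer); },
    relay(to, msg) { peers.get(to).inbox.push({ via: "server", msg }); },
  };
}

function createPeer(id) {
  return {
    id,
    inbox: [],
    // Once the connection is negotiated, messages skip the server entirely.
    sendDirect(other, msg) { other.inbox.push({ via: "direct", msg }); },
  };
}
```

The point of the model is the path each message takes: the "offer" traverses the server, while later messages reach the peer with no middleman - which is exactly where the latency and privacy gains come from.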