Market Archives • BlogGeek.me (https://bloggeek.me/category/market/)

Does WebRTC need a change in governance?
https://bloggeek.me/webrtc-need-change-governance/
Mon, 29 Apr 2024

Is it time to change the governance of WebRTC in order to keep it growing and flourishing?

WebRTC started life in 2011 or 2012, depending on when you start counting.

That’s around 13 years now. Time to put things on the table – we might need a change in governance. A different way of thinking about WebRTC.

The concept of WebRTC unbundling

I published the above on LinkedIn last month.

It was a culmination of thoughts I’ve been having for the past several years.

You can pinpoint the first time I made that distinction in 2020 while coining the term WebRTC unbundling.

The notion was that WebRTC is being broken down into smaller pieces and developers are given more leeway and control over what WebRTC does (=a good thing). The result of all this is the ability to differentiate further, but also that the baseline of what WebRTC is gets farther behind what good media quality means.

There’s the popular open source implementation for WebRTC known as libwebrtc. It is maintained and governed by Google. When Google can enact its strategy by implementing their technologies and IP outside and around libwebrtc instead of inside libwebrtc – why wouldn’t they?

Google runs a business. They have commercial objectives. Handing every differentiating feature to competitors who use libwebrtc to win against Google would be a poor decision to make. Giving competitors that use proprietary technology the source code of libwebrtc to copy from and improve upon without contributing back isn’t a smart move either.

This means cutting edge technologies and research are now developed outside of libwebrtc (and WebRTC) as much as possible. And the unbundling of WebRTC that started some 4 years ago is now starting to show.

Before we dive into the details

Something I always explain to people new to WebRTC is that WebRTC isn’t a single thing. When someone refers to it, they are thinking of WebRTC either as a standard or as an open source project:

The above is one of the first slides I’ve ever created about WebRTC.

WebRTC is an open standard. It is being specified by the IETF and W3C. The IETF deals with the network side while the W3C is all about the browser interface (JavaScript APIs).

WebRTC is also viewed as an open source project. That’s actually libwebrtc… the most common and popular implementation of WebRTC which has been created and is maintained by Google.

So remember – when people say WebRTC they can refer to it as either a standard or a package or both at the same time.

What we will do in this article from here on is jump between these two definitions and see where we are with them today. We will start with the libwebrtc open source library.

The power and importance of libwebrtc

Here’s what I shared in my RTC@Scale 2024 session:

In WebRTC, libwebrtc is the most important library. There are others, but this is by far the most important. Why?

  • It is integrated and used by ALL modern browsers (Chrome, Edge, Firefox and Safari)
  • So when you interact with any browser in your WebRTC application, you end up working against libwebrtc
  • Many mobile applications decided to use libwebrtc natively inside the app. Why? Because it is good enough

The end result is that… well… It is the most important WebRTC library out there.

Before libwebrtc, what we had was lame open source libraries that implemented media engines. All good options were commercial ones. In fact, libwebrtc (and WebRTC) started with Google acquiring a company called GIPS, which had a great implementation of a commercial media engine that it licensed to companies. I know because the company I worked at licensed it, and the moment GIPS got acquired, we got a flood of requests and questions about finding an alternative.

What WebRTC did was make media engines a commodity of sorts. A new era where high quality media can be had from open source. This also meant that the commercial media engine market died at the same time.

This new development of pushing innovations and improvements in the media engine pipeline outside of libwebrtc is what is going to take that advantage from open source and libwebrtc away.

More on that, a bit later. But next, why don’t we look at the standardization of WebRTC?

WebRTC standardization efforts

The standardization of WebRTC was split between two different organizations: the W3C and the IETF. They were always semi-aligned.

The IETF was in charge of what goes on in the network. What a WebRTC session looks like on the wire. For WebRTC, it uses stuff that we all considered quite modern in 2012 – light years ago in tech-time. The IETF Working Group working on WebRTC, RTCWEB, concluded its work and closed down.

The W3C was/is in charge of the API layer in the browser. The JavaScript interface, mostly revolving around the RTCPeerConnection. And yes, they are trying to wrap this one up and call it a day.

In many ways, what brought WebRTC to what it is today is the W3C – the part focused on the interface in the browser that developers use. That is because the browser is our window to the internet (and in many ways to the world as well). And this window includes the ability to use WebRTC through the APIs specified by the W3C.

The catch here is that the standardization done by the W3C for WebRTC is carried out almost solely by the browser vendors themselves. There aren’t any (or not enough) web developers sitting at the table. The ones who need and end up using the WebRTC APIs have no real voice in the WebRTC spec itself. The cooks in the kitchen are far removed from the restaurant diners who need to enjoy their dish.

And meanwhile, the cooks have different opinions and directions as well:

  • Chrome protects its interests, focusing mainly on Google Meet’s requirements. This is what drives many of the contributions Google has been making to the W3C on the spec
  • The rest? Mostly trying to block any forward movement so they won’t have to add changes to their own browser implementation. This is especially true for Safari and Firefox

So what do we end up with?

Google, trying to add things it needs to the WebRTC specification to solve their product needs

Other browser vendors, trying to delay Google a bit…

And developers who aren’t part of the game at all and are happy with the leftovers from what Google needs.

Vendors differentiating outside of (lib)WebRTC

The whole WebRTC ecosystem is enjoying the work of Google in libWebRTC. They do so in various ways:

  1. Directly by taking libWebRTC codebase, making it their own and compiling it into native applications
  2. Indirectly by having WebRTC run inside web browsers, and figuring out any bugs and issues they bump into
  3. By carving bits and pieces of it to use in their own app (like tearing the echo canceller or other algorithms from libWebRTC and using it elsewhere)

The first alternative is the most interesting one here.

When vendors do that, they usually end up forking the original codebase and modifying bits and pieces of it to fit their own needs. These might be minor bug fixes for edge cases or they may be full blown optimizations (like what Meta has done with their new MLow codec and Beryl echo cancellation algorithm – there were other areas as well. You’ll find them in the RTC@Scale event summary).

Video API vendors are no different. They usually take libWebRTC and compile it as part of their own mobile SDKs. Again, with likely changes in the code. They also get to see and work with a multitude of customers, each with its own unique requirements. In a way, they see a LOT of the market. Having these insights and understanding is great. Passing it to the libWebRTC team can be even better. These Video API vendors can be a great aggregator of customer insights…

Then there’s the fact that not many end up contributing back what they’ve done to libWebRTC. And even that comes with a whole set of reasons why:

  1. Assuming (rightly or wrongly) that these changes made are unique, proprietary, a competitive advantage – you name it
  2. Being afraid of the legal implications of doing so (exposure or whatever)
  3. Too much fuss to do

If you ask me, (1) is just bad manners – you get something for free from another vendor you might even be competing directly with. The least you can do is to share and contribute back, so that you have a level playing field at that low level of the stack.

Looking at (2) means someone needs to sit and talk to the legal team at your company. On one hand, you make use of open source and on the other you’re not giving back anything. I am not even sure if that reduces your exposure in any way. I am not a lawyer, but I do see the problem in this free lunch approach of the industry.

That third one is a big issue. And it is partly the fault of Google. They don’t make it easy enough to contribute back to the codebase. I can easily understand the reasoning – with billions of Chrome installations, having a no-name developer with a weird GitHub alias from *somewhere* in the globe trying to push a piece of arcane/mundane code into libWebRTC that ends up in Chrome is darn dangerous. But the current situation seems almost insufferable.

I just don’t know who’s to blame here – companies who are just too lazy to contribute back and jump through the hoops required to get there, or Google, for adding more blockers and hoops along their way.

Is standardization moving to the next shiny thing(s)?

There are two separate routes in web browsers that are setting themselves up to displace WebRTC: WebTransport + WebCodecs + WebAssembly, and MoQ (Media over QUIC).

WebTransport + WebCodecs + WebAssembly

This trio is the unbundling of WebRTC. Taking it and breaking it into smaller components that web developers cannot really implement on their own inside a web browser – these are WebTransport and WebCodecs. And adding the glue to them so that developers can cobble together the missing pieces however they feel like it – that’s the WebAssembly piece.

Vendors are already using WebAssembly to intervene in the WebRTC media processing pipeline to differentiate and improve on the user experience in various ways (noise suppression and background replacement being the main examples).

The next step is to skip WebRTC altogether:

  • Use WebTransport for sending media over the network
  • WebCodecs are there to encode and decode audio and video efficiently
  • WebAssembly for the rest (packet loss, retransmission logic, echo cancellation, etc)
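To make that split a bit more concrete, here is a minimal sketch of the kind of glue an application would have to write itself once WebRTC is out of the picture. The wire format below is purely hypothetical (it is not a standard and not what any vendor ships): each encoded chunk gets a small header with a sequence number and timestamp, and the receiver uses the sequence numbers to spot losses. In a browser, the payload would come from a WebCodecs encoder and the packed bytes would be written to a WebTransport stream or datagram; those parts are left out so the sketch stays self-contained.

```typescript
// Hypothetical wire format for one encoded media chunk (NOT a standard):
//   byte 0      : flags (bit 0 set = key frame)
//   bytes 1..4  : sequence number, unsigned 32-bit big-endian
//   bytes 5..12 : timestamp in microseconds, stored as a 64-bit float
//   bytes 13..  : encoded payload as produced by the encoder
interface MediaFrame {
  keyFrame: boolean;
  sequence: number;
  timestampUs: number;
  payload: Uint8Array;
}

const HEADER_BYTES = 13;

// Serialize a frame into a single byte array ready to be sent.
function packFrame(frame: MediaFrame): Uint8Array {
  const out = new Uint8Array(HEADER_BYTES + frame.payload.byteLength);
  const view = new DataView(out.buffer);
  view.setUint8(0, frame.keyFrame ? 1 : 0);
  view.setUint32(1, frame.sequence);
  view.setFloat64(5, frame.timestampUs);
  out.set(frame.payload, HEADER_BYTES);
  return out;
}

// Parse bytes received from the network back into a frame.
function unpackFrame(bytes: Uint8Array): MediaFrame {
  if (bytes.byteLength < HEADER_BYTES) {
    throw new Error("frame too short");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    keyFrame: (view.getUint8(0) & 1) === 1,
    sequence: view.getUint32(1),
    timestampUs: view.getFloat64(5),
    payload: bytes.subarray(HEADER_BYTES),
  };
}

// Sequence numbers that never arrived between the lowest and highest seen.
// This is the input a retransmission or concealment strategy would work from.
function missingSequences(received: number[]): number[] {
  const sorted = received.slice().sort((a, b) => a - b);
  const missing: number[] = [];
  for (let i = 1; i < sorted.length; i++) {
    for (let s = sorted[i - 1] + 1; s < sorted[i]; s++) {
      missing.push(s);
    }
  }
  return missing;
}
```

Everything WebRTC normally does for you (packetization, loss handling, pacing) becomes application code like this, which is exactly where the room for differentiation comes from.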

Don’t believe me? Zoom is doing almost that. They are using the WebRTC data channel as transport, and use WebCodecs and WebAssembly for the rest of it. Switching to WebTransport will likely happen for Zoom once it is ubiquitous across browsers (and offers solid performance compared to the data channel in WebRTC).
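Using the data channel as a media transport usually means turning off its reliability guarantees. The sketch below shows what such options could look like. The `ordered` and `maxRetransmits` fields are part of the standard data channel options in the W3C API; the interface is mirrored locally so the snippet runs outside a browser, and the function names are made up for illustration. This is an assumption about how one might do it, not a description of Zoom's actual configuration.

```typescript
// Local mirror of the standard data channel option fields used here.
interface DataChannelOptions {
  ordered: boolean;
  maxRetransmits?: number;
}

// Media-like traffic: late data is useless, so skip ordering and retransmits.
function mediaChannelOptions(): DataChannelOptions {
  return { ordered: false, maxRetransmits: 0 };
}

// Control traffic: keep the default reliable, ordered behavior.
function controlChannelOptions(): DataChannelOptions {
  return { ordered: true };
}

// In a browser this would be used roughly as:
//   const channel = peerConnection.createDataChannel("media", mediaChannelOptions());
```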

A new shiny toy for developers? Definitely.

Where will we see it first? In live streaming. I’ve written about it when discussing WHIP and WHEP, calling it the 3 horsemen.

MoQ (Media over QUIC)

The next big thing is likely to be MoQ.

WebTransport makes use of QUIC as its own transport. Around 5 years ago, I thought that QUIC could be a really good solution to replace WebRTC’s transport altogether. And it now has an official name – MoQ.

MoQ is about doing to RTP what WebTransport does to HTTP.

WebTransport takes QUIC and uses it as a modernized transport for web browsers, replacing HTTP and WebSocket.

MoQ takes QUIC and uses it as modernized media streaming for web browsers, replacing HLS and DASH.

There’s an overview for MoQ on the IETF website. Here’s the best part of it, directly from this post:

It includes a single protocol for sending and receiving high-quality media (including audio, video, and timed metadata, such as closed captions and cue points) in a way that provides ultra low latency for the end user.

If that sounds like WebRTC to you, then you’re almost correct. It is why many are going to see it (and use it) as a WebRTC alternative once it gets standardized and implemented by web browsers.

The main differences?

  • The timed metadata piece, which WebRTC has sorely missed for many years
  • No P2P capability. Sacrificed for improved NAT traversal (by relying on QUIC and servers)
  • The definition of media relays (servers) along with their operation
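The relay idea in that last bullet can be pictured with a toy model. To be clear, this is not the MoQ wire protocol, which is still being specified; the class and field names are invented for illustration. It only shows the shape of the thing: publishers push objects on named tracks to a relay, and the relay fans them out to whoever subscribed, with timed metadata (captions here) travelling the same way as audio and video.

```typescript
// Toy model only: this is NOT the MoQ wire protocol, just the relay idea.
interface MediaObject {
  track: string;    // e.g. "video", "audio" or "captions" (timed metadata)
  sequence: number; // position of this object within its track
  payload: string;  // stand-in for encoded bytes or caption text
}

type Subscriber = (object: MediaObject) => void;

class ToyRelay {
  private subscribers: { [track: string]: Subscriber[] } = {};

  // Register interest in one track; returns a function that unsubscribes.
  subscribe(track: string, onObject: Subscriber): () => void {
    const list = this.subscribers[track] || (this.subscribers[track] = []);
    list.push(onObject);
    return () => {
      const index = list.indexOf(onObject);
      if (index >= 0) {
        list.splice(index, 1);
      }
    };
  }

  // Fan one published object out to every subscriber of its track.
  // Returns how many subscribers received it.
  publish(object: MediaObject): number {
    const list = this.subscribers[object.track] || [];
    list.forEach((deliver) => deliver(object));
    return list.length;
  }
}
```

Note there is no peer-to-peer path anywhere in this picture: every object goes through a server, which is the trade-off mentioned above.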

While this is targeted at live streaming services, this can easily trickle into video conferencing.

Just like WebRTC was designed and built for video conferencing, but later adopted by live streaming services – the opposite can and is likely to happen: MoQ is being designed and built first and foremost for live streaming and it will be adopted and used by video conferencing services as well.

Would Google stay interested enough in WebRTC? Maybe it would venture to use WebTransport + WebCodecs + WebAssembly instead. Or just go for MoQ and consolidate its protocols across services (think YouTube + Google Meet). What would happen to WebRTC if that were to take place?

Who contributes to libwebrtc?

Here’s what I showed at RTC@Scale:

Let’s unpack this a bit.

The bars show the number of commits on a yearly basis. We see the numbers dwindling and winding down just as the use of WebRTC skyrockets (the red line) due to the pandemic. 2024 is likely to be even lower in terms of commits.

The greenish colored bars are Google’s contributions to libwebrtc. The blue? All the rest of the industry who make money using WebRTC – not all of them mind you – just those that contribute back (there are many others who never contribute back). Google has effectively been sponsoring them, which cannot make Google happy.

Why is that?

Why do so few contributions from outside of Google end up in libwebrtc?

I guess there are two reasons here:

  1. Google doesn’t make it easy to contribute. In the end, libwebrtc gets embedded into Chrome which goes to billions of users every month with a new release. Not knowing what got integrated (malware or patent-encumbered code for example) is a real issue. Having insecure or not thoroughly tested code is also unacceptable at this scale
  2. Laziness of those who use libwebrtc but never contribute back
    • In large corporations, the developers need to “fight” with the legal teams to contribute code back (the excuses are usually around liability and protecting IP)
    • Smaller companies can’t be bothered with the friction that Google adds to the process – or just don’t want to spend the needed time
    • Not wanting to make your competitor’s product better by contributing
    • Struggling with the server side parts of WebRTC that in the end are quite tightly coupled with libWebRTC on the client. Google Meet undoubtedly delivers the best experience because the client side is designed for its needs

Many developers the world over enjoy the fruits of libwebrtc, but most aren’t willing to contribute back. This is true for both individual engineers as well as companies. Google even gave up on being frustrated with this and resorts to solving their own issues these days. They probably have a very good understanding of the overall usage in Chrome where Google Meet remains the dominant user.

On the one hand, Google isn’t making this easy. On the other hand, companies are too lazy or too protective of their own forked libwebrtc code to ever contribute it back.

Can we save libwebrtc & WebRTC?

It is time to rethink WebRTC’s future.

For libwebrtc, we might need some other form of governance. Have more of the bigger vendors pitch in with the engineering effort itself. Meta, Microsoft and a few others who rely heavily on libwebrtc need to step up to that responsibility (the W3C Working Group is not where this kind of discussion happens) while Google needs to let go a bit. I have no clue how things are done in the world of Linux and I am sure libwebrtc isn’t big enough or important enough for that. But I do believe that something can be done here. At the end of the day it will require taking some of the maintenance cost off Google.

Just like Chrome has third party libraries such as libopus and dav1d (AV1 decoder) embedded into Chrome as part of libwebrtc, there is no real reason why libwebrtc itself can’t end up in the same way.

For WebRTC standardization, it is time to ask – is it finished, or are there more things needed?

Do we want to progress and modernize it further or are we happy with it as is?

Should we “migrate” it towards MoQ or a similar approach?

In the W3C, do we need to get more people involved? The web developers themselves maybe? They need to be listened to and made part of the process.

Will the above happen? Likely not.

An FAQ for WebRTC beginners
https://bloggeek.me/faq-webrtc-beginners/
Mon, 29 Jan 2024

Answering some common FAQ questions about WebRTC that seem to be top of mind on Google search.

A few days ago, I searched something on Google, and somehow bumped into a page full of questions Google found relevant or common. These weren’t exactly relevant to my search term (not directly), but they were there. And they were beginner questions about WebRTC.

It dawned on me that I’ve probably mentioned some of these things in passing (or a wee bit more) in the past, but placing them all neatly together in one place made sense. So here we are. And here’s the WebRTC FAQ for beginners.

Is WebRTC TCP or UDP?

WebRTC is neither TCP nor UDP. At the same time WebRTC is both TCP and UDP.

Confused?

Let’s put things in order.

With WebRTC there’s signaling and media.

Signaling is considered to be out of scope and left to the application. Most applications will use HTTPS or a secure WebSocket as transport for signaling. HTTPS runs over TCP… sort of… since HTTP/3 runs over UDP. But mostly, you can think of signaling in WebRTC as TCP and the skies won’t fall (👉 what we want for signaling is reliability and message ordering, and TCP based protocols give us that).

Media in WebRTC wants to use UDP. It strives to use UDP as much as possible, but that’s not always available to it, so it then falls back towards using TCP. But you can consider this as a last resort (we don’t want to be in that predicament).
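One way to see this in practice is to look at the ICE candidates a session gathers, since each candidate line names its transport. Below is a minimal sketch of a parser for the standard candidate line layout (foundation, component, transport, priority, address, port, then `typ` and the candidate type). The function names are made up for this example and the addresses in any sample input would be placeholders. Keep in mind this only reads the transport field of the candidate itself; for relay candidates, the leg between the client and the TURN server is not visible in that field.

```typescript
interface ParsedCandidate {
  transport: "udp" | "tcp";
  address: string;
  port: number;
  type: string; // host, srflx, prflx or relay
}

// Parses "candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ..."
// Returns null for anything that does not look like a candidate line.
function parseCandidate(line: string): ParsedCandidate | null {
  const parts = line.trim().replace(/^a=/, "").split(/\s+/);
  if (parts.length < 8 || parts[0].indexOf("candidate:") !== 0 || parts[6] !== "typ") {
    return null;
  }
  const raw = parts[2].toLowerCase();
  const transport: "udp" | "tcp" | null =
    raw === "udp" ? "udp" : raw === "tcp" ? "tcp" : null;
  if (transport === null) {
    return null;
  }
  return { transport, address: parts[4], port: Number(parts[5]), type: parts[7] };
}

// Counts how many gathered candidates use each transport.
function countByTransport(lines: string[]): { udp: number; tcp: number } {
  const counts = { udp: 0, tcp: 0 };
  for (const line of lines) {
    const parsed = parseCandidate(line);
    if (parsed) {
      counts[parsed.transport] += 1;
    }
  }
  return counts;
}
```

In a browser, the lines would come from the `candidate` string of each ICE candidate event. A session that ends up on a TCP candidate pair is the "last resort" case described above.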


Is WebRTC still used?

Yes. You wouldn’t be reading my blog otherwise 😎

It isn’t that there aren’t any challengers. It is that WebRTC is still the most popular and common solution for real time communications in web browsers.

WebTransport + WebCodecs + WebAssembly might someday replace WebRTC. But we’re not there yet.


Is WebRTC free or paid?

Free. Err. Paid. Free? Paid? Both? None?

Let’s sort things out here.

WebRTC is an open standard with a popular open source implementation maintained by Google and used by all major browser vendors.

Accessing the APIs and using them is free.

But creating most of the meaningful applications is going to require some sort of payment. That can be to a CPaaS vendor to host the WebRTC infrastructure; or to an IaaS vendor (think AWS) to host the servers and the bandwidth use (especially with TURN and media servers).

So yes. WebRTC is free, but expect to pay for it, in particular if you need help. Google will not help you…


What is WebRTC used for?

WebRTC is used for implementing realtime voice and video communications over the internet using web browsers. But it definitely isn’t limited to that.

I’ve seen use cases dealing with recording, live streaming, broadcasting, cloud gaming, remote teleoperation (that’s driving a car… remotely), peer assisted delivery, file transfer, … the list is endless.


Is WebRTC a security risk?

WebRTC enables browsers to have (and give) access to your microphone, camera, display and IP address. This is what every voice or video meeting application you install requires in order to work properly as well.

Is that a security risk? That’s up to you to decide as a user.

Giving such power to the browser reduces the friction for users but also for nefarious third parties who want to exploit these capabilities, so some will see this as an increase in security risk.

For developers it simply means that they need to know and understand what they are doing and how they implement their applications with this technology in order to mitigate any potential risk. It is worth noting that WebRTC and web browsers from their side do the most they can to reduce such security risks and even encourage developers to write secure applications.


Does Netflix use WebRTC?

No.

Netflix might be using WebRTC somewhere, but for its main video streaming service Netflix doesn’t use WebRTC.

Why? Because WebRTC is designed and fine tuned for real time communications. As such, it sacrifices quality for improved latency.

Netflix is the exact opposite. It strives to deliver the best quality and is willing to sacrifice a bit of latency while at it – you wouldn’t mind waiting a few seconds for your movie to start in order to have crisp and pristine video. On the other hand, you’d be pissed if your online video conversation had a latency of 5 seconds and felt as if the other person was sitting on the moon.


Can WebRTC be hacked?

Yes.

Everything can be hacked.

Browsers are trying to do their best to reduce that risk for WebRTC (and other technologies they implement), but it is an arms race…


Does WebRTC expose your IP?

This is a tricky question. The answer is yes and no.

Let’s start by understanding which IP address…

Your device usually has two IP addresses:

  1. A local IP address, used inside its local network – say the home network
  2. A public IP address, which the NAT assigns to it and is used to communicate with “the world”

Each application on your device, including the browser, has access to the local IP address.

Each web server you connect to on the internet sees your public IP address.

When negotiating a WebRTC session, WebRTC uses a mechanism called ICE which discovers your public IP address and shares your local and public IP address with the peer it connects with.

A few quick clarifications here:

  1. WebRTC will not expose a local IP address without permissions to access a camera or a microphone
  2. Any voice or video communication application ends up exposing the same addresses in similar fashion
  3. A WebRTC application can decide to use only TURN relay or media servers so as to not expose these IP addresses to other users
  4. There are browser extensions that can be used to limit the ability to expose local IP addresses
  5. If your VPN leaks your public IP with WebRTC it is that VPN which is not working
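The TURN-only option in clarification 3 maps to a single setting in the peer connection configuration: `iceTransportPolicy: "relay"`. Here is a minimal sketch of it. The two interfaces mirror the relevant standard configuration fields so the snippet runs outside a browser; the helper names and the TURN URL and credentials used with them are placeholders, not real infrastructure.

```typescript
// Local mirrors of the standard configuration fields used here.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
  iceTransportPolicy: "all" | "relay";
}

// Builds a configuration that only allows relayed (TURN) candidates,
// so the remote peer sees the TURN server's address and not yours.
function relayOnlyConfig(turnUrl: string, username: string, credential: string): PeerConfig {
  return {
    iceServers: [{ urls: turnUrl, username, credential }],
    iceTransportPolicy: "relay",
  };
}

// Returns the candidate lines that would reveal a non-relay address.
// Handy as a sanity check while debugging a relay-only setup.
function nonRelayCandidates(lines: string[]): string[] {
  return lines.filter((line) => !/\styp\s+relay(\s|$)/.test(line));
}

// In a browser this would be used roughly as:
//   const pc = new RTCPeerConnection(relayOnlyConfig(url, user, secret));
```

The price of this setting is that every session goes through your TURN servers, which adds cost and usually some latency.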


What is better than WebRTC?

A cheesecake from Philipp Hancke for my 10-years BlogGeek.me birthday

A cheesecake is definitely better than WebRTC. A chocolate cheesecake is doubly so.

In all seriousness though, I have no clue.

It depends. Which is a cop out answer but the only one here.

The question should be more specific. It should include what it is you are trying to build, what is the target audience and what medium do you want to use for it.

For live streaming, WebRTC might not be the best fit. Especially if you can live with a 2-second delay (in that case, LL-HLS and LL-DASH would be better solutions for example).

For video conferencing… well… I’d start by selecting WebRTC by default. And then try to poke holes in my decision and select something else – proprietary – since there is nothing else…


Is WebRTC better than Websockets?

Apples to oranges.

I’d use both. In the same application. Seriously.

WebSocket for signaling and WebRTC for media.
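Since WebRTC leaves signaling to the application, the messages that travel over that WebSocket are yours to define. Here is a minimal sketch of one possible JSON envelope for offers, answers and ICE candidates. The message shape and field names are assumptions made for this example; nothing here is mandated by WebRTC.

```typescript
// Hypothetical signaling envelope; WebRTC does not define one.
type SignalMessage =
  | { kind: "offer"; from: string; to: string; sdp: string }
  | { kind: "answer"; from: string; to: string; sdp: string }
  | { kind: "candidate"; from: string; to: string; candidate: string };

// Serialize a message before handing it to the WebSocket.
function encodeSignal(message: SignalMessage): string {
  return JSON.stringify(message);
}

// Parse and validate text received from the WebSocket.
// Returns null for anything that is not a well-formed message.
function decodeSignal(text: string): SignalMessage | null {
  let raw: any;
  try {
    raw = JSON.parse(text);
  } catch (_err) {
    return null;
  }
  if (raw === null || typeof raw !== "object") {
    return null;
  }
  if (typeof raw.from !== "string" || typeof raw.to !== "string") {
    return null;
  }
  if (raw.kind === "offer" && typeof raw.sdp === "string") {
    return { kind: "offer", from: raw.from, to: raw.to, sdp: raw.sdp };
  }
  if (raw.kind === "answer" && typeof raw.sdp === "string") {
    return { kind: "answer", from: raw.from, to: raw.to, sdp: raw.sdp };
  }
  if (raw.kind === "candidate" && typeof raw.candidate === "string") {
    return { kind: "candidate", from: raw.from, to: raw.to, candidate: raw.candidate };
  }
  return null;
}

// In a browser this would be used roughly as:
//   socket.send(encodeSignal({ kind: "offer", from: me, to: peer, sdp: offer.sdp }));
```

The WebSocket carries these small text messages reliably and in order, while the media itself flows over the WebRTC connection they help set up.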

There are two places where you can think of WebRTC and WebSocket as alternatives:

  1. WebRTC’s data channel, which is bidirectional in nature and peer-to-peer. For the most part, I’ll still use WebSocket. Unless I am serious about my low latency requirements or my privacy requirements
  2. When aiming for live streaming. But then I might just go for WebTransport instead of WebSocket – being forward thinking…

Did I already say apples to oranges?


Is Google a WebRTC?

To be frank – Google is Google. Not sure what the question is here 🤣

Google and WebRTC have an interesting relationship.

It all started when Google acquired GIPS, a company that licensed media engines. A bit afterward, WebRTC was announced in the standardization organizations and Google made the GIPS media engine into an open source implementation, integrating it into Chrome and placing APIs on top of it – these APIs were the WebRTC API specifications (or close enough at the time).

That was over 10 years ago. Since then, WebRTC has evolved and so has Google’s implementation of it.

Google uses WebRTC internally for Google Meet and for other products and projects it has.

The actual WebRTC project is open source. Maintained by Google. And most of the contributions to it are Google’s.


Does WebRTC need a server?

Yes. WebRTC needs a server. In fact, it needs multiple servers.

For starters, you need to download the application logic from somewhere, and a way to signal who you want to have a conversation with. This is done with a signaling server.

Then, when connecting the WebRTC session, there are times when you won’t have a direct route for the media. In such cases, you are going to need a TURN server. TURN servers also act as STUN servers but STUN servers are not the same as signaling servers.

And, you may want to go fancy – run a group meeting, record stuff. Such capabilities almost always mean you are adding a media server into the mix.


Does WebRTC require Internet?

Yes.

Everything today requires the Internet. Even you being able to read this FAQ requires the Internet.

WebRTC can run in local networks or private networks without connecting to the public Internet. But it still needs an IP network to work.

Does WebRTC use SSL?

Yes.

Let’s start with definitions first: For me SSL and TLS are one and the same.

HTTPS and WSS (Secure HTTP and Secure WebSocket) both run on top of TLS so they are also → SSL.

Web browsers practically force application developers to use HTTPS for the pages that host these services, which means all signaling used with WebRTC will be done via HTTPS or WSS.

The media uses SRTP, which is Secure RTP, which doesn’t use TLS (because it isn’t running over TCP). That said, when sessions need to be relayed via TURN servers, they might end up being relayed over TURN/TLS.
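A quick way to keep these legs straight is to look at the URL schemes your application is configured with. The sketch below classifies a URL by scheme: `https`, `wss`, `turns` and `stuns` run over TLS, while `http`, `ws`, `turn` and `stun` do not. The function name is made up for this example. A "plain" TURN leg does not mean the media is unencrypted, since SRTP still protects it; it only means there is no TLS wrapping on that hop.

```typescript
type Leg = "tls" | "plain" | "unknown";

// Classifies a signaling or ICE server URL by its scheme.
function classifyUrl(url: string): Leg {
  const scheme = url.trim().toLowerCase().split(":")[0];
  if (scheme === "https" || scheme === "wss" || scheme === "turns" || scheme === "stuns") {
    return "tls";
  }
  if (scheme === "http" || scheme === "ws" || scheme === "turn" || scheme === "stun") {
    return "plain";
  }
  return "unknown";
}
```

For example, a deployment with signaling on `wss://` and a relay on `turns:` has TLS on both the signaling path and the TURN fallback path.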


Where’s the answer to my question?

Couldn’t find the answer?

I can invite you to follow and read my blog – it has a lot of resources about WebRTC.

My suggestion? Start here 👉 What is WebRTC?

If you are looking to skill up with WebRTC, I also have WebRTC courses for you.

My WebRTC predictions for 2024
https://bloggeek.me/webrtc-predictions-2024/
Mon, 15 Jan 2024

Here are the WebRTC trends and predictions you should expect in 2024. They are a continuation of what we’ve seen in 2023 with a few variations.


Time to look at what we’ve accomplished in 2023 and think what’s ahead of us in 2024 when it comes to WebRTC.

When we look ahead, there are several notable things that glare at us immediately:

  1. WebRTC is here to stay. But in some cases and for some use cases, the focus is shifting towards WebTransport+WebCodecs+WebAssembly
  2. The recession is here and it isn’t going anywhere, so a continuation of what we’ve seen a year ago
  3. Generative AI is getting all the love and attention out there. It is also finding its way slowly into WebRTC services

Last year, I became CPO at Spearline. This year, Spearline got acquired by Cyara and I am now Senior Director of Product Management there. I am still delving into WebRTC and CPaaS. Still consulting a bit here and there on these subjects when it makes sense.

If you are interested, you can read my last year’s WebRTC predictions for 2023 😀

Let’s get started here…

The video version

This year, I took the liberty of also sharing my predictions in a video form. It holds the essence of my WebRTC predictions for 2024, in a short form.

Read on below to get into the details.

The era of differentiation in WebRTC

We are well into the era of differentiation:

I created this slide sometime in 2020, modifying it a bit to fit the pandemic.

It is as relevant today as it was last year:

  • We started off with WebRTC in an exploratory fashion, asking ourselves should we even use this technology?
  • Then we saw a growth spurt, where it was obvious WebRTC is here to stay. The question changed to how do we use it
  • That got us right into the age of differentiation, where services from different companies look so alike, using the same WebRTC interface and capabilities, that we now ask ourselves how do we compete

The answer to how we compete varies on a yearly basis. Now, it obviously revolves around generative AI and LLMs. That’s the easy answer. The truth is a lot more complicated and nuanced. It requires understanding where investments are currently made – both at Google and in the ecosystem around WebRTC and its use.

What does WebRTC use look like?

Last year I predicted usage would be 3 times higher than pre-pandemic. That meant lowering the use at the beginning of 2023 from 4 times to 3 times pre-pandemic. The end result? We stayed at around 4 times pre-pandemic usage.

From here, it can only go up, slowly and linearly, and likely only after 2024:

  • New use cases are unlikely to cause people to start doing more video calls
  • Growth ahead will come from shifting on premise solutions to cloud ones and at the same time, migrating to WebRTC use

WebRTC, open source and XaaS

I am not going to touch the topic of open source here. I’ve done that in my article two weeks ago writing about the top WebRTC open source media servers on GitHub.

XaaS requires a few words of explanation, and I am likely to cover them in the coming months in further detail in a separate article.

For me, XaaS is IaaS, CPaaS and SaaS. In all cases, it is a matter of looking at them from the prism of WebRTC APIs 👉 CPaaS.

CPaaS

The landscape is changing in the CPaaS domain. A few years back, the leading vendors for WebRTC APIs were Vonage, Twilio and Agora. Probably in this order.

Here’s what I had to say in last year’s predictions article:

The perceived leaders in WebRTC CPaaS are still Twilio, Vonage and Agora. I have a feeling that by the end of 2023 this will change.

Little did I know this would be spot on…

Twilio just announced in December that it is exiting the video business altogether. They still have and use WebRTC for their voice capabilities, mainly with a focus on call centers. But other than that? They just became irrelevant to many developers.

Most vendors are now likely to want to compare themselves to Vonage and Amazon Chime SDK. Agora probably as well.

From a perspective of innovation or specific market niches, other vendors come to mind as solid alternatives here. Companies such as Daily and Dolby for example (there are others – sorry for not mentioning everyone). Or LiveKit with its open source alternative.

Notables?

  • Twilio all but left the market a year ago, shifting focus to voice and text contact centers and CDPs. In December 2023 they announced the sunsetting of the Twilio Programmable Video service
  • Vonage has been working on integrating machine learning pipelines into their SDKs, which is great
  • Dolby doubled down on low latency streaming and high end audio requirements
  • Daily leads in lowcode efforts and has been putting a lot of attention in the past year towards AI and partnerships
  • Agora has just released a signaling SDK and introduced VP9 support

That change at Twilio places more strain on developers who need to choose who to use, with the added new risk of the level of commitment they see in the CPaaS vendor they choose. When someone like Twilio throws you under the bus, what can you expect from other vendors?

SaaS

SaaS vendors are vying towards CPaaS, assuming for some unknown reason that there’s money to be had from developers.

There are a few that are taking this route.

The problem that I see here is the fact that Twilio decided this isn’t interesting enough. While they have the APIs – they don’t invest in them any further. Meaning it isn’t a big enough market for Twilio. In such an atmosphere, how would it be big enough for SaaS vendors, and how will they see the explosion in use of their infrastructure that they likely haven’t seen in SaaS?

Some of them may yet succeed, but the path here isn’t an obvious or a simple one.

IaaS

Amazon, Microsoft, Google… and… Cloudflare.

  • Amazon has AWS Chime SDK
  • Microsoft has Azure Communication Services
  • Google has… nothing
  • Cloudflare introduced WebRTC services throughout 2023

Let’s see where that takes us

Amazon is investing in Chime SDK. Especially when it comes to audio quality and capabilities. In many ways, Amazon is shifting the attention of developers from CPaaS to their Chime SDK as a solid alternative. This is a trend that should be watched by CPaaS vendors and developers alike.

Microsoft seems content with their current offering of Azure Communication Services. There were no new or interesting announcements around it in 2023, which begs the question – is it important enough for Microsoft and a viable solution for developers?

Google announced APIs for Google Meet. Ones that integrate with it, but not ones that use its infrastructure for me to build my own video experiences. So no luck there for a CPaaS play. Time will tell if this changes. It is unlikely to happen in 2024.

Cloudflare entered the market with much fanfare. I covered them in 2023’s predictions. Since then, there have been no material announcements. Is that good? Bad? I just don’t know.

How did I do with my 2023 WebRTC predictions?

I spent quite a lot of time on my predictions in 2023. Let’s see how well I did.

#1 – libWebRTC (and the future of WebRTC)

I’ve made the prediction that Google’s WebRTC library will focus on house cleaning, optimizing and polishing collaboration. It did all that this year. We see this on an ongoing basis in our WebRTC Insights service.

What was interesting to note, is a slight shift towards requirements coming outside of Google Meet. There’s work being done to include H.265 support in libWebRTC, wherever H.265 is available in a hardware implementation form (i.e – someone is already paying the patent royalties bill).

Is that because Google was benevolent and nice? Is it because they wanted to show they aren’t a monopoly in Chrome? Is it because of some other deal with Intel (the ones pushing H.265 into WebRTC)? Or is it simply because they might end up using it in Google Meet in all-Apple devices meetings? Time will tell.

#2 – Machine learning and media processing

I assumed that WebAssembly would continue to be used with WebRTC for media processing in things like background replacement, noise suppression and proprietary codecs implementations.

It was.

Some of it was done in WebAssembly at the browser level. A lot of it was relegated to the cloud or kept in native applications. What I found interesting is that some vendors chose to announce and release such solutions across all platforms rather than start from native and move towards the web later.

Most interesting (and obvious) change here? A lot of this use is now being remarketed as generative AI – doesn’t matter if it is generative or not.

#3 – Voice before video (Lyra first, AV1 later)

I thought Lyra (=new voice codec) would find its way to applications faster than AV1 (=new video codec). Or at least new voice codecs…

The results are… inconclusive.

Webex did come out with a new Webex AI audio codec, with little explanation about it.

AV1 is starting to make real noises of almost-maturity, with Apple supporting AV1 hardware acceleration (for decoding only at the moment) and Google fiddling around with AV1 in Google Meet.

We didn’t hear much this year about Google’s Lyra or Microsoft’s Satin codecs. Just this new announcement of the new Webex AI codec. So I am not sure if voice happened before video or not.

#4 – Observability

Yes. There is more interest in observability. I know that by looking at our numbers in testRTC. There is no specific market or industry where it happens more. What I can say is that many contact centers are starting to take note. Probably due to their increased reliance on WebRTC and the fact that many contact center agents are working from home now.

#5 – M&As and shutdowns

We had a few interesting shutdowns and M&As. The most notable ones?

A lot of WebRTC engineers found themselves a new home. Either because their startups shut down, their company downsized or they saw no future where they were.

Good talent is there to be had if you look hard enough.

WebRTC predictions for 2024

Enough about 2023. That’s old news. Let’s see what’s going to happen with WebRTC in 2024 😎

#1 – libWebRTC (and the future of WebRTC)

I’ll start with the most important piece of our technology puzzle – libWebRTC, maintained by Google.

This year will be a continuation of last year. Mostly maintenance releases, with a few minor improvements. The places where we will see the most focus by Google in libWebRTC:

  1. Access to media frames, raw and encoded, via Insertable Streams. This will include optimizations and a bit more flexibility. The purpose of it all is to promote and push forward AI capabilities
  2. Collaboration. A continuation of last year. Some of it via Insertable Streams. Others through polishing of media control APIs in the browser to enhance the user experience
  3. Accommodating AV1. I believe by the end of 2024, we will finally see Google Meet using AV1 – we’ve just seen a glimpse of that. In some limited scenarios, on select device types. There’s also work being done to allow for VP9 simulcast with hardware acceleration instead of using VP9 SVC
  4. Voice AI. Google will put Lyra or similar into Google Meet itself. Either as a standalone or by somehow plugging it into Opus or similar. Maybe it will do so via Insertable Streams, but I doubt this will be the route they will take here
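
To make the first item a bit more concrete, here is a minimal TypeScript sketch of the kind of per-frame work that access to encoded frames allows. It assumes the application already has the bytes of one encoded frame in hand (in browsers that expose encoded frames, the payload arrives as an `ArrayBuffer` that can be wrapped in a `Uint8Array`). The function name `xorPayload` and the toy XOR step are illustrative only; they stand in for whatever real processing an application would run, and the wiring into the browser's stream pipeline is deliberately left out.

```typescript
// Toy per-frame transform: XOR every byte of an encoded payload with a key.
// This is a placeholder for real processing and is NOT a secure cipher.
function xorPayload(payload: Uint8Array, key: number): Uint8Array {
  const k = key & 0xff; // keep the key within a single byte
  return payload.map((byte) => byte ^ k);
}

// The sender scrambles a frame payload...
const framePayload = new Uint8Array([1, 2, 3, 250]);
const scrambled = xorPayload(framePayload, 0x5a);

// ...and the receiver applies the same transform to get the bytes back.
const restored = xorPayload(scrambled, 0x5a);
```

The point of the sketch is only that the work happens per frame, on raw bytes, in application code. That is the hook that machine learning or custom media processing would plug into.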

By the end of 2024, we will find ourselves similar to where we are at the beginning of it:

  • Google will be the main and virtually sole contributor to libWebRTC. The total commit numbers have been dwindling and this will continue. Will we see it stabilize in 2024?
  • Here and there, external contributions will happen. Most of them are likely to come from Philipp Hancke. But here as well, we’ve probably seen the peak of individual contributions already…

#2 – Machine learning and media processing

WebAssembly is where we see innovation and differentiation in WebRTC. 2024 will be no different.

It will be incorporated in the “same old places” of media processing.

What we will also see is a lot more machine learning on the server side, and a lot of it will be leaning towards generative AI and LLM technologies. This isn’t really a prediction, but just stating the obvious here. Coming from someone who uses Midjourney for the imagery in many of his recent articles, that shouldn’t come as a surprise to you.

#3 – The year of Lyra and AV1

Time to take a huge risk.

I mentioned this in the libWebRTC prediction, but it deserves a section of its own as well.

Each year I say AV1 is years away. I think it is still going to take time until it becomes commonplace. That said, I believe this year we will see AV1 in one or more commercial WebRTC services, including Google Meet. It will be used judiciously and in very specific use cases and scenarios – call this testing the water.

On the audio side, we will see an AI audio codec being used in production in web browsers. Likely from Google. I believe Lyra will find its way into Google Meet. How exactly is where I am uncertain.

#4 – WebTransport as a real alternative

WebTransport started life somewhere in 2020. We’re now at the beginning of 2024.

It still isn’t available in all browsers – Safari is still missing support for it. It is available elsewhere, but far from being commonly used or in the mainstream’s mindset.
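
Because support differs from browser to browser, an application that wants to experiment with WebTransport would typically check for it at runtime. The TypeScript sketch below shows one way to do that, under a few assumptions: the helper name `detectMediaCapabilities` and the `MediaCapabilities` shape are made up for illustration, and the check simply looks for the global constructors that browsers expose (`RTCPeerConnection`, `WebTransport`, and the WebCodecs `VideoEncoder` and `VideoDecoder`). It takes the global scope as a parameter so it can be tried outside a browser.

```typescript
// Which building blocks the current environment exposes (illustrative shape).
interface MediaCapabilities {
  webRtc: boolean;
  webTransport: boolean;
  webCodecs: boolean;
}

// Looks for the global constructors on the given scope object.
function detectMediaCapabilities(scope: Record<string, unknown>): MediaCapabilities {
  const has = (name: string): boolean => typeof scope[name] !== "undefined";
  return {
    webRtc: has("RTCPeerConnection"),
    webTransport: has("WebTransport"),
    webCodecs: has("VideoEncoder") && has("VideoDecoder"),
  };
}

// In a page you would pass the real global object, for example:
//   detectMediaCapabilities(globalThis as unknown as Record<string, unknown>)
// Here a hand-built scope imitates a browser that has WebRTC but no WebTransport.
const withoutWebTransport = detectMediaCapabilities({
  RTCPeerConnection: function () {},
});
```

What to do with the result (fall back to WebRTC, hide a feature, show a message) is an application decision that this sketch does not try to make.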

We’ve seen this year a few more experiments and proof of concepts with WebTransport that incorporate low latency media delivery. Mostly in the domain of streaming. There are reasons for that. I’ve written about that when discussing WHIP and WHEP.

Here’s what I think is going to happen: in 2024, we will see the first production ready low latency streaming solution that makes use of WebTransport instead of WebRTC or other technologies. This will be for one-way large scale broadcast use cases, where 1-2 seconds of latency are fine.

There will be those that will use WebTransport for bidirectional media delivery, similar to what Zoom is doing in web browsers, though that will stay the exception to the rule and more of an experimentation.

#5 – M&As and shutdowns

This was easy in 2023 and will remain easy in 2024.

The recession is here. It is likely to stay throughout 2024, with no real end in sight. At least not yet.

More vendors relying on WebRTC will shut down. Small startups will run out of steam. Large vendors may decide to exit this market and focus on other avenues where they conduct business.

Shutting down may mean getting acqui-hired, or acquired for peanuts. It might also mean selling chunks of the business to another company.

Vendors who stick to this market are likely to slow down their efforts throughout the year in an attempt to survive and weather this ongoing storm.

2024, here we come

Lots to do in 2024, but with limited resources:

  • Slowdown at the same time we see technology shifts and the need to differentiate
  • Generative AI, and AI in general and trying to figure out where it fits in WebRTC use cases
  • Polishing collaboration and sharing capabilities in WebRTC and getting that implemented in apps
  • Introducing next generation audio and video codecs
  • Researching new transport technologies

All that while trying to satiate users and customers with new features and releases.

The post My WebRTC predictions for 2024 appeared first on BlogGeek.me.

]]>
https://bloggeek.me/webrtc-predictions-2024/feed/ 0
Twilio exits video APIs, further focusing on voice, SMS and Segment https://bloggeek.me/twilio-programmable-video-sunset/ https://bloggeek.me/twilio-programmable-video-sunset/#respond Wed, 06 Dec 2023 07:35:15 +0000 https://bloggeek.me/?p=74126 Twilio Programmable Video is no more. What should WebRTC Video API vendors and their customers do from here on?

The post Twilio exits video APIs, further focusing on voice, SMS and Segment appeared first on BlogGeek.me.

]]>
Twilio Programmable Video is no more. What should WebRTC Video API vendors and their customers do from here on?

This week, Twilio dropped a bombshell 🤯

It decided to shut down its Programmable Video service and do a bit of downsizing and trimming around Segment and Flex.

I didn’t intend to write anything more until 2024, but this necessitated changing my plans.

💡 The image above is an adaptation from a blog post on Twilio’s website from 2021…

Twilio Signal, and why I stopped covering it

Each year, Twilio hosts its Twilio Signal event. I’ve attended a couple of them in person and used to cover them here on a yearly basis.

That stopped with Twilio Signal 2021, which was the last time I covered that event here. The reason for that was the pivot Twilio made from CPaaS to CEP (Customer Engagement Platform).

Ever since, I’ve searched for things to talk about and share about Twilio Signal, but found nothing of real value or interest to my readers.

Remember – I cover WebRTC and CPaaS. CPaaS mainly from the point of view of WebRTC and modern communications and less from the SMS and legacy telephony sides of it.

The shift towards CEP meant a lot less investment and focus by Twilio on exactly these areas – WebRTC and CPaaS that are non-SMS/legacy telephony related.

What did Twilio have to show for its investment in video and WebRTC in 2022 and 2023? Nothing. Crickets. Oh… yes… they did integrate with Krisp for noise cancellation. Presumably only in their Video SDK and not the Voice SDK. So that’s down the drain as well.

The decision might be the right one for Twilio, if you look at where their investments and attention are going:

  • Twilio Flex, for a programmable contact center
  • Segment, as a leading CDP vendor
  • Fusing Segment with programmable communications

Video is likely 1% or less of their revenue. So why bother? Especially when it requires management attention to get it anywhere meaningful with so much else that is bigger and more important to deal with.

CPaaS vendors: Best of breed vs best of suite

I learned about the concepts of best of breed and best of suite when working at Amdocs.

  • A best of breed vendor would specialize vertically, offering its customers a solution that is great in a narrow domain. Think of it as “the leading SMS vendor”. You do SMS and only SMS and you do it really well
  • Best of suite is all about the breadth of your offering. You provide a solution that has a mixture of multiple services and features your customers will need. You might not be doing any of them the best in the market, but if someone needs multiple services and wants a single vendor to work with – you’re the best for them. Think of it as offering SMS, voice, email, video, … – Twilio

Twilio started with SMS and voice. It later decided to expand and become “best of suite” by adding email, video, IOT, social messaging, chat, …

What happened though is that in parallel, it worked hard on being best of breed in voice and SMS. Doing that by going upstream and introducing Flex. Flex reduced the effort of contact centers built on top of Twilio.

And then they pivoted. With the acquisition of Segment and the need to tightly integrate it with their CPaaS and Flex offering. Transitioning from taking care of communications to taking care of understanding the customer.

Today?

There are two types of CPaaS vendors:

  1. The best of suite ones, who offer the breadth of communication services
  2. Or the best of breed ones, who focus on a specific domain. And the domain I care about is WebRTC and video. These usually won’t have legacy telephony. At most, they will enable connecting to legacy telephony of third parties

Interestingly, both are circling like vultures around Twilio to see which customers are going to come out of there looking for alternatives. Some of these CPaaS vultures offer pure WebRTC video solutions. Others offer the whole suite. And there are those who don’t even offer video – but see this as an opportunity to poach customers from Twilio.

The cases of Twilio IOT and Twilio Live

I remember that in one of the first Twilio Signal events, Jeff Lawson stood on stage and proudly announced that they never deprecated an official API. The way this was later handled is by having beta and GA phases for products.

This cannot be said anymore… by the end of 2022, Twilio started sunsetting and shutting down services.

It started with a round of layoffs at Twilio. Jeff Lawson, Twilio’s CEO, wrote a message that got to the Twilio blog as well. Here’s what we shared about it at the time with our WebRTC Insights clients:

  • Twilio laid off 11% of their workforce
  • The decision was to take the internal email and publicly put that on their blog, instead of getting it indirectly on TechCrunch
  • A few interesting things to note in this email:
    • Twilio has 4 focus areas: reliability+trust, profitability of messaging, Segment adoption, Flex customer base
    • 3 main products in focus: messaging, Segment (Customer Data Platform), Flex (Programmable Contact Center)
    • Programmable Video isn’t prioritized at all. Programmable Voice might be said to be buried somewhere in there under Flex
    • Twilio’s future success and growth lies in Segment and Flex – not in Communication APIs
  • The charts below show the number of employees and growth rate of Twilio in recent years
  • Why is Twilio doing this? A few options here
    • Growth is slowing, and all the hiring they did is just too much to maintain
    • Management has too many directions it is now looking at, so it was time to shoot down all the smaller initiatives and products since they won’t bring the necessary growth at Twilio’s size
    • Twilio might have used the current market state to clean the stables and remove all the useless fat from the company
    • All of the above, to some extent
  • How will this affect other CPaaS vendors? This is hard to say. Here are a few thoughts
    • If Twilio is in poor shape, then the rest are in a worse one
    • With Twilio management shifting focus away from the API space, especially in voice and video, it is time to double down on these areas to build some differentiation
    • Time to use FUD in the market against using Twilio for video APIs – Jeff just said it isn’t a focus area. Just make sure it doesn’t backfire…
    • Maybe CPaaS isn’t as great as it was believed to be as a business…
      • From my past life I know that selling to developers is super hard
      • And the target market for it is rather limited
      • There are better opportunities out there, which is why many CPaaS vendors are following in Twilio’s steps when it comes to Flex
  • Also, if you are looking for developers, it might be worthwhile to try and poach a few of those who still work at Twilio, or more easily those who are looking for a new job

After the reduction in workforce came the reduction in product offerings. The first two to go on the chopping block were Twilio IOT and Twilio Live.

Twilio Live was announced dead in November 2022. Low traction of the service and little fit with the direction of Twilio meant this had to die. The way this was done? Let customers know. Officially suggest they go use Mux instead. Somehow, the fact that Mux at the time had a service competing directly with Twilio Programmable Video wasn’t something that worried Twilio.

Twilio IOT was simply sold off to KORE Wireless in March 2023.

Remember that suggestion we gave about FUD in the market against using Twilio for video APIs? (I marked it in yellow above so you won’t miss it)

The demise of Twilio Programmable Video

Here’s what the Twilio product menu looks like on their homepage:

This is likely going to change soon or by the time this gets published.

  • Customer Data = Segment offering
  • Communications = CPaaS
  • Applications = Enterprise stuff

Each and every piece in the Communications part can be snugly fit into the products on the left and on the right (Customer Data and Applications).

Video is a bit of a stretch. At least if you look closely at traffic sizes and revenue numbers.

The two other oddballs – IOT and video streaming – were thrown out without too many objections and without hurting Twilio’s bottom line.

What was left was to get rid of the video piece. It likely took too many resources but made no real dent in Twilio’s numbers.

To be frank – the problems likely started with the acquisition of Kurento. Kurento wasn’t fit for what they had in mind for it, and it was riddled with architectural and technical issues. This wasn’t a good starting point for multiparty calling in Twilio Programmable Video.

If I had to guess, a lot of technical debt went into the product to improve and repurpose the media server pieces of Kurento.

Twilio was slow to innovate on video, leaving room for other vendors – big and small. It missed the lowcode and embeddable experiences that are now common in video APIs. It didn’t invest much in AI integrations. It didn’t optimize media quality enough to work well for its customers.

And then it left the door open for Amazon with their Chime SDK to threaten them in this domain.

I am guessing growth and revenue from Twilio Programmable Video weren’t in line with expectations (unsurprisingly). The current market climate, the end of the pandemic, the headaches in Segment and Flex. All of it got them to the conclusion that it would be simpler to just sunset Twilio Programmable Video and move on.

A brave decision. Twilio Programmable Video couldn’t have been sunset at a worse time (unless you consider a few months prior to the pandemic and the quarantines).

A week before this announcement from Twilio, Amazon announced support for video calling in Amazon Connect.

Amazon is investing in adding video to its contact center solution, and Twilio, who has Twilio Flex competing against Amazon Connect, is sunsetting its video API.

  • What does it mean for video calling support in Twilio Flex?
  • Would Twilio still support or add video calling to Twilio Flex without offering Programmable Video APIs?
  • How should contact center customers view this? If they have video requirements in their roadmap, would they use Amazon Connect or Twilio Flex?

Innovations in Video APIs and WebRTC managed services

Why was Twilio Programmable Video appealing to potential customers? I can think of two main reasons:

  1. Single throat to choke. Sourcing your voice, SMS and video from the same vendor, on a single bill is an advantage
  2. A reputable vendor. It is Twilio. They are big. What can ever go wrong? …

The reasons why not to? Quite a few:

  1. Quality wasn’t on par with what can be achieved elsewhere with CPaaS vendors
  2. No lowcode/embeddable offering for its video API
  3. Support… could be better
  4. No innovation

All that Twilio had for itself is its brand name. And that in a market that was moving on.

Things other vendors have been doing in that period of time?

  • Doubling down on large scale sessions, with 10,000 or more users
  • Live streaming solutions (the one Twilio sunset in 2022 – Twilio Live)
  • Investing in AI integrations and pipelines, both on client side and on server side
  • 3D audio, VP9 video codec support
  • Nocode/lowcode solutions

Twilio wasn’t able to keep up. Or even pick a direction it wanted to invest in.

The rise of the Zoom Video SDK

Twilio issued an email to its customers on December 5, stating the sunset will take a full year. From this email:

[…] we have decided to End of Life (EOL) our Programmable Video product on December 5, 2024, and we are recommending our customers migrate to the Zoom Video SDK for your video needs. 

The official recommendation from Twilio is for their customers to migrate to the Zoom Video SDK.

The announcement can’t be found (yet) on any marketing material from Twilio. It can be found on social media accounts from Zoom.

Why Zoom?

  1. Zoom isn’t a competitor of Twilio in anything, and is unlikely to become one any time soon
  2. It is a large and respectable vendor with a brand name

They couldn’t suggest vendors that have SMS or voice services.

The rest are mostly smaller vendors – not something Twilio wanted to be identified with is my guess.

There’s only one problem with picking Zoom Video SDK here. Their web experience isn’t on par with the rest of the pack. They rely on WebTransport+WebCodecs+WebAssembly, which isn’t as stable or performant as just using WebRTC. For native, their SDKs should be fine, but for web browsers, I’d be reluctant to use them yet. Add to that the fact that this is a technology shift, requiring some relearning of terms and a reliance on proprietary technology, and you get some increased risk for the vendors switching.

I wonder if Twilio and Zoom came to an agreement here (with Zoom maybe even paying for this suggestion to go out) or if Twilio simply decided to offer some kind of a recommendation and be done with it. Philipp’s bet: Eric Yuan had dinner with Jeff Lawson and paid for it.

Anyhow, customers have a full year to figure out a solution. Or less – depending on how much browsers’ WebRTC implementations drift away from the current implementation of Twilio. What doesn’t get maintained in WebRTC rots rather quickly.

The future of managed Video APIs (without Twilio)

I am not sure how much Twilio Programmable Video would be missed.

Developers certainly used it. Big and small. Its revenue was probably higher than some of the smaller video API vendors out there. These developers will figure out a way to migrate to other vendors. It won’t be the first time a CPaaS vendor has exited the video API market (we had AddLive, vLine, ooVoo, SightCall, Respoke, Tropo, Forge, CafeX, Circuit, Bit6 all exit this market in the past).

3-4 years ago, we had 3 top dogs in this market: Vonage, Twilio, Agora

A year ago, I’d say I heard a lot more about Vonage, Amazon Chime SDK and Twilio. Less so Agora

Now, we have Vonage and Amazon Chime SDK

Who will take the 3rd spot among the top 3 when it comes to developers’ mindshare in this industry?

We have Agora, Daily, Dolby, LiveKit and others who are all vying for that spot. Each has its own angle and differentiation.

Would Vonage keep its spot there?

Will Amazon continue investing in its Chime SDK enough?

I don’t have the answers to these questions, but I do have my own opinions.

Where should Twilio Video customers go from here?

That is the big question.

If you are using Twilio Programmable Video – who should you go to instead?

And if you are on the lookout for a CPaaS vendor now – who should you pick?

My WebRTC Developer Landscape infographic was last updated in 2022, but can still offer some guidance as to the alternatives available. Some of them I’ve listed throughout this article. Others are just as valid.

Here are a few questions you need to answer for yourself:

  • What are your requirements and focus? Different CPaaS vendors offer a different type of a solution, so pick one that offers what it is you’re after
  • Make sure you ask around. Check references. Talk with other developers who use that CPaaS vendor
  • Try them out in a small POC before fully committing yourself
  • Check their commitment and level of investment in what it is you focus on as your requirements and roadmap. Don’t only listen to what they say – also check out what features they introduced to the market in the last 12-24 months. See if they had layoffs in that same period of time as well, and make an educated guess if they will be around a year from now. Maybe wait six months until making the decision
  • Don’t invest in abstraction layers to be able to replace CPaaS vendors. It sounds like a great initiative and project, so just don’t do it. Unless you want to use more than a single vendor at a time (unlikely for most of us)
  • While you shouldn’t invest in an abstraction layer, you should definitely try to limit calls to the CPaaS vendor’s APIs to specific modules in your code. If you can limit it to a single source file or class – even better
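
The last point in the list above can be sketched in TypeScript. Everything here is hypothetical: `VendorSdk` is a made-up stand-in for whatever SDK your CPaaS vendor ships (real SDKs have different, usually asynchronous, calls), and `VideoCalls` is just one possible name for the single class that is allowed to touch it. This is not a vendor-neutral abstraction layer. It mirrors one vendor's calls and only keeps them in one place.

```typescript
// Shape of a HYPOTHETICAL vendor SDK. Real CPaaS SDKs differ and are
// typically asynchronous; this stays synchronous to keep the sketch short.
interface VendorSession {
  disconnect(): void;
}

interface VendorSdk {
  connect(token: string, roomName: string): VendorSession;
}

// The only class in the application that calls the vendor SDK directly.
class VideoCalls {
  private sdk: VendorSdk;
  private session: VendorSession | null = null;

  constructor(sdk: VendorSdk) {
    this.sdk = sdk;
  }

  join(token: string, roomName: string): void {
    this.leave(); // close any previous session before opening a new one
    this.session = this.sdk.connect(token, roomName);
  }

  leave(): void {
    if (this.session !== null) {
      this.session.disconnect();
      this.session = null;
    }
  }

  get inCall(): boolean {
    return this.session !== null;
  }
}
```

The rest of the application would call `join()` and `leave()` and never import the vendor SDK itself. If the vendor sunsets its service, the search for code to rewrite starts and mostly ends in this one file.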

Make sure to also read my CPaaS vendor lock-in article before making any decision here…

The post Twilio exits video APIs, further focusing on voice, SMS and Segment appeared first on BlogGeek.me.

]]>
https://bloggeek.me/twilio-programmable-video-sunset/feed/ 0
Zooming in on remote education and WebRTC https://bloggeek.me/remote-education-webrtc/ https://bloggeek.me/remote-education-webrtc/#respond Mon, 06 Nov 2023 10:30:00 +0000 https://bloggeek.me/?p=74041 An overview of remote education and WebRTC. The market niches, challenges and solutions.

The post Zooming in on remote education and WebRTC appeared first on BlogGeek.me.

]]>
An overview of remote education and WebRTC. The market niches, challenges and solutions.

Whenever a video meetings company starts looking at verticals for the purpose of targeted marketing, one of the verticals that is always there is education. We’ve seen this during the pandemic – as the world went into quarantine mode, schools started figuring out how to teach kids remotely.

The remote education market is not just schools doing remote video calls. It is a lot more varied. I’d like to explore that market in this article.

How big can remote education really get?

There are around 2 billion children in the world. Over 80% of them attend schools.

Some 235 million higher education students are out there as well around the globe.

During the pandemic, a lot of them were online, taking classes remotely. For multiple hours each day.

The slide above is from Kranky Geek 2020. In this session, Google talked about their work on WebRTC in Chrome.

Here they shared the increase in video minutes during the initial quarantines. The huge spike there starts at around the August/September timeframe, when schools start.

Remote education is here to stay. Not with its increased usage of 10-100x, but definitely bigger than in the past. There are many places where remote education can fit – and not only for emergencies such as the pandemic.

Me? Remote education?

Like everyone else, my kids went through the process of remote education during the pandemic. Here, the Ministry of Education went all-in with Zoom for schools (along with Google Classroom and Microsoft Office – go figure). Since then, our kids have had private tutors, on and off, who sometimes teach remotely. And now, with a war raging between Gaza and Israel, depending on where you live, you might be studying from home or physically in school.

I had my share of consulting with education organizations across the globe. Some focusing on schools, others with universities and some with private tutoring. It was always fascinating to see how such markets are distinctly different from each other, and how remote education also takes different shapes and sizes based on the country.

And then there are my own online courses, with their associated office hours and AMAs.

The role of WebRTC in remote education

WebRTC plays an important role in the education market. Besides offering video communications, it also enables the ability to mesh the communication experience directly into the LMS (Learning Management System) or the SIS (School Information System), offering a seamless and tailored experience for both the teacher and the learners – one that enables the educators to implement various pedagogies.

☝️ Remember here that WebRTC is a synchronous technology – live, real-time voice and video communications. A large chunk of the education market is leaning heavily on asynchronous learning (recorded videos, texts to read, etc). These are not covered in this article.

Here are some market niches and use cases where you will find WebRTC in remote education.

Group lessons

The simplest one to explain is probably group lessons. The classic one would be the pandemic use case, where during quarantine, schools went all virtual – classes were conducted online.

Remote group lessons aren’t limited to schools either – they are done in universities, private group tutoring, etc.

Main challenges here include:

🔶 Moderation tools for the teachers. Ones that are simple to use while conducting the lesson itself

🔶 Collaboration tools to make the lessons more engaging. Maintaining engagement in online group lessons is the biggest challenge at the moment, especially for younger learners

🔶 Authentication and authorization of users. Lots of anecdotal stories around this one throughout the pandemic

One thing that is raised time and again with group lessons, especially in schools, is the need (and inability) to get the students to keep their cameras on. This is a huge obstacle to effective learning, and something that needs to be taken into account.

Another important thing that needs to be fleshed out early on here is who the client is: the teacher or the students. Whoever the system is geared towards will set the tone for how the solution gets designed and implemented.

One-to-one tutoring

These are mainly one on one lessons conducted remotely.

Outside of the domain of classic education, a lot of classes are actually conducted in such a way. Here are a few anecdotal stories from recent years that I’ve learned about:

🔶 A dear friend who is learning to play the piano. Remotely. She travels a lot between the US and Israel, and takes her lessons from everywhere through her iPad

🔶 Another friend, taking 1:1 drawing lessons

🔶 Online chess lessons for kids in our community

🔶 My son’s friend, learning C++ on Unreal engine, taking 1:1 lessons

🔶 My son, a few years ago, when he was 10 or so, learning to build online games using nocode game engines from an 18 year-old who lived two cities away

🔶 My wife took online dance lessons to specialize in Salsa from a renowned instructor abroad

Beyond the collaboration and engagement aspects of such lessons, it is important to note that they aren’t suitable for everyone. Some teachers are more natural in these, and some students can learn effectively in such a manner while others struggle (I have both examples at home).

An interesting use case here that I’ve seen is math and English (!) tutors from India and China remotely teaching kids in the UK and the US. Why? Simply because they are cheaper than local teachers. Then there was the opposite – rich Chinese families getting one-to-one English tutoring for their kids from US teachers. Go figure.

One-to-one tutoring comes in a lot of different shapes and sizes.

MOOCs (Massive Open Online Courses)

MOOCs were all the rage 10 years ago. Their market is still consistently growing.

MOOCs are simply large online courses that are open for people around the globe. Some of them are collaborative, while others are mainly lecturer driven. Some allow for asynchronous learning while others are more synchronous in their nature. Both the asynchronous and synchronous learning modes in MOOCs offer self-paced learning (at least to some degree).

WebRTC finds its way into MOOCs for their synchronous part, when that requires live video sessions – either between lecturers and students or between student groups in the more collaborative courses.

Proctoring

Proctoring isn’t about learning, but about taking exams. Remote proctoring enables taking exams from the comfort of one’s home or office without going to the classroom.

With proctoring, the user is required to open up his camera and microphone as well as share his screen while taking the exam. The proctoring application takes care of checking that other tabs aren’t being opened and that nothing fishy is taking place (as much as possible). WebRTC is used to gather all that realtime audio and video data and record it. If needed, these recordings can be accessed by human proctors later on.

It should be noted that for proctoring, there are a lot of requirements around preventing cheating on the exam. This includes things like monitoring applications used during the exam, maintaining focus on the exam page, etc. To achieve this, most proctoring solutions end up as PC applications (usually using Electron) which the student needs to install on his machine in order to take the exam. The innards of the proctoring application will still end up using WebRTC in a web application – simply for its speed of development and the use of the WebRTC ecosystem.
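To make the capture part concrete, here is a minimal TypeScript sketch of what a proctoring client might request before an exam starts. It relies on the standard browser calls `getUserMedia()` and `getDisplayMedia()`; the names `startProctoringCapture`, `CaptureDevices` and `ProctoringStreams` are hypothetical, introduced only for illustration. Recording, tab monitoring and uploading are deliberately left out.

```typescript
// Structural type with only the two capture calls this sketch needs.
// In a browser, navigator.mediaDevices provides both methods.
interface CaptureDevices<S> {
  getUserMedia(constraints: { audio: boolean; video: boolean }): Promise<S>;
  getDisplayMedia(constraints: { video: boolean }): Promise<S>;
}

// What the (hypothetical) proctoring client holds on to during the exam.
interface ProctoringStreams<S> {
  camera: S; // webcam video plus microphone audio
  screen: S; // the screen, window or tab the student chose to share
}

// Ask for camera + microphone first, then for screen sharing.
// Either call rejects if the student denies permission, so the caller
// can refuse to start the exam in that case.
async function startProctoringCapture<S>(
  devices: CaptureDevices<S>,
): Promise<ProctoringStreams<S>> {
  const camera = await devices.getUserMedia({ audio: true, video: true });
  const screen = await devices.getDisplayMedia({ video: true });
  return { camera, screen };
}
```

In a page, this would be called as `startProctoringCapture(navigator.mediaDevices)` from a click handler, since browsers generally expect a user gesture before showing the screen picker; an application may prefer to trigger each request from its own button. The resulting streams can then be recorded or sent onwards.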

Coaching

While similar to classic education, coaching is slightly different. In its essence, these can be 1:1 sessions or small group sessions where issues and challenges in certain areas get fleshed out. In group lessons and 1:1 tutoring, a lot of the focus is on collaboration features. Here, in many cases, it will be more on the video of the participants and the need to bring them together.

Another interesting aspect of coaching is the platform it gets attached to – either directly or indirectly. Coaching often comes bundled as a larger course/training offering, mixed with in-person meetings, reading/presented materials and the coaching sessions themselves.

Coaching platforms usually lack an LMS or SIS as well. Instead, they tend to be geared towards flexible use and, at times, an integrated payment system.

Webinars

Webinars are a form of lesson conducted over the internet, mostly by businesses to assist in marketing and sales efforts. Whether WebRTC is needed, and how much of it, depends on how interactive the webinar is.

In the past, webinars were usually conducted via specialized downloadable applications, where the content was mostly slide decks and the voice of the speakers. The interaction with the audience was done via text messages and organized Q&A. Over time, these solutions became richer and more sophisticated, adding video communications as well as the ability of the audience to “join the podium” if and when needed.

Using WebRTC here enabled getting rid of the application download requirement and increased the level of interactivity quite considerably.

The intersection of education and healthcare

Education and healthcare are bound together. I’ve shown that a bit in my WebRTC in telehealth article, looking at it from the perspective of remote training on healthcare topics. I want to take a different angle on the same topic here. I’ll do that by showcasing two interesting use cases I’ve been privy to a few years back.

#1 – Dance lessons in cancer

I heard this one from a dancer who had cancer and healed. Women with cancer have it hard. Chemo is brutal – it saps their energy and causes hair loss. This means women don’t want to go outside that much. Here, being able to bring them remotely to a dance lesson can be a real benefit to them, especially if they love(d) dancing. They won’t go physically – they don’t want to meet people outside and deal with the stares that come with it, nor spend the energy it takes. But they will be willing to dance – maybe.

Remote dance lessons for this niche are beneficial. Not from an educational standpoint but more from a mental health one.

#2 – Video in class for students in hospitals

Another vendor I worked with briefly was assisting school kids who had to be treated in hospital or just stay home for prolonged periods of time (think weeks or months at a time). Their solution was to bring a video conferencing system and rig it in the physical classroom of the kid as well as where he is located, be it home or a hospital bed.

This way, the kid could join the classes as well as stay connected to other classmates during recesses. The main purpose here isn’t really the teaching part, but rather to make sure the student stays in contact with peers in his age group and not be secluded during that period of time.

Is this a use case in education? In healthcare? I can’t really say…

ERT (Emergency Remote Teaching)

The pandemic showed us that remote education is challenging but might be necessary. We were all quarantined for long periods of time, with schools across the globe going remote.

Here in Israel, when clashes with Gaza or Hezbollah in Lebanon flare, schools shift to remote learning. It isn’t frictionless or smooth, but it is the solution we have to try and continue educating kids here.

The most crucial aspect of ERT is that teachers are forced to change their teaching setting with no preparation. In Israel, at least, the pandemic didn’t prepare teachers for the current war – it feels like the education system in Israel learned nothing from the pandemic with regard to remote teaching ☹️

Top down decisions; sometimes

Education is interesting. Especially the institutional kind, in schools.

In some countries, decisions are made top down while in others, there’s more autonomy kept at the school level or the district level.

Here are a few things I learned asking the question on LinkedIn, about what tool was used during the pandemic for virtual classes across the globe:

  • Israel. Where I live. Was mostly Zoom during the pandemic
    • There was also a bit of Google Meet and some BigBlueButton, due to its integration with MASHOV (an SIS in Israel)
    • The government struck a deal in education for Google Classroom country-wide
    • There’s also Office available for free for all students
    • And Zoom was the decision for virtual classes
    • This year, it all changed to Google Meet, presumably due to security concerns, but more likely this was due to pricing (Zoom renewal cost money while you get Google Meet and Microsoft Teams for free with Google Classroom and Office respectively)
    • Zoom hurried up with a statement that it is secure and now available for free for the education system in Israel
    • As the saying goes – it’s all about the money
  • Bulgaria used Jitsi Meet (through the Shkolo platform); later replaced by Microsoft Teams. Both with government provided accounts
  • Colombia. Most public schools and the university system relied heavily on Microsoft Teams. Private schools and universities were about an even split between Zoom and Microsoft Teams
  • Austria was mainly Microsoft Teams
  • Russia – Zoom
  • The United States was mostly Zoom. It wasn’t mandated, but just how things ended up in most places
  • UK. A private school in London opted for Microsoft Teams. Public schools were left to figure out their own solution
  • Argentina. Zoom, though I am not sure if everywhere and if the decision was top down or bottom up
  • India. Primarily Microsoft Teams and occasionally Zoom. Mainly because Microsoft Teams had better and stronger channel partners in India, being able to offer better deals
  • France. Started with Zoom and Jitsi Meet in schools. Now, the government has built a large scale BigBlueButton infrastructure for virtual classrooms

This is by no means complete or accurate, but it shows a few important aspects of education:

🔶 In some countries, decisions on the tools to use are taken top down, while in others, each district or school is left to autonomously make a decision

🔶 Like in many industries, but probably more so, appearances matter. Losing Israel for Zoom was bad publicity. They had to fix that quickly by renewing the service for free. BTW – the damage is already done, my kids are now using Google Meet at school and there likely isn’t a way back

Live, online and in-person

Education is mixed. It isn’t all virtual and isn’t all in person.

My own WebRTC Courses are online, but not live. The lessons are pre-recorded. I offer monthly AMA meetings as part of them which are online and live.

I took a CPO course last year. It included in person meetings (3 full days), weekly live sessions as well as pre-recorded information.

My kids are now learning some days remote and some days in-person in the school.

Some countries had recorded/broadcasted lessons alongside virtual live classes during the pandemic, creating from them a full set of learning materials that students can use moving forward.

👉 The LMS (Learning Management System) used needs to take all these into account, enabling different learning strategies and different content types. Your own service needs to be able to figure out what works best.

Hybrid

The term Hybrid Learning refers to any form that incorporates online and offline learning. This is slightly different from how we define hybrid meetings.

  • As an example in Israel at the moment, in the current “war setup”, students go physically to school a few days a week and the rest they learn asynchronously or synchronously from home.
  • Another example of hybrid learning is when students work with laptops in the traditional classroom.

Allowing a student to remotely join a class taking place in person is a real challenge, but one that needs to be dealt with as well. This isn’t any different from hybrid meetings in enterprises in terms of the basic need. The difference is likely in size and complexity.

Most classes aren’t geared to this: from the placement of the cameras in the class, to the way the lessons are conducted, to the way teachers need to split their attention between in-person and remote students.

👉 In most places, going hybrid in education is an intentional decision that can be made only for select use cases and in a limited number and types of institutions.

Moderation

Who is allowed to join a virtual lesson? Should the teacher approve each student joining? How do you know who is online? Who is actively listening? Should anyone be automatically allowed to speak up? Share their screen? Is there a way to check if the student goes “off the reservation”, doing other things in other browser tabs or on his phone in parallel?

All these are hard questions with no good answers.

Moderation in education must take place – especially for group lessons. This has two purposes:

  1. Maintain a semblance of order
  2. Let the teacher focus on teaching

Oftentimes, moderation tools deal with a semblance of order but less with the focus of the teacher or teaching.

The decision in Israel, for example, to go for Google Meet makes total sense simply because authentication and identity are already managed by Google Classroom. Classroom is acting as the LMS as well, or at least the hub for students and teachers. Having a tighter integration means some of the moderation requirements can more easily be met.

👉 It isn’t only about what can be moderated, but how and with what level of friction
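One way to think about the questions above is as a small set of per-lesson switches that the service checks before letting anyone do anything. The sketch below is only that, a sketch: the policy fields, the roles and the `isAllowed` function are all hypothetical and not taken from any specific product. It assumes teachers can always do everything and students are gated by the lesson’s policy.

```typescript
type Role = "teacher" | "student";
type Action = "join" | "speak" | "shareScreen";

// Per-lesson switches a teacher (or school admin) could set up front.
interface LessonPolicy {
  teacherApprovesJoins: boolean;
  studentsMaySpeak: boolean;
  studentsMayShareScreen: boolean;
}

interface Participant {
  id: string;
  role: Role;
  approvedByTeacher: boolean;
}

// Decide whether a participant may perform an action in this lesson.
function isAllowed(policy: LessonPolicy, who: Participant, action: Action): boolean {
  if (who.role === "teacher") {
    return true; // assumption: the teacher is never restricted
  }
  const studentRules: Record<Action, boolean> = {
    join: !policy.teacherApprovesJoins || who.approvedByTeacher,
    speak: policy.studentsMaySpeak,
    shareScreen: policy.studentsMayShareScreen,
  };
  return studentRules[action];
}
```

Keeping the rules in one place like this means the teacher sets them once per lesson instead of policing each student by hand, which is where most of the friction comes from.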

Assessment

How are assessments taking place in online learning?

In the traditional classroom, teachers physically saw the students and could easily gauge their level of attentiveness. To that, home assignments and tests were added.

Once going online, technology can come to assist the teachers and students, adding a layer of information to the assessment process. Dashboards can be built to make this data accessible.

Where does WebRTC fit in here? The same way it does in online meetings, where we see today a growing focus on incorporating transcriptions, meeting summaries and action items automatically. Similar LLM/generative AI technologies can be used to glean insights out of online lessons.

In many ways, this isn’t done yet. Probably because we’re still struggling with engagement (see below).

Collaboration and whiteboarding

How is collaboration done in education? Do we need the classic blackboard/whiteboard for teaching? How does that get translated to the digital, remote scenario?

Are we looking here for something as powerful and flexible as a Miro board or something simpler and less feature rich?

Is teaching math or physics similar to teaching languages or literature when it comes to collaboration and whiteboard?

How about Kahoot or similar polling/quiz capabilities? Do we make them engaging or boring as hell?

A lot of thought and energy needs to be diverted towards these types of questions, in trying to figure out what works best to increase engagement and improve the learning experience (and by extension, the learning itself).

The challenge of engagement

How do you define engagement in online synchronous lessons?

Is students opening cameras considered engagement?

Maybe students can be engaged with their cameras turned off 🤔

Getting students to open up their cameras, having them choose to do so and keep the cameras on is a big issue in schools and in higher education.

In my son’s school, they are now shifting towards requiring students to open their cameras… but allowing them to point that camera at the ceiling 🥴

Once you have cameras on, how does a teacher gauge the level of engagement of a student? How does he spare the time looking at 20+ students (36 in Israel classes) to understand if they are engaged or not while trying to present his screen to teach something out of his slidedeck?

👉 “Feeling the crowd” to understand whether a topic needs further explanation or the teacher can move on to new topics is harder to achieve online than it is in person.

The challenge of engagement (part 2)

How do you get students engaged?

What type of collaboration solution do you need?

Which experiences should be baked into the solution?

My son decided to take up Russian. His friend speaks Russian with his parents, so he decided he wants to understand when they talk to each other (go figure). He decided independently to install Duolingo on his phone and has been taking their lessons for almost a year now 😲

He can now read Russian and knows quite a few words.

A good friend of mine is learning German using Duolingo. We did a roadtrip in the US in February. I had to hear him learn in our long hours on the road. It was an interesting experience to see it from the side, trying to figure out how this magic happens.

Engagement and “gamification” are a main part of how Duolingo works and how it gets students back into their app over and over again.

👉 We haven’t quite cracked the formula of how to do this well in live virtual classes. There must be a way to get there, and when we find it, we will see great dividends from it.

Asymmetry in remote education

There are teachers and there are students. Who is the system designed to cater to?

A simple question. Answering with “both” is likely going to be wrong most of the time.

I had a meeting at a large and prominent university in Europe a few years back. They wanted to build a video conferencing system for lectures. Have the professor in front of a large digital board showing tens of students joining remotely. Call it extremely expensive and unique. That was before the pandemic, so unrelated to it.

The question I had was who this system is for. Is it to sell students on a great remote experience or is it for the professor to feel important. I have my own answer here 🤔

You need to decide who the service you are developing is really there to cater to – the teacher and his needs, assuming that students will simply join because they have little choice; or the students, focusing on enticing them to join, collaborate and interact.

Doing both at the same time is a real challenge, and one that most vendors aren’t prepared to take on yet.

👉 Figure out who your main user is. The teacher or the students. Or maybe the parents?

Training the educators

Someone needs to teach the teachers how to use the service. This is a real problem, especially when going mainstream.

When the pandemic started and Zoom was selected here in Israel, a lot of videos surfaced explaining how to use Zoom in the context of teaching with it. Last month, when Google Meet was the official solution, you started seeing the same occur for Google Meet here in Israel.

The differences between these two services may seem minor, but they are big for teachers who aren’t technically savvy.

Some private tutors, for example, shy away from remote lessons. Their reason is the inability to focus on the student during the lesson. Multiply that by 20-40 students in a single lesson, many of them acting like prisoners trying to break out and figuring out ways to game the system called a virtual lesson, and you get to the need for teachers who know their way around the service inside and out.

👉 Onboarding and familiarizing teachers to the platform is just as important as the actual service, sometimes even more

A matter of costs

This one might just be an opinion of mine.

Remote education is a huge market. During the pandemic, it encompassed almost all the world’s students. And yet, the amount of money available to spend per minute is quite low.

In many cases, the deals are large (with a state or a country). Sometimes, they are smallish, with a single school. There’s money in these institutions, but in many cases, that money is spent elsewhere.

When going after the education market, it is vital to understand the buying habits and budget of the would-be purchaser beforehand.

👉 Solutions in the education market need to be cost effective and efficient from a WebRTC infrastructure point of view

🔸🔹🔸🔹🔸

Where can I help, if at all?

🎯 Online WebRTC courses, to skill up engineers on this technology

🎯 Consulting, mostly around architecture decisions and technology stack selection

🎯 Testing and monitoring WebRTC systems, via my role as Senior Director at Cyara (and the co-founder of testRTC)

The post Zooming in on remote education and WebRTC appeared first on BlogGeek.me.

]]>
https://bloggeek.me/remote-education-webrtc/feed/ 0
What is WebRTC and What is it Good For? https://bloggeek.me/what-is-webrtc/ https://bloggeek.me/what-is-webrtc/#comments Wed, 01 Nov 2023 08:42:00 +0000 https://bloggeek.me/?p=11293 What is WebRTC and What is it Good For? This 7-minute video provides a quick introduction to WebRTC and demonstrates why it is growing in importance and popularity.

The post What is WebRTC and What is it Good For? appeared first on BlogGeek.me.

]]>
Use cases of WebRTC, how it works and benefits of the technology explained in a nutshell.

What is WebRTC?

WebRTC is an HTML5 specification that you can use to add real time media communications directly between browsers and devices.

Simply put:

WebRTC enables voice and video communication to work inside web pages.

And you can do that without requiring any plugins to be installed in the browser.

It was announced in 2011 and since then it has steadily grown in popularity and adoption.

By 2016, an estimated 2 billion installed browsers were enabled to work with WebRTC. From a traffic perspective, an estimated billion-plus minutes and 500 terabytes of data were transmitted every week from browser communications alone.

WebRTC has increased in popularity and use throughout the COVID-19 pandemic. Quarantines and work from home made remote communications a necessity, acclimating billions of users to video calling. The end result has been a surge in the use of WebRTC:

The growth in use of WebRTC during the COVID-19 pandemic

In 2021 WebRTC got officially standardized, removing all doubts about its future prospects. Today, WebRTC is widely popular for video calling but it is capable of so much more.

A few things worth mentioning:

  • WebRTC is completely free
  • It comes as an open source project that has been embedded in browsers, but you can take it and adapt it for your own needs
  • This in turn has created a vibrant and dynamic ecosystem around WebRTC of a variety of open source projects and frameworks as well as commercial offerings from companies that help you to build your products
  • WebRTC is constantly evolving and improving, so you need to keep an eye on it (e.g. see hiring WebRTC developers)
  • See also: The state of WebRTC open source projects

Covered in this video:

  • What is WebRTC?
  • Current state of adoption
  • Why is it so much more than just a video chat enabler
  • The power of “Open Source”
  • How WebRTC works
  • Five reasons to choose it

(this article was updated in December 2023)

WebRTC’s meaning

WebRTC stands for Web Real-Time Communications.

Web is simple – it means that what we are doing works “over the web” and inside a browser. The browser part means that all modern browsers support WebRTC. If you run this inside a native application, I still consider it WebRTC. To me, it is the thought that counts; or more accurately, the implementation of WebRTC (or parts of it) is quite popular as a starting point in native applications. This is due to the quality of the WebRTC media engine (as implemented by Google) AND due to the fact that it makes it easier to communicate across native applications and web applications this way.

RTC, or Real-Time Communications means that whatever WebRTC does – it does in real time. Its focus is on sending the data it has as fast as possible – making sure to use low latency techniques to get things done. Whenever possible.

If we’re moving away a bit from the word-meaning of WebRTC, then this is the definition I usually use to define WebRTC:

Let’s break it down a bit:

  • WebRTC offers real time communication natively from a web browser
    • WebRTC is part of the web browser. Every modern web browser today implements WebRTC
    • It offers the ability to create real time communication applications and experiences
  • WebRTC is a media engine with JavaScript APIs
    • WebRTC is a media engine. There were other media engines before WebRTC and there will probably be others after it. In that sense, there’s no “innovation” here
    • That said, it is standardized by an API layer defined in JavaScript. This contributes to the ecosystem that has been created around WebRTC

So, how does WebRTC work?

Code and API

It is important to understand where we are coming from: If you wanted to build anything that allowed for voice or video calling a few years ago, you most probably used C/C++ for that. This meant long development cycles and higher development costs.

WebRTC changes all that: it takes the need for C/C++ and replaces it with a JavaScript API.

It comes with a JavaScript API layer on top that you can use inside the browser. This makes it far easier to develop and integrate real time communications anywhere. Internally, WebRTC is still mostly implemented using C/C++, but most developers that use WebRTC won’t need to dig deep into these layers in order to develop their applications.
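To give a feel for that API layer, here is a rough TypeScript sketch of the calling side of starting a call. The three calls it makes (`addTrack`, `createOffer` and `setLocalDescription`) are part of the standard `RTCPeerConnection` API. Everything else is hypothetical and there only for illustration: the `startCall` function, the `TrackSource` and `OfferingPeer` types, and the `sendToRemote` callback, which stands in for whatever signaling channel your application uses (WebRTC itself doesn’t define one).

```typescript
// Structural types with only the members this sketch touches.
// In a browser, a MediaStream and an RTCPeerConnection provide them.
interface TrackSource<T> {
  getTracks(): T[];
}
interface OfferingPeer<T, D> {
  addTrack(track: T, stream: TrackSource<T>): unknown;
  createOffer(): Promise<D>;
  setLocalDescription(description: D): Promise<void>;
}

// Add the local media to the connection, create an offer describing it,
// and hand that offer to the application's own signaling channel.
async function startCall<T, D>(
  pc: OfferingPeer<T, D>,
  localStream: TrackSource<T>,
  sendToRemote: (offer: D) => void,
): Promise<D> {
  for (const track of localStream.getTracks()) {
    pc.addTrack(track, localStream);
  }
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToRemote(offer);
  return offer;
}
```

The remote side answers in a similar fashion, and from there the browser’s media engine takes over. None of this requires touching the C/C++ layers underneath.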

Availability

WebRTC today is available in all modern browsers. Google Chrome, Mozilla Firefox, Apple Safari and Microsoft Edge support it.

You can also “take” it and integrate it into an application or an embedded device without the need for a browser at all.

Browsers and operating system support for WebRTC

Media and access

What WebRTC does is allow access to devices. You can access the microphone of your device, the camera that you have on your phone or laptop – or it can be the screen itself. You can capture the display of the user and then have that screen shared or recorded remotely.

Whatever it does is in real time, enabling live interactions.

WebRTC isn’t limited to voice and video. It allows sending any type of arbitrary data.
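As a small illustration of the “arbitrary data” part, the TypeScript sketch below splits a binary payload into chunks and pushes them through a data channel. `RTCDataChannel.send()` accepting an `ArrayBuffer` is standard; the 16 KiB chunk size is an assumption (a conservative value often suggested for cross-browser interoperability), and `splitIntoChunks`, `sendInChunks` and the `SendChannel` type are hypothetical names used only here.

```typescript
// Assumed chunk size: 16 KiB, a conservative choice rather than a hard limit.
const CHUNK_SIZE = 16 * 1024;

// Only the one method this sketch needs; an open RTCDataChannel provides it.
interface SendChannel {
  send(data: ArrayBuffer): void;
}

// Cut a buffer into consecutive pieces of at most chunkSize bytes.
function splitIntoChunks(data: ArrayBuffer, chunkSize: number = CHUNK_SIZE): ArrayBuffer[] {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  const chunks: ArrayBuffer[] = [];
  for (let offset = 0; offset < data.byteLength; offset += chunkSize) {
    chunks.push(data.slice(offset, offset + chunkSize));
  }
  return chunks;
}

// Send every chunk in order and report how many were sent.
function sendInChunks(channel: SendChannel, data: ArrayBuffer, chunkSize: number = CHUNK_SIZE): number {
  const chunks = splitIntoChunks(data, chunkSize);
  for (const chunk of chunks) {
    channel.send(chunk);
  }
  return chunks.length;
}
```

A real file transfer would also watch the channel’s `bufferedAmount` to avoid queuing too much at once, and would tell the receiver how many chunks to expect. Both are left out here to keep the sketch short.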

There are several reasons WebRTC is a great choice for real time communications

  1. First of all, WebRTC is an open source project
    • It is completely free for commercial or private use, so why not use it?
    • Since it is constantly evolving and improving, you are banking on a technology that will serve you for years to come
    • WebRTC is a pretty solid choice – It already created a vibrant ecosystem around it of different vendors and companies that can assist you with your application
  2. It is available in all modern browsers
    • This has enabled and empowered the creation of new use cases and business models
    • From taking a Guitar or a Yoga lesson – to cloud gaming and social networking – to medical clowns or group therapy – to hosting large scale professional Webinars and live broadcasts; WebRTC is capable of serving all of them and more
  3. WebRTC is not limited to only browsers because it is also available for mobile applications
    • The source code is portable and has been used already in a lot of mobile apps
    • SDKs are available for both mobile and embedded environments so you can use WebRTC to run anywhere
  4. WebRTC is not only for voice or video calling
    • It is quite powerful and versatile
    • You can use it to build a group calling service, add recording to it or use it only for data delivery
    • It is up to you to decide what to do with WebRTC
  5. WebRTC takes the notion of a communication service and downgrades it into a feature inside a different type of service. So now you can take it and simply add communication to the business processes you need within your application or business

What is WebRTC used for?

You can group WebRTC applications into 4 broad categories:

  1. Conversational voice and video – the obvious one. Applications that need the ability to have a person communicate with others in real time and in a conversational manner. These will more often than not end up using WebRTC
  2. Live streaming – while WebRTC isn’t the most popular choice for streaming, it is one of the best technologies available for low-latency live streaming. If you need to stream something to one or more users and maintain really low latency to enhance the interactivity (things like cloud gaming, gambling, auctions, webinars, etc) – then WebRTC might be a great choice
  3. Data transfer – you can send voice and video with WebRTC, but you can also send arbitrary data. This can be used to share huge files between machines with little need for server space for example. Or it can be used to create a bittorrent like experience
  4. Privacy – since WebRTC runs directly between browsers, it is sometimes used to enhance privacy. It does this by simply not sending the media or data via servers at all

Overview of Use-Cases with WebRTC

The use cases where WebRTC comes in handy seem endless. Every so often, I hear of a new way that WebRTC is being used to solve yet another problem.

Here are some of the main use cases you’ll find for WebRTC out there:

  • Unified communications – voice and video calling, 1:1 or group sessions
  • Contact center communications – client/agent, visual assistance, remote assistance, etc
  • Watch parties – watch television or a sports event together
  • eCommerce and retail – from one-to-one high touch sales to live broadcast for sales events and promotions
  • Telehealth, online education, legal proceedings, remote travel, fitness, dancing, tutoring, coaching, … – conduct remotely and virtually verticalized sessions you would have done in-person in the past
  • Teleoperations – drive cars, forklifts, trucks, drones, boats, submarines, … – remotely
  • Virtual and hybrid events – conduct webinars, large meetings and events online
  • Low latency broadcasting – broadcast a sports game, auction or interactive sessions to a large audience at sub second latency
  • Cloud gaming – render the visuals of a game in the cloud and send it in realtime to the player
  • Machine remoting – operate a remote machine (high performance machines or highly secured/configured machines) as if it was a local one
  • Virtual spaces and the metaverse – meet people in a synthetically rendered virtual environment in 2D or 3D

So what other choice do you really have besides using it?

The ideas around WebRTC and what you can use it for are limitless. So go on – start building whatever you need!

People also ask

There are common questions people ask about WebRTC quite often. Here are my answers to them.

Is WebRTC free or paid?

WebRTC is free, but sometimes paid.

Let me explain…

WebRTC is an open protocol and has a free open source implementation. This free implementation is embedded in all modern browsers, making it free to use as a developer and a user.

The thing is, if you want to build an application with it, you will need to pay for *something* at *some point*. Meaningful applications in WebRTC require server infrastructure. This infrastructure costs money to put up, both in compute resources and in bandwidth.

You can decide to build it all from scratch on your own, or you can use third party CPaaS (communication platform as a service) vendors as a shortcut to your application development. Using a third party vendor means paying that vendor. Building from scratch means investing time and resources to develop and then to maintain the service (remember the infrastructure costs?).

So yes. WebRTC is free. But it costs money. I hope it makes more sense now 😉

Is WebRTC safe to use?

Yes it is. Or at least it should be.

WebRTC is safe. It has a solid security architecture. I’ve written a longform article about this if you want to dive deeper: Everything you need to know about WebRTC security 🔒

The TL;DR version is this one –

WebRTC is a modern, secure communication protocol and implementation. It was designed that way from the get go, at a time when browsers started shifting to an HTTPS-first/only web. As such, it doesn’t allow sending media in the clear, for example, and always encrypts the data.

Remember though that applications written using WebRTC need to take care of security and safety themselves – an application is only as secure as its weakest link, and that link isn’t going to be their WebRTC implementation.

Does WebRTC require a browser?

Nope.

WebRTC is embedded in all modern browsers today. Web developers can use the WebRTC JavaScript APIs to build their applications for browser users.

Outside the browser, application developers can just take the free open source implementation of WebRTC (maintained by Google and used by all modern browsers), and compile it into their applications. Many communication applications do just that, which means that at the end of the day, WebRTC can be used everywhere and not only inside browsers.

https://bloggeek.me/what-is-webrtc/feed/ 21
WebRTC in telehealth: More than just HIPAA compliance https://bloggeek.me/webrtc-telehealth/ https://bloggeek.me/webrtc-telehealth/#comments Mon, 23 Oct 2023 10:00:00 +0000 https://bloggeek.me/?p=74011 When it comes to WebRTC in telehealth, there are quite a few use cases and a lot of things to consider besides HIPAA compliance.

The post WebRTC in telehealth: More than just HIPAA compliance appeared first on BlogGeek.me.

]]>
When it comes to WebRTC in telehealth, there are quite a few use cases and a lot of things to consider besides HIPAA compliance.

A thing that comes up in each and every discussion related to telehealth & WebRTC is the value of the call in telehealth. We’ve seen video meetings and calls go down to zero in their cost/value for the user. Especially during the pandemic. So whenever we find a nice market where there is high value for a call, it is heartening. Healthcare is such a place where we can easily explain why calls are important.

But what exactly does WebRTC in telehealth mean? It isn’t just a patient calling a doctor. There is a lot more to it than that. Let’s dive in together to see what we can find.

My own experience with Telehealth

As a user

Me and my son, waiting in a hospital while he had some blood samples taken during COVID

Like many others, my first real encounter with telehealth took place during the COVID quarantines.

My son was sick with high fever for over a week, and the doctors didn’t help any.

My wife was worried, needing more comfort by knowing someone was looking at him. Really looking at him.

So we used a kind of a private service that a hospital near our vicinity was giving:

  • You subscribe and pay a hefty price
  • They send over a kit
  • You install an app and take measurements multiple times a day (useless ones, but stay with me)
  • They send over a radiologist to do an X-ray scan (need something to show they can)
  • Then you get to talk to a doctor once a day. Over a video call. From the same app

What can I say? It worked as advertised.

As a consultant and a product manager

We have quite a few healthcare clients using our various WebRTC services at testRTC.

Other than that:

  • Took part in an RFP of the Ministry of Health in Israel, assisting the vendor who approached me to win the contract
  • I assisted vendors during the pandemic to troubleshoot their architecture and scale their service rapidly

Add to that conversations with vendors, along with a review of this article by a few people who work on telehealth products, whose comments I integrated as well.

Does that make me an expert in telehealth? No.

But I can fill in the WebRTC angle of telehealth, which is a rather big one.

Finding WebRTC in Telehealth

Telehealth for me is about the digital transformation of healthcare services.

It can start small, with things such as scheduling and viewing lab test results. And then it can grow towards virtualizing the actual patient-doctor interaction. Or any other interaction within the healthcare space between one or more people (emphasis on one here – not two).

I’ve listed here the main use cases that came to mind thinking of it in recent days.

Patients and doctors

The most obvious use case is the patient and doctor scenario.

In this, the doctor visitation itself is remote and virtual.

This can be useful in many situations:

  • When the patient can’t get to the doctor’s office
  • During the pandemic:
    • When healthcare providers didn’t want patients physically in the office
    • When doctors were sick or quarantined and their numbers were dwindling, while they could still be useful as doctors remotely
  • If you don’t want to waste a patient’s time in coming over and waiting
  • When it is truly urgent (an emergency)

For many of these situations, this is the setup that takes place:

  1. Doctor – sitting in front of a PC or laptop. In a designated office or hospital (=managed network), or at home (=unmanaged network)
  2. Patient – connecting from a smartphone or tablet, via a direct link or an installed application

More on that – later.

In general – here’s where you’ll see such solution types deployed:

🔶 Hospitals and large healthcare organizations

🔶 Clinics hosting multiple doctors

🔶 Private clinic of a single doctor

🔶 Insurance companies

Also remember that the word doctor is a broad definition of the caretakers involved. These can be nurses, doctors, dietitians and other practitioners offering the treatment/session to the patient remotely.

The other thing to remember is that this is also asymmetric in scarcity: there are a lot more patients than there are caregivers.

Group therapy and counseling

Then there’s group therapy.

One where one or more psychologists lead a larger group of patients. The same also applies to dietitians, speech therapists and other practitioners, and to groups such as smokers or cancer patients.

Here again, the idea and intent is that the patients and the therapists can join remotely to a virtual meeting and conduct that meeting.

The main benefit? Not needing to drive and travel for the meeting and being able to conduct it from anywhere.

Notable here is the fact that this can be enhanced or taken to a slightly different perspective – this can encompass the allied health domain, where AA (Alcoholics Anonymous) groups for example fit in.

Nurse stations

The nurse station is slightly different from the doctor-patient in my mind.

Here, the patient is situated physically next to the nurse, so the call/meeting isn’t virtual or remote but rather in person. The “twist” is that there is another caregiver or external authority that can be joined remotely to the session if and when needed. Say a doctor with a specialization that might not be available where the patient is located – this can be viewed in a way to democratize the access to specialty care.

Envision a nurse moving inside a hospital ward. She has a mobile station moving around with her that can be used to conduct video meetings with doctors. It can also be used for other purposes such as adding a live translator into that interaction with the patient or the patient’s custodian.

The lack of specialized provider access in remote areas can be extremely critical, and here again, virtual meetings can assist. Taking this further, a nurse station of sorts can be placed inside an ambulance providing immediate care – even for cases of strokes or cardiac arrests.

Outpatients

Outpatient clinics belong to hospitals. They are designed for people who do not require a hospital bed or an overnight stay. Sometimes, this can be for minor surgeries. Mostly it is for diagnostics, treatments or follow-ups to hospital admissions.

These clinics are part of the overall treatment that patients get from the hospital or for things that are hard to obtain elsewhere due to scarcity of machinery and/or experience.

Some of the diagnostics done in an outpatient clinic can be done remotely. This reduces wait times and travel times for patients and also allows using doctors joining remotely and not physically inside the clinic.

While similar to the patients and doctors use case, there are differences. The main ones are the organization behind it, the logistics and the network. Hospital networks are usually a lot more complex and more restrictive when it comes to connectivity of WebRTC traffic, bringing with them a different set of headaches.

Taking care of the elderly

As the human population is aging in general and people live longer, we’re also getting to a point where elderly care is different from other areas of healthcare. Another aspect of it is the breakdown of the family unit into smaller pieces where elderly people move to assisted living, nursing homes and hospices.

Here, the telehealth solutions seen include also things like:

  • The ability to easily communicate with family members and friends remotely to keep connected
  • Remotely monitor and take care of the old via solutions that remind us of a surveillance use case
  • Providing access to doctors remotely, especially for the less common health issues

Remote patient monitoring is another new field. Due to the scarcity of nurses, many hospitals are moving towards virtual patient monitoring for patients who are in hospitals or medical facilities that require 24×7 monitoring for critical patients.

Operating rooms

The operating room is at the heart of hospital care. It is where surgeons, anesthetists, nurses and other practitioners work together on a patient in an aseptic environment.

An obvious requirement here might be to have an expert join remotely to observe, instruct or consult during surgery. That expert can be someone who isn’t in the vicinity of the hospital, which helps bridge the gap of knowledge and expertise between central hospitals in large cities and rural ones.

It can also be used to have an expert who is situated in the hospital join in – entering an operating room requires the caregiver to scrub before entering. This process takes several minutes. By having the expert join remotely from another room at the hospital, we can have him jump from one surgery to another faster. Think of the supervisor of multiple surgery rooms at a hospital or a specialist. Saving scrubbing times can increase efficiency.

Then there is the option of getting external observers into the surgery rooms without having them in the surgery room itself. They can be silent or vocal participants. Joining in as trainees for example, as part of their learning process to become surgeons.

As we advance in this area, we see AR and VR technologies enter the space, either to assist the doctor locally in the surgery or have the external experts join remotely.

Training

Learning in operating rooms is just part of training in the healthcare domain.

Training can take different shapes and sizes here, and in a way, it is also part of the education market.

Here are some of the examples I’ve seen:

  • Remote training/education for various healthcare roles
  • First aid training for civilians
  • Medical equipment training

Machinery remoting

Healthcare is a domain that has lots and lots and lots of devices and machinery. From simple thermometers to CT scanners and surgical robots.

What we are seeing in many areas is the remoting of these devices and machines. Having the patient being diagnosed or treated use a device (or have a device used on him), while having the technician, specialist, nurse or doctor operate or access the data of the device remotely.

This has many different reasons – from letting patients stay at home, to getting specialists from remote areas, to increasing the efficiency of the caregivers (reducing their travel time between visitations).

Here are a few examples:

🔶 Stethoscopes, Thermometers, Ophthalmoscopes, Otoscopes, etc. These devices can be made smart – having the patient use them on his own and have their measurements sent to remote nurses or doctors

🔶 X-ray, CT, MRI – different types of scans that can be done in one place and have the operator or the person deciphering the results located elsewhere

🔶 Surgical robots, that can be observed or operated remotely

🔶 Robots roaming hospitals, taking care of menial tasks such as sanitizing equipment and rooms

There is an ongoing increase in adding smarts into devices and the healthcare space is part of that trend. When caregivers need to interact with these devices or access their measurements in real time, this can be done using WebRTC technology.

Simultaneous translation and/or scribes

Doctors are a scarce resource. As such, a critical part is having their time better utilized.

There are two telehealth solutions that aim to get that done in a similar fashion but with a totally different focus:

Translation – patients speaking a different language than that of a caregiver need a better way to communicate. Hospitals and clinics cannot always have a translator on hand. In such cases, having a translator join remotely can be a good solution.

The purpose? Increase accessibility of doctors to patients who don’t speak the doctor’s language.

Scribes – doctors need to keep everything documented. The patient digital record (PDR) is an important part of treatment over time. The writing part takes time and is done in parallel to diagnosing the patient. It is quite common today to have a doctor sit in front of you, typing away on his PC without even looking at the patient (being on the receiving end of that treatment more than once, it does sometimes feel somewhat surreal). Remote scribes can alleviate that by taking part in the doctor visitation, taking care of filling in the PDR. A different approach making headway here is AI-based transcription and the automatic creation of the medical record entries – this alleviates the need for a human scribe.

The purpose? Increase efficiencies and enable doctors to treat more patients.

At the boundary between education and healthcare

Then there is the education part adjacent to healthcare. Think of children who are treated for long periods of time where they either need to stay in the hospital or at home for treatment and rest. How do you make sure they don’t lose too much of the curriculum during that time? That they stay connected with their friends in class?

There are solutions for that, in the form of providing a PC at school and a tablet or laptop to the kid to remotely join such sessions.

This is probably more suitable for the education market, but I just wanted to add it here for completeness.

A game of numbers

Telehealth is a relatively small WebRTC market.

If you take all physicians in the world, and try to figure out how many there are per the size of the population, you will get averages of 1:500 at most (see Wikipedia as a source for example).

Not all physicians practice telehealth. Of those who do, many do it seldom. The size of the number here isn’t big when it comes to minutes or visitations conducted.

Compared to the number of minutes conducted every day on Facebook Messenger, the total telehealth minutes worldwide will be minuscule.

The difference here though, is the importance and willingness to pay for each such minute.

When trying to do market sizing or value – be sure to remember this –

👉 Total number of doctors, minutes and visits isn’t that large worldwide

👉 Telehealth minutes are more valuable than social media minutes

WebRTC telehealth and HIPAA compliance

Whenever telehealth is discussed, HIPAA compliance is thrown out in the air. At its heart, HIPAA compliance is about security and privacy of patients and their information, all wrapped up in a nice certification package:

  • Vendors wanting to sell telehealth services to hospitals need to be HIPAA compliant – at least in the US
  • In the EU, there’s GDPR, with different interpretation per EU country
  • Then there are other countries outside of the US and the EU with their own regulations
  • All in all, the requirements here are quite similar

Most countries have separate regulations for patient privacy which are generally stricter than those for personal privacy. While there’s more to it than what I’ll share here, it usually boils down to encryption and all the management that goes around it.

WebRTC is encrypted, so all that is left is for the application to not ruin it… which isn’t always simple.

Sometimes, you will find vendors touting E2EE (End-to-End encryption), which in most WebRTC jargon means the use of media servers that can’t access the media. Oftentimes, these vendors actually mean the use of P2P (Peer-to-Peer), where no media server is used at all.

Oh, and if you are using a third party video conferencing solution (say… a CPaaS vendor), then you will need to obtain a BAA (Business Associate Agreement) from that vendor, indicating that it complies with HIPAA. You will then need to certify your own application on top of it.

Network and firewall restrictions

Hospitals and clinics usually end up with very restrictive internet networks. This stems from the need to maintain patient confidentiality and privacy. The increase in ransomware attacks on businesses and healthcare organizations is a source of worry as well.

To such a climate, adding WebRTC telehealth solutions requires opening more IP addresses and ports on the organizations’ firewalls.

A big challenge for vendors is to get their WebRTC applications to work in certain healthcare organizations. Usually because their services get blocked or throttled by deep packet inspection.

👉 Vendors who can make this process smoother and simpler for customers will win the day.
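One common way to smooth this out is to offer TURN relays over more than one transport, so a session can fall back from UDP to TCP to TLS on port 443. Here is a minimal sketch of that idea. The host `turn.example.com`, the credentials and the helper name `buildIceServers` are all placeholders I made up for illustration, and this will not get you past every deep packet inspection setup.

```typescript
// Shape of the entries a browser accepts in RTCConfiguration.iceServers.
interface IceServerEntry {
  urls: string[];
  username?: string;
  credential?: string;
}

// Hypothetical helper: lists the same TURN server over UDP, TCP and TLS,
// so ICE can fall back to whatever the hospital firewall lets through.
function buildIceServers(
  host: string,
  username: string,
  credential: string
): IceServerEntry[] {
  return [
    {
      urls: [
        `turn:${host}:3478?transport=udp`, // preferred when UDP is open
        `turn:${host}:443?transport=tcp`, // TCP on a port that is usually allowed
        `turns:${host}:443?transport=tcp`, // TURN over TLS on 443
      ],
      username,
      credential,
    },
  ];
}

// Placeholder values, for illustration only.
const iceServers = buildIceServers("turn.example.com", "demo-user", "demo-secret");
// In a browser you would then do: new RTCPeerConnection({ iceServers });
```

The fewer distinct IP addresses and ports such a list needs, the shorter the conversation with the organization’s IT team tends to be.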

Quality of media

Not being able to see video well in a social interaction is acceptable.

Having a doctor not being able to see the mole on your skin is a totally different thing.

Quality of media can be critical in certain use cases of telehealth. Here, it might be a matter of resolution and sharpness of the image, but it can also be related to the latency of the session. Remote procedures conducted via WebRTC for telehealth might be a bit more sensitive to latency than your common meeting scenario.

Depending upon the use case, you have to prioritize resolution vs frame rate. A still patient needs higher resolution, while surgery or any motion-specific activity requires a higher frame rate. The ability to switch between these two priorities is also a consideration.
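In browsers, two knobs relate to this trade-off: the track’s `contentHint` and the sender’s `degradationPreference`. Below is a minimal sketch of switching between the two priorities. The names `VideoPriority`, `tuningFor` and `applyPriority` are mine, the `TrackLike` and `SenderLike` interfaces are cut-down stand-ins for `MediaStreamTrack` and `RTCRtpSender`, and browser support for `degradationPreference` varies, so treat this as a starting point and not a guarantee.

```typescript
// "detail" favors sharpness (a still patient), "motion" favors smoothness (a procedure).
type VideoPriority = "detail" | "motion";

interface VideoTuning {
  contentHint: "detail" | "motion";
  degradationPreference: "maintain-resolution" | "maintain-framerate";
}

// Maps the priority to the values the browser APIs understand.
function tuningFor(priority: VideoPriority): VideoTuning {
  return priority === "detail"
    ? { contentHint: "detail", degradationPreference: "maintain-resolution" }
    : { contentHint: "motion", degradationPreference: "maintain-framerate" };
}

// Cut-down stand-ins for MediaStreamTrack and RTCRtpSender (assumption: only
// the members used below), so the sketch also type-checks outside a browser.
interface TrackLike {
  contentHint: string;
}
interface SenderLike {
  getParameters(): { degradationPreference?: string };
  setParameters(params: { degradationPreference?: string }): Promise<unknown>;
}

// Applies the chosen priority to a video track that is already being sent.
async function applyPriority(
  track: TrackLike,
  sender: SenderLike,
  priority: VideoPriority
): Promise<void> {
  const tuning = tuningFor(priority);
  track.contentHint = tuning.contentHint;
  const params = sender.getParameters();
  params.degradationPreference = tuning.degradationPreference;
  await sender.setParameters(params);
}
```

Calling `applyPriority(track, sender, "motion")` when a procedure starts and `"detail"` when the doctor asks to look at a mole is one way to expose that switch in the UI.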

At times, 4K requirements or specific color spaces and audio restrictions may be needed. Especially when dealing with analysis of sensor data from medical devices. These may require a bit more work to integrate properly with WebRTC.

Asymmetric nature of users and devices

One tidbit about telehealth is that sessions are almost always asymmetric in nature and for the majority, they are going to end up as a 2-way conversation.

By asymmetric I mean that the users have different devices:

  • Doctors and caregivers will almost always be on devices that are known in advance – their location, their makeup, etc.
    • More likely, they will be accessing them from a laptop or a PC
    • They use the same application again and again. This means that they will learn to work around issues they bump into
    • Often on a restricted device with older browser versions and/or low CPU power. Though not always and not everywhere
    • Sometimes, though less and less these days, old equipment used by doctors in their office means the introduction of interop requirements
  • Patients will almost always join from a mobile device – a tablet or a smartphone
    • Many will do so via a URL they receive over SMS, joining from a mobile browser
    • Browser use on mobile isn’t as stable, especially on iOS Safari. Device handling is trickier with the need to handle phone calls and assistants (Siri) interacting with the same microphone
    • Others will end up on a native application built for this specific purpose
    • Being unassuming consumers, they try to join from everywhere. Including elevators or moving cars
    • They are also not going to use the application much and won’t want to waste time mucking around figuring out things or troubleshooting them. This means telehealth apps need to relentlessly focus on UX and usability for the patient side

👉 This asymmetric nature affects how telehealth applications need to be designed and built, taking special care around permissions, privacy and the unique user experience of the various users.
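On the patient side, a lot of that special care is about what happens when the camera or microphone can’t be opened. Here is a minimal sketch that turns the error names `getUserMedia()` rejects with into plain-language guidance. The function name `explainMediaError` and the wording of the messages are mine, not taken from any specific product.

```typescript
// Translates a getUserMedia() rejection name into guidance a patient can act on.
function explainMediaError(errorName: string): string {
  switch (errorName) {
    case "NotAllowedError":
      // The user or the browser/OS settings denied the permission request.
      return "Camera and microphone access was blocked. Please allow access and try again.";
    case "NotFoundError":
      // No device satisfies the requested constraints.
      return "No camera or microphone was found on this device.";
    case "NotReadableError":
      // The device exists but cannot be opened, for example during a phone call.
      return "Your camera or microphone is in use by another app, such as a phone call.";
    default:
      return "We could not start your camera or microphone. Please try again.";
  }
}

// Browser usage (not executed here):
// navigator.mediaDevices.getUserMedia({ audio: true, video: true })
//   .catch((err: DOMException) => showMessage(explainMediaError(err.name)));
// showMessage is a hypothetical UI helper.
```

Keeping this mapping in one place makes it easier to tune the wording for patients who will see it once and never want to troubleshoot.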

Medical devices, sensors and telemetry

Modern healthcare has the most variety of devices and sensors out there from all industries (leaving out the defense industry). These devices are now being digitized and modernized. Part of this modernization is adding communication channels for them, and even more recently – being able to virtualize and remote their use – either partially or fully.

Medical devices sometimes generate images. Other times an audio stream. Or a video feed. Or other sensory data and information. WebRTC enables sending such data in real time, or the telehealth application can send this data out of band, via Websockets or HTTP messages.
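Since both a WebRTC data channel and a WebSocket expose a `send()` method that accepts a string, the application can keep the message format identical and decide on the transport separately. A minimal sketch, assuming a made-up `Measurement` shape and a made-up `"measurement"` message type (real devices will dictate their own formats):

```typescript
// Hypothetical reading coming from a connected device.
interface Measurement {
  deviceType: string; // e.g. "thermometer" (illustrative value)
  value: number;
  unit: string;
  takenAt: number; // milliseconds since the Unix epoch
}

// RTCDataChannel and WebSocket both satisfy this structurally.
interface TextSender {
  send(data: string): void;
}

// Serializes a measurement into a JSON message with a type tag.
function encodeMeasurement(m: Measurement): string {
  return JSON.stringify({ type: "measurement", ...m });
}

// Sends the measurement over whichever channel the application chose.
function sendMeasurement(channel: TextSender, m: Measurement): void {
  channel.send(encodeMeasurement(m));
}
```

With this in place, `sendMeasurement(dataChannel, reading)` and `sendMeasurement(webSocket, reading)` are interchangeable, which leaves the in-band vs out-of-band decision to latency and architecture considerations.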

It can be as simple as taking a measurement of a patient remotely, while he is holding the medical device and the nurse or doctor observes him and the results sent over inside the application.

That can progress to passively overseeing a procedure and commenting on it in a video session. Think of a doctor or a nurse consulting remotely with a specialist while giving a treatment or performing a surgical procedure.

And it can go to the extreme of remotely giving the procedure. A radiologist operating the CT machine remotely for example.

How these get connected and where WebRTC fits exactly is a tricky challenge. There’s latency to deal with, connectivity to physical devices, oftentimes without the ability to replace them, regulatory issues – this space has quite a few obstacles, which are also great barriers of entry and moats against competitors if one invests the effort here.

SaaS, CPaaS & open source: Build vs Buy

Telehealth comes in different shapes and sizes.

Many of the CPaaS vendors have gone ahead and made themselves easy to use for telehealth, mainly by supporting HIPAA compliance requirements.

I’ve seen various telehealth solutions built on CPaaS while others build their own service from scratch using open source components. There is no single approach here that I can suggest, as each has its own advantages and challenges.

One of the biggest challenges in adopting CPaaS for telehealth is upholding the patient’s privacy. Functions of the CPaaS platform require it to know certain elements of PHI (Protected Health Information), especially if call recordings are implemented. At times, a telehealth platform may expose a patient name or other information to the CPaaS implementation. These invite additional security risks and may violate patient privacy laws. A BAA here helps, but may not be enough, since most patient privacy laws require exposing only the bare minimum that is needed to an external entity (in this case, the CPaaS vendor) when it comes to PHI.
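One way to keep that exposure to a minimum is to hand the CPaaS vendor only opaque identifiers and keep everything else on your own backend. A minimal sketch of that idea follows. The `VisitRecord` and `CpaasJoinRequest` shapes and the `toCpaasJoinRequest` helper are hypothetical, real CPaaS APIs differ, and the sketch assumes `visitId` is a random value that isn’t derived from the patient’s identity.

```typescript
// What the telehealth backend knows about a visit (hypothetical shape).
interface VisitRecord {
  visitId: string; // assumption: random, not derived from patient identity
  patientName: string;
  diagnosisNote: string;
  role: "patient" | "caregiver";
}

// The only fields this sketch lets leave for the CPaaS vendor.
interface CpaasJoinRequest {
  room: string;
  participant: string;
}

// Builds the outgoing request field by field, so names and notes can't leak by accident.
function toCpaasJoinRequest(visit: VisitRecord, participantIndex: number): CpaasJoinRequest {
  return {
    room: `visit-${visit.visitId}`,
    participant: `${visit.role}-${participantIndex}`,
  };
}
```

Display names can then be resolved inside your own application from the opaque participant label, without the vendor ever seeing them.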

Here, vendors should look at their core competencies and the actual requirements they have from their WebRTC infrastructure. And as always, my suggestion is to go with CPaaS unless there is a real reason not to.

🔸🔹🔸🔹🔸

Where can I help, if at all?

🎯 Online WebRTC courses, to skill up engineers on this technology

🎯 Consulting, mostly around architecture decisions and technology stack selection

🎯 Testing and monitoring WebRTC systems, via my role as Senior Director at Cyara (and the co-founder of testRTC)

https://bloggeek.me/webrtc-telehealth/feed/ 2
Fitting WebRTC in the brave new world of webcams, security, surveillance and visual intelligence https://bloggeek.me/webrtc-webcams-security-surveillance-visual-intelligence/ https://bloggeek.me/webrtc-webcams-security-surveillance-visual-intelligence/#respond Tue, 26 Sep 2023 09:30:00 +0000 https://bloggeek.me/?p=73973 WebRTC has its place in surveillance and security applications. It isn’t core to these industries, but it is critical in many deployments.

The post Fitting WebRTC in the brave new world of webcams, security, surveillance and visual intelligence appeared first on BlogGeek.me.

]]>
WebRTC has its place in surveillance and security applications. It isn’t core to these industries, but it is critical in many deployments.

Surveillance has become near and dear to my heart. I had a few vendors consult with me in the past. There are a few using testRTC. And then there’s the personal level. The system we have in our apartment building.

This got me to think quite a lot about WebRTC in surveillance tech lately.

Why my interest in surveillance cameras (and WebRTC)?

I live in an apartment building here in Israel:

🏢 23 floors

🤼 91 apartments

🚪 2 main entrances (and another side one)

🛗 3 elevators

🅿️ 3 levels of underground parking

And yes. We have a surveillance camera system. Like all of the other apartment buildings in my neighborhood:

The view from my apartment on a nice day

A year ago, I was in charge of the vendor selection and upgrade process of our cameras. We switched from an analog system into a hybrid analog/IP one.

This month, we’re looking into upgrading an elevator camera to an IP one, as well as adding WiFi to our underground parking. Having a chat with one of the vendors we’re reaching out to, he was fascinated with my work on WebRTC and the potential of using it for application-less viewing of cameras.

I’ve had my share of meetings and dealings with vendors building different types of surveillance and security solutions. From private security solutions to large scale, enterprise visual intelligence ones. Obviously, the subject of these interactions was WebRTC.

⏩ I am not an expert in surveillance, so take the market overview with a grain of salt

⏩ That said, I do know my way with WebRTC and where it fits nicely

⏩ Here are some of the things I learned over the years

Security and surveillance use cases in WebRTC

I’ll start with the obvious – cameras, security and surveillance have multiple use cases. Some of them can be seen as classic to this domain while others are slightly newer or a specialized niche. Each of these use cases is a world unto its own with its requirements from WebRTC and the types of solutions emerging in it.

Small scale / cheap multiple surveillance cameras

This is where I’d frame my own experience of our apartment building. A system that requires 32 or less video cameras, spread across the location, connected to a DVR (Digital Video Recorder) or an NVR (Network Video Recorder).

In essence, you go install the cameras in sensitive locations, wire them up (with an analog cable, IP or even wireless) to the media server that is located onsite as well. That media server is a DVR if it is a closed loop system or an NVR if you’re living in modern times. I’ll just refer to these two as xVR from here on.

Once there, you hook’em up to a local monitor that nobody goes and looks at, as well as let the owner connect remotely from his PC or mobile phone.

Is WebRTC needed here? Not really.

Surveillance cameras today use RTP (and sometimes also RTSP). These are the new ones. Old ones are pure analog. They connect to that xVR media server, which handles them quite well today. It did so also before WebRTC came to our lives. The user then accesses the system to play the videos remotely using a dedicated application, which again, existed before WebRTC.

Since there’s no specific requirement to access this through a web browser, the use of WebRTC here is questionable.

You might say WebRTC would make things easier, but hey – if it ain’t broken, don’t fix it

These solutions are purchased from local vendors that install such systems. The buyer will usually reach out to an installer that will pick and choose the cameras and the surveillance system for the buyer. The buyer cares less about the technology and more about the local vendor’s ability to install and maintain the system when needed.

Enterprise / large scale surveillance

Large scale surveillance systems for enterprises are more of the same as the small scale ones, but with a few main differences:

  1. There are more cameras
  2. There are also more sensors which we want to control and manage, likely using the same system. Think doors and managing employee entrance using keycards for example. While this is about surveillance and security, it is also about building automation
  3. This can go from a small scale building to as large as smart cities with lots of cameras – what I bunch together here is most likely multiple different markets with slightly different requirements
  4. We are likely to have a NOC, where security guards look at screens. Just like in the movies…

The two things that are making headway in this industry?

  • Using AI to reduce the number of people needed to look at surveillance monitors. This is done by adding vision smarts into cameras and the media servers (local or in the cloud), so that events and alerts can be filtered better
  • To some extent, there’s also a requirement to use WebRTC in the NOC to be able to view in real time camera feeds without installing anything

Like the small scale solutions, here too the buyer will look for local installers. These will be the local integrators who bring the systems and install them. At times, the decision of brand will come from the buyer, though this is less likely. It is important to remember that a considerable part of the cost goes towards the setup and installation and not necessarily to the cost of the equipment itself.

Personal/home surveillance

This one is the residential one. It is a B2C space where the buyer is a person buying a camera for his own home security. The decision is made on price or brand mostly.

Here you’ll find also solutions that make use of old smartphones and tablets as cameras, or something like the one we purchased a few years back when our kids were younger:

A digital peephole camera

Having the ability for them to see who is outside our door when they were shorter.

Here too, the market is going into multiple directions:

  • Home automation, connecting more sensors and devices in the home, some of them have cameras in them
  • Surveillance and security, where today it seems at least here in Israel, that fingerprint door locks are all the rage

Where does WebRTC play here? It might make things smoother to develop for the companies, but this doesn’t seem to be the case.

One thing that goes through all use cases above, is the existence of another solution – the video doorbell. Taken into buildings, this becomes an intercom system, which again – can make use of WebRTC. And why? Because it needs bidirectional support for audio at the very least, making WebRTC a suitable alternative.

Personal security

A totally different niche is the one of personal security.

This manifests itself as apps (and services) people can use to increase their security while going about in their daily tasks. Some of these apps connect you to friends and family while others to personal security agents. The WebRTC requirement here is the same for all cases – be able to conduct voice and video calls in real time.

Taken more broadly from the personal level, the same can be implemented in campuses, cities, events, etc.

Unique (?) challenges for WebRTC with camera hardware

There are some unique challenges for WebRTC when it comes to the surveillance space, and that’s mostly a matter of hardware.

  • Costs
    • Hardware costs money. Not just the devices themselves, but their installation. This also means that hardware costs need to be kept low in most of these systems, which means less processing power available on the cameras themselves or the xDR devices
    • To drive costs down, CPUs won’t be as performant as the ones found in smartphones or PCs for example, and they would almost always rely heavily on hardware video encoding
  • Maintenance
    • Many of these hardware systems come without subscription services. This means any firmware upgrades might or might not be available. It also means that such upgrades are sometimes clunky to get done on the devices, especially when they need to be handled remotely
    • There’s also a lot of physical maintenance involved. Cleaning the lenses of cameras, for example
  • Technology leaps
    • You purchase a system. It has cameras and an xDR. Time passes. A couple of years. You decide you need more cameras, replace an existing one, whatever
    • Improvements have taken place in the meantime. The system you have might not even be able to deal with the new cameras available today, and purchasing old ones might not be possible or economical anymore
    • We had this when the system in our residential building broke. The DVR had a hard drive malfunction – it didn’t record anything anymore
      • It was impossible to replace, and buying a new old system wouldn’t be the right approach
      • Some of the cameras lost quality due to their analog coax cables (I was told this is an issue), and the predicament was we’d lose more of these cables in the coming 2-5 years anyways
      • So we had to shift the whole system to an IP based one. A technology leap
      • While I don’t foresee a move away from IP, I am sure many of these systems will change in the coming years in ways that would leave some of the old hardware unusable
  • Hybrid
    • There are hybrid alternatives in this space. We ended up getting one for our building
    • Due to the technology leaps, you end up with multiple types of sensors and cameras, from different generations and technologies
    • The system that cobbles it all together (the xDR in our case) can be one that manages them all
    • Most installers won’t recommend it. It is mostly a necessary evil. Likely because it reduces the revenue of the installer and adds to the complexity of the installation and the system

Most of these issues won’t plague a software solution. But here, we end up in the real world simply because someone needs to go and install the physical cameras.

👉 When figuring out the hardware platform to use, it is important to think of future trends and technology improvements that affect your implementation

👉 In the case of surveillance, there’s WebRTC, future video codecs (AV1) and machine learning in the vision domain to think about. Probably also programmable photography, which has been bringing innovations to smartphones for a few years now

Ingress, egress and the concept of real time

Where to place WebRTC in the solution?

Since I write a lot about WebRTC, and this article is mostly about WebRTC in surveillance markets, it is THE biggest question to answer here.

There are two different places, and both are suitable, but not necessarily together in the same system.

Surveillance needs real time. Sometimes.

Egress

In our own residential building, I seldom care about the live feed from the cameras. It is to check if the front door to the building is open or not, or if there’s some area that got dirty (usually dog pee). Then most of the time is spent rewinding to figure out who caused the problem. Nothing here is considered real time in nature or requires sub second latency.

Elsewhere, real time might be critical on the viewer side (egress), which brings with it the question of whether WebRTC fits here well.

Ingress

Web cameras that directly stream out WebRTC to the world (or the xDR). Is that a benefit? What’s the value of it versus the existing camera technologies used?

I am not quite for or against this, as I am not really sure here. I’d say that a benefit here can be in the fact that it makes the whole technology stack simpler if you end up using WebRTC end-to-end instead of needing to switch protocols from the camera to the viewer. Just remember here that rewind and playback will likely require something other than WebRTC.

The main advantage of WebRTC here might be the removal of the need to transcode and translate across protocols and codecs. It makes xDR software simpler to write and reduces a lot of their CPU requirements, making the systems lighter and cheaper (the xDR – not the camera itself).

One more thing to think of is cameras that also require bidirectional audio. Because a security guard wants to announce or warn perpetrators, or because this is a video doorbell. There, WebRTC fits nicely, though again – not mandatory (I’d still try using it there more than elsewhere).

👉  Going to introduce WebRTC to a surveillance system? Great. Check first where exactly within the whole architecture WebRTC fits and ask yourself why

Mobile or desktop?

Another important aspect of a surveillance system is where people go to watch the videos.

When we installed our own system, we were told that the mobile app is better than the PC app. In both, these were applications. But somehow for the consumers, it meant using the smartphone. It sucks. But yes – it sucks more on the desktop. Which is crazy, considering that what you’re trying to do is watch output coming from 4K cameras in order to identify people.

Then again, who is your customer?

If this is a large enterprise, where there’s going to be a fancy video wall of video feeds with a bored security guard looking at it, then should this be an application or would it be preferable to use a web application for it, with the help of WebRTC? It seems that much of the industry on the client side is looking for lightweight solutions that require less software installations, favoring browsers and… WebRTC.

And if you’re already doing WebRTC for one egress destination, you can use it for all others – browser and app based.

One more thing to consider – it is easier today to develop a web application than it is a native PC application. Cheaper and faster. Which means that supporting WebRTC if the desktop is your primary viewing device might be the right decision to make.

👉 See if there’s a strong need for a zero-install or desktop viewing. This might well lead you towards WebRTC on the egress side

The age of Artificial Intelligence in surveillance tech

The biggest driver in this industry is machine learning and artificial intelligence. And not necessarily the Generative AI kind, but rather the kind that deals with object classification.

The challenge with surveillance is watching the damn cameras. You need eyeballs on screens. The good old motion detection removes a lot of noise (or more accurately, static), but it leaves much to be desired.

One of the elevators in my building, along with the video you get most hours of the day – empty. The bar at the bottom with the blue stripes marks when there’s actual movement.

Using machine learning, it will be easier to search for dogs, people, colors, items and other tidbits to figure out times of interest in the thousands of hours of boring videos, as well as act as “Google search” on recorded video feeds.
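To make the "search" idea a bit more tangible, here is a minimal sketch in Python. It assumes an object classifier (not shown) has already produced labeled, timestamped detections. The `Detection` record, its fields and the `find_intervals` helper are hypothetical names used for illustration only, not taken from any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One classifier hit on a recorded feed (hypothetical record layout)."""
    camera: str
    timestamp: float  # seconds since the start of the recording
    label: str        # e.g. "dog" or "person"


def find_intervals(detections, label, max_gap=5.0):
    """Merge detections of `label` into (start, end) intervals of interest.

    Hits that are at most `max_gap` seconds apart count as one event.
    """
    times = sorted(d.timestamp for d in detections if d.label == label)
    intervals = []
    for t in times:
        if intervals and t - intervals[-1][1] <= max_gap:
            intervals[-1] = (intervals[-1][0], t)  # extend the current event
        else:
            intervals.append((t, t))  # start a new event
    return intervals


detections = [
    Detection("lobby", 10.0, "person"),
    Detection("lobby", 12.5, "dog"),
    Detection("lobby", 14.0, "dog"),
    Detection("lobby", 300.0, "dog"),
]
dog_events = find_intervals(detections, "dog")
print(dog_events)
```

The point of the sketch is that once detections are stored as data, "rewinding" turns into a query over a small index instead of hours of scrubbing through video.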

Doing all that in the cloud is possible, but expensive and tedious – how do you ship all the video, decode it, process it again, etc.

Doing it on the edge, on the device itself (the camera or the xDR) is preferable, but requires new hardware, so requires another technology leap and refresh.

WebRTC isn’t core for surveillance but it is critical

This is something to remember.

WebRTC isn’t core to surveillance. You don’t really need it to get surveillance cameras working, installed or connected to their xDR media servers. You don’t even need it to view videos – either “live” or as playback.

But, and that’s a big one – in some cases, having WebRTC is critical. Because your customer may want to be able to use web browsers and install nothing. He may want to be able to get bidirectional media. There might be a need to get video feeds that are at sub second latencies.

For these, WebRTC might not be a core competency, but it is critical to the successful delivery and deployment of your product. This translates into a need to have that skill set in your team or to be able to outsource it to someone with that skill set.

🔸🔹🔸🔹🔸

Where can I help, if at all?

🎯 Online WebRTC courses, to skill up engineers on this technology

🎯 Consulting, mostly around architecture decisions and technology stack selection

🎯 Testing and monitoring WebRTC systems, via my role as Senior Director at Cyara (and the co-founder of testRTC)

The post Fitting WebRTC in the brave new world of webcams, security, surveillance and visual intelligence appeared first on BlogGeek.me.

]]>
https://bloggeek.me/webrtc-webcams-security-surveillance-visual-intelligence/feed/ 0
Solving CPaaS vendor lock-in (as a customer and as a CPaaS vendor) https://bloggeek.me/solving-cpaas-vendor-lockin/ https://bloggeek.me/solving-cpaas-vendor-lockin/#respond Tue, 12 Sep 2023 09:30:00 +0000 https://bloggeek.me/?p=73957 How to think and plan for CPaaS vendor lock-in when it comes to your WebRTC application implementation.

The post Solving CPaaS vendor lock-in (as a customer and as a CPaaS vendor) appeared first on BlogGeek.me.

]]>
How to think and plan for CPaaS vendor lock-in when it comes to your WebRTC application implementation.

How can/should CPaaS vendors compete on winning customers? More than that, how can/should CPaaS vendors poach customers from other CPaaS vendors?

What prompted this article is the various techniques CPaaS vendors use and what they mean to customers – how should customers react to these techniques. I’ll focus on the Video API part of CPaaS – or to be more specific, the part that deals with WebRTC implementation.

What is CPaaS vendor lock-in?

For me, CPaaS (or Communication Platform as a Service) is a service that lets companies build their own communication experiences in a flexible manner. This is usually done via APIs and requires developers, but recently it is also done via low-code/no-code interactions (such as embedding an iframe).

A CPaaS vendor ends up defining its own API interface, which its customers use to create these communication experiences.

That API interface is proprietary. There is no standard specification for how CPaaS APIs need to look or behave. This means that if you used such an API, and you want to switch to another CPaaS vendor – you’re going to need to do all that integration work all over again.

Think of it like switching from an Android phone to an iPhone or vice versa:

  • There’s a new interface you need to learn
    • It might be similar since it is practically used for doing the same things
    • But it is also a bit “off”. The things you expect to be in one place are in another place
    • The settings are handled differently
    • And the way you deal with the phone’s assistant (or Siri?) is different as well
  • You need to install all of your apps from scratch
    • Find them in the app store, download them, install them
    • Set them up by logging in
    • Some of them you need to purchase separately all over again
    • Others you won’t find… and you’ll need to look for alternative apps instead – or decide not to use that functionality any longer
  • The behavior will be different
    • The background color of the apps
    • The way you switch between screens is different
    • The swipe “language” is also slightly different

In a way, you want the same experience (only better), but there’s going to be a learning curve and an adaptation curve where you familiarize yourself with the new CPaaS vendor and “make yourself at home”.

The vendor lock-in part is how much effort you will need to invest and how much risk you will need to overcome in order to switch from one vendor to another – to call that other vendor your new home.

Vendor lock-in has 3 aspects to it in CPaaS:

  1. Difference in the API interface. That’s a purely technical one. Low risk usually, with varying degree of effort
  2. Behavioral differences. This has higher risk with unknown effort involved. While both CPaaS vendors do the “same” thing, they are doing it differently. And that difference is hidden behind how they behave. Your own application may rely on behavior that isn’t part of the standardized official interface and you will find out about it only once you test the migrated application on the new CPaaS vendor’s interface or later when things break in production
  3. Integration differences. There are things outside the official interface you might have integrated with such as logs collection, understanding and handling error codes and edge cases, ETL processes, security mechanisms, etc. These things are the ones developers usually won’t account for when estimating the effort in the beginning and will likely be caught late in the migration process itself

Vendor lock-in is scary. Not because of the technical effort involved but because of the risks from the unknowns. The more years and the more interfaces, scenarios and code you have running on a CPaaS vendor, the higher the lock-in and the migration risk you face.

The innovation in WebRTC that CPaaS is “killing”

Before WebRTC, we had other standards. RTP and RTCP came a lot before WebRTC.

We had RTMP, RTSP, SIP and H.323.

The main theme of all these standard specifications was that their focus has always been on standardizing what goes on over the network. They didn’t care or fret about the interface for the developer. The idea behind this was to enable using the standard on whatever hardware, operating system and programming language. Just read the spec and implement it any way you like.

WebRTC changed all that (ignoring Flash here). We now have a specification where the API interface for the developer of a web application is also predefined.

WebRTC specifies what goes on the network, but also the JavaScript API in web browsers.

Here’s how I like explaining it in my slides:

One of the main advantages of WebRTC is that a developer who uses WebRTC in one project for one company can relatively easily switch to implement a different WebRTC project for another company. (that’s not really correct, but bear with me a little here)

We now could think of WebRTC just like other technologies – someone proficient in WebRTC is “comparable” to someone who worked with Node.js or SQL or other technologies. Whereas working with SIP or H.323 begs the question – which framework or implementation was used – learning a new one has its own learning curve.

Enter CPaaS…

And now the WebRTC API interface is no longer relevant. The CPaaS vendor’s SDK has its own interface indicating how things get done. And these may or may not bear any resemblance to the WebRTC API. Moreover – it might even try very hard to hide the WebRTC stack implementation from the developer.

This piece of innovation, where a developer using WebRTC can jump into new code of another project quickly is gone now. Because the interfaces of different CPaaS vendors aren’t standardized and don’t adhere to the standard WebRTC API interface (and they shouldn’t be – it isn’t because they are mean – it is because they offer a higher level of abstraction with more complex and complete functionality).

Not having the same interface across CPaaS vendors is one of the reasons we’ve started down this rabbit hole of exploring what CPaaS vendor lock-in is exactly.

CPaaS vendor poaching techniques and how to react to them

Every so often, you see one or more CPaaS vendors trying to grab a bit more market share in this space. Sometimes, it is about enticing customers who want to start using a CPaaS vendor. Other times it is focused on trying to poach customers from other CPaaS vendors.

When looking at the latter, here are the CPaaS vendor poaching techniques I’ve seen, how effective they are, and what you as a target company should think about them.

#1 – Feature list comparisons

The easiest technique to implement (and to review) is the feature list comparison.

In it, a CPaaS vendor would simply generate and share a comparison table of how its feature set is preferable over the popular alternatives.

For a company looking to switch, this would be a great place to start. You can skim through the feature list and see exactly what’s there in the platform you are currently using and the one you are thinking of switching to.

When looking at such a list, remember and ask yourself the following questions:

  • Is this list up to date? Oftentimes, these pages are created with big fanfare when a “poaching” or comparison project is initiated by the marketing department of a CPaaS vendor. But once done, it is seldom updated to reflect the latest versions (especially the latest version of the competitor). So take the comparison with a grain of salt. It is likely to be somewhat incorrect
  • Check what your experience is with the vendor you are using versus how it is reflected in the comparison table. Does the table describe things as you see them?
  • The features that look better “on paper” in this table for the vendor you plan on switching to. Do you need these features? Are they critical for you today or in the near future? Or are they just nice to have
  • The “greens” on the vendor making the comparison – are they on par with the other vendor or just a less comprehensive implementation of it? (for example, support for group calls – both vendors may support it, but one can get you to X users with open mics in a group call while the other can do 10X users)

👉 I’ve had my fair share of reading, writing and responding to comparison tables. A long time ago (pre-WebRTC), we received inputs that our competitor could do almost 10 times the number of concurrent calls we were able to do, with much higher throughput. Obviously, we created a task force to deal with it. The conclusion was simple – the competitor didn’t measure the network time at all – just CPU time in the machine. We weren’t measuring the same thing and his choice of metric meant he always looked better

👉 Your role in this? To read between the lines and understand what wasn’t written. Always remember that this isn’t an objective comparison – it is highly skewed towards the author of it (otherwise, he wouldn’t be publishing it)

#2 – Performance comparisons

Here the intent of the CPaaS vendor is to show that his platform is superior in its performance. It can offer better quality, at lower bitrates and CPU use for larger groups.

If a vendor does it on his own, then potential customers will immediately view the results as suspect. This is why most of them use third party objective vendors to do these performance comparisons for them (at a cost).

We’ve done this at testRTC a couple of times – some publicly shared (for this one, I’ve placed my own reputation and testRTC’s reputation on the frontline, insisting not to name the other vendors) and others privately done. It is a fun project since it requires working towards a goal of figuring out how different CPaaS vendors behave in different scenarios.

Zoom did this as well, comparing itself to other CPaaS vendors. Agora answered in kind with a series of posts comparing themselves back to Zoom (where Zoom didn’t look as shiny).

Just remember a few things when reading such comparisons:

  • They were commissioned. They wouldn’t be published and shared if they weren’t showing what the CPaaS vendor wanted them to show
  • For me, it is more interesting to see how the setup of the performance tests was done and what was left out or missed in the comparison to begin with
    • The types of machines and browsers selected
    • Scenarios picked
    • Reference applications used for each vendor
    • How measurements are done
    • Which metrics are selected for the comparison
  • Who the vendor was looking to compare himself to
  • The CPaaS vendor usually helps and tweaks his own platform to fit the scenarios selected, while the competing vendors have no say in which of their applications or samples are used and if or how they are optimized for the scenario (hint: they aren’t)

👉 In the end, the fact that a CPaaS vendor performs better than another in a scenario you don’t need says nothing for you. Make sure to give more weight to the results of actual scenarios relevant to you, and be sure you understand what is really being compared

#3 – Guides, how-to’s and success stories

How do you ease the migration of a customer from a different CPaaS vendor to your own? You write a migration document about it. A guide. Or a how-to. Or you get a testimonial or a success story from a customer willing to share publicly that he migrated and how life is so much better for him now.

These are mainly targeted at raising the confidence level of those who are contemplating switching, signaling to them that the process isn’t risky and that others have taken this path successfully already.

As someone thinking of moving from one vendor to another, I’d seriously consider reaching out to the CPaaS vendor and asking the hard questions:

  • How easy the migration really is
  • What challenges should one expect
  • Are there any common issues that migrating customers have bumped into
  • How many such customers do they have
  • Can they reach out and ask one of those who migrated to have a quick direct conversation with you

Anecdotes and recipes are nice. What you are after is having more data points.

👉 Read these guides and success stories. Try reading between the lines in them. Check if you have any open questions and then ask these questions directly. Gather as much information as you can to get a clearer picture

#4 – Reference applications

I wasn’t sure if this fits for migrating customers because it is a bit broader in nature. But here we are 😎

In many cases, CPaaS vendors have reference applications available. Usually hosted on github. Just pull the code, compile, host and run it. You get an app that is “almost” ready for deployment.

You see how easy that was? Think how easy it is going to be to migrate to us with this great reference.

Remember a few things here:

  • Your workflow is likely different enough from the reference app that there’s work to be done here
  • In most cases, if you’ve built your application already on another vendor, using a reference app of another CPaaS vendor is close to impossible
  • Reference apps are just references. They usually don’t cover many of the edge cases that need handling

👉 From my point of view, reference apps are nice to get a taste of what’s possible and how the API of a CPaaS vendor gets used. But that’s about it. They are unlikely to be useful during the migration process itself

#5 – Shims and adaptors

They say imitation is the highest form of flattery. If that is true, then shims and adaptors would fit well here.

In CPaaS, the most common one was supporting TwiML (that’s Twilio’s XML “language” for actions on telephony events). There’s also the idea/intent of having the whole API interface of another CPaaS vendor (or parts of it) supported directly by the poacher. The purpose of which is to make it easy to switch over.

Clearing things up a bit:

  • CPaaS vendor A has an API interface
  • CPaaS vendor B has a different API interface
  • To make it easier to switch from vendor A to vendor B, vendor B decides to create a piece of software that translates calls of A’s API interface to B’s API interface. This is usually called a shim or an adaptor
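As a rough illustration of what such a shim looks like in code, here is a minimal Python sketch. Both vendor APIs are made up for the example: `VendorBClient`, `VendorAShim` and every method name on them are hypothetical and don't correspond to any real CPaaS SDK.

```python
class VendorBClient:
    """Stand-in for vendor B's SDK (hypothetical API, stubbed in memory)."""

    def __init__(self):
        self.sessions = {}

    def create_session(self, name, participant_limit):
        self.sessions[name] = {"limit": participant_limit, "members": []}
        return name

    def add_participant(self, session_id, user_id):
        self.sessions[session_id]["members"].append(user_id)


class VendorAShim:
    """Exposes vendor A's (hypothetical) method names on top of vendor B."""

    def __init__(self, backend):
        self._backend = backend

    def create_room(self, room_name, max_users=50):
        # A's "room" is mapped onto B's "session"
        return self._backend.create_session(room_name, participant_limit=max_users)

    def join_room(self, room_id, identity):
        self._backend.add_participant(room_id, identity)

    def start_recording(self, room_id):
        # A call without a counterpart on B can't be translated by the shim
        raise NotImplementedError("start_recording is not covered by this shim")


backend = VendorBClient()
client = VendorAShim(backend)  # existing code keeps calling A-style methods
room = client.create_room("standup", max_users=10)
client.join_room(room, "alice")
print(backend.sessions)
```

Note that the sketch only translates names and arguments. It says nothing about whether vendor B behaves like vendor A once the call is made.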

The result? If you’re using vendor A, theoretically, you can take the shim created by vendor B and magically without any investment, you migrate to vendor B. Problem solved 😎

While this looks great on paper, I am afraid it has little chance of holding up in the real world 🥸. Here’s why:

  1. The shim created is usually partial. Especially if vendor A offers a very rich interface (most vendors will, especially in the domain of video APIs and WebRTC)
  2. Like reference applications, these shims don’t take good care of edge cases. Why? Because they aren’t used by many customers ➡️ fewer customers = less investment
  3. WebRTC is rather new, and CPaaS vendors have much to add, so every time vendor A updates his CPaaS and adds APIs to the interface – vendor B needs to invest in updating the shim. But is that even done once a shim is created? Or is it, again, placed on the back burner due to the previous rule ➡️ fewer customers = less investment
  4. Behavior. Same API interface doesn’t necessarily mean the vendor’s platforms behave the same on the network. These changes are hard to catch… and might be even harder to resolve
  5. Using a shim is nice, but if you want to use specific features available in vendor B’s interface – can you even do that if you’re doing everything via the shim? And is that the correct way to do things moving forward for you?

The thing is, using a shim still means a ton of testing and headaches, and of the kind that is hard to overcome.

If I had to switch between vendors, I’d ignore such shims altogether. For me they’re more of a trap than anything else.

👉 Someone suggesting you use their shim for switching over to their CPaaS? Ignore them and just analyze what needs to be done as if there’s no shim available. You’ll thank me later

Build vs Buy – my first preference is ALWAYS buy (=CPaaS)

We’ve seen 5 different techniques CPaaS vendors use to try and poach customers from one another. For the most part, they are of the “buyer beware” type. And yet, we do need to migrate from time to time from one CPaaS vendor to another. Market dynamics might force us to do so, or just the need to switch to a better platform or offering.

Does that mean it would be best to go it alone and build your own platform instead of using a third party CPaaS vendor?

No.

Vendor lock-in isn’t necessarily a bad thing. My first preference is always to adopt a CPaaS vendor. And if not to adopt one, then to articulate very clearly why the decision to build is made.

What should you do when you start using a CPaaS vendor to make the transition to another vendor (or to your own platform) smoother in the distant future? Here are a few things to consider.

  1. Limit the calls to the vendor’s API interface
    • If you can make all of them from a single source file then great
    • Even if not it is fine, but try not to call the vendor’s APIs and use their objects directly all over the place
    • Having it all nicely compartmentalized will reduce the amount of changes needed during a migration
  2. Consider building an abstraction layer
    • While I hate this one, it appeals to some
    • Create your own abstraction of the communications capabilities you need
    • Have that abstraction a “standardized” internal interface you follow
    • Implement the integration with the vendor as a class/object of that interface
    • This enables you to implement the next vendor or your own platform as yet another class/object for the same interface in the future at some point.
    • Risky. As this probably will require architectural and design changes once that time comes, but it might still be the decision that can get your company to move forward
  3. Don’t use undocumented APIs and behaviors
    • These will be harder to figure out in the future
    • Making them harder to modify during a migration
  4. Assume there’s no simple solution
    • No silver bullet or magic solution here
    • Which means that time invested in catering for future multiple vendors or seamless migration paths is time wasted
    • Try to make the decisions here ones that don’t take more resources or time today due to some unknown future need – you are more likely to make a mistake in these decisions than you are to succeed in it
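For those who do go down the abstraction layer path (item 2 above), here is a minimal sketch in Python of what it can look like. Everything in it is an assumption made for illustration: the `VideoProvider` interface, the two provider classes and the `start_meeting` function are hypothetical, and the vendor calls are stubbed with strings instead of a real SDK.

```python
from abc import ABC, abstractmethod


class VideoProvider(ABC):
    """Internal interface covering only the capabilities the app needs."""

    @abstractmethod
    def create_room(self, name: str) -> str:
        ...

    @abstractmethod
    def issue_token(self, room_id: str, user_id: str) -> str:
        ...


class VendorAProvider(VideoProvider):
    """The single place that would touch vendor A's SDK (stubbed here)."""

    def create_room(self, name: str) -> str:
        return f"vendor-a:{name}"

    def issue_token(self, room_id: str, user_id: str) -> str:
        return f"token-a:{room_id}:{user_id}"


class InHouseProvider(VideoProvider):
    """A later implementation of the same interface (also stubbed)."""

    def create_room(self, name: str) -> str:
        return f"in-house:{name}"

    def issue_token(self, room_id: str, user_id: str) -> str:
        return f"token-own:{room_id}:{user_id}"


def start_meeting(provider: VideoProvider, name: str, user_id: str) -> dict:
    """Application code depends on the interface, never on a vendor SDK."""
    room_id = provider.create_room(name)
    return {"room": room_id, "token": provider.issue_token(room_id, user_id)}


before = start_meeting(VendorAProvider(), "standup", "alice")
after = start_meeting(InHouseProvider(), "standup", "alice")
print(before, after)
```

The same structure also serves item 1: even without a formal interface, keeping all vendor calls inside one class like `VendorAProvider` limits where changes land during a migration. It doesn't remove the behavioral differences between vendors, which is why the list above still calls this approach risky.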

The post Solving CPaaS vendor lock-in (as a customer and as a CPaaS vendor) appeared first on BlogGeek.me.

]]>
https://bloggeek.me/solving-cpaas-vendor-lockin/feed/ 0
Cloud gaming, virtual desktops and WebRTC https://bloggeek.me/cloud-gaming-virtual-desktops-and-webrtc/ https://bloggeek.me/cloud-gaming-virtual-desktops-and-webrtc/#respond Mon, 03 Jul 2023 10:30:00 +0000 https://bloggeek.me/?p=73861 WebRTC is an important technology for cloud gaming and virtual desktop type use cases. Here are the reasons and the challenges associated with it.

The post Cloud gaming, virtual desktops and WebRTC appeared first on BlogGeek.me.

]]>
WebRTC is an important technology for cloud gaming and virtual desktop type use cases. Here are the reasons and the challenges associated with it.

Google launched and shut down Stadia. A cloud gaming platform. It used WebRTC (yay), but it didn’t quite fit into Google’s future it seems.

That said, it does shed light on a use case that I’ve been “neglecting” in my writing here, though it was and is definitely top of mind in discussions with vendors and developers.

What I want to put in writing this time is cloud gaming as a concept, and then alongside it, all virtual desktops and cloud rendering use cases.

Let’s dig in 👇

The rise and (predictable?) fall of Google Stadia

Google Stadia started life as Project Stream inside Google.

Technically, it made perfect sense. But at least in hindsight, the business plan wasn’t really there. Google is far removed from gaming, game developers and gamers.

On the technical side, the intent was to run high end games on cloud machines that would render the game and then have someone play the game “remotely”. The user gets a live video rendering of the game and sends back console signals. This meant games could be as complex as they need be and get their compute power from cloud servers, while keeping the user’s device at the same spec no matter the game.

Source: Google

I’ve added the WebRTC text on the diagram from Google – WebRTC was called upon so that the player could use a modern browser to play the game. No installation needed. This can work nicely even on iOS devices, where Apple is adamant about their part of the revenue sharing on anything that goes through the app store.

Stadia wanted to solve quite a few technological challenges:

  • Running high end console games on cloud machines
  • Remotely serving these games in real time
  • Playing the game inside a browser (or an equivalent)

And likely quite a few other challenges as well (scaling this whole thing and figuring out how to obtain and keep so many GPUs for example).

Technically, Stadia was a success. Businesswise… well… it shut down a little over 3 years after its launch – so not so much.

What Stadia did though, was show that this is most definitely possible.

WebRTC, Cloud gaming and the challenges of real time

To get cloud gaming right, Google had to do a few things with WebRTC. Things they haven’t really needed too much when the main thing for WebRTC at Google was Google Meet. These were lowering the latency, dealing with a larger color space and aiming for 4K resolution at 60 fps. What they got virtually for “free” with WebRTC was its data channel – the means to send game controller signals quickly from the player to the gaming machine in the cloud.

Let’s see what it meant to add the other three things:

4K resolution at 60 fps

Google aimed for high end games, which meant higher resolutions and frame rates.

WebRTC is/was great for video conferencing resolutions. VGA, 720p and even 1080p. 4K was another jump up that scale. It requires more CPU and more bandwidth.

Luckily, for cloud gaming, the browser only needs to decode the video and not encode it. Which meant the real issue, besides making sure the browser can actually decode 4K resolutions efficiently, was to conduct efficient bandwidth estimation.

As an algorithm, bandwidth estimation is finely tuned and optimized for given scenarios. 4K cloud gaming was a new scenario, where the bitrates needed weren’t 2 Mbps or even 4 Mbps but rather in the range of 10-35 Mbps.
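A bit of back-of-the-envelope arithmetic shows why the numbers jump like that. The sketch below assumes 8-bit 4:2:0 video (1.5 samples per pixel) and uses 720p at 30 fps as a stand-in for a typical video meeting; both are my assumptions for the comparison, not figures from Google.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels that have to be delivered per second."""
    return width * height * fps


def raw_bitrate_bps(width: int, height: int, fps: int,
                    bits_per_sample: int = 8,
                    samples_per_pixel: float = 1.5) -> float:
    """Uncompressed bitrate; 1.5 samples per pixel corresponds to 4:2:0 chroma subsampling."""
    return pixel_rate(width, height, fps) * bits_per_sample * samples_per_pixel


def bits_per_pixel(bitrate_bps: float, width: int, height: int, fps: int) -> float:
    """How many encoded bits each pixel gets at a given target bitrate."""
    return bitrate_bps / pixel_rate(width, height, fps)


meeting = (1280, 720, 30)    # assumed video meeting baseline
gaming = (3840, 2160, 60)    # 4K at 60 fps

scale = pixel_rate(*gaming) / pixel_rate(*meeting)
print(f"4K60 carries {scale:.0f}x the pixels per second of 720p30")
print(f"Raw 4K60: {raw_bitrate_bps(*gaming) / 1e9:.2f} Gbps before compression")
print(f"720p30 at 2 Mbps: {bits_per_pixel(2e6, *meeting):.3f} bits per pixel")
print(f"4K60 at 35 Mbps: {bits_per_pixel(35e6, *gaming):.3f} bits per pixel")
```

4K60 pushes 18 times the pixels per second of 720p30, so scaling a 2 Mbps meeting by the same factor lands at roughly the top of the 10-35 Mbps range.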

The built-in bandwidth estimator in WebRTC can’t handle this… but the one Google built for the Stadia servers can. On the technical side, this was made possible by Google relying on sender-side bandwidth estimation techniques using transport-cc.
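To show what "sender-side" means here, below is a heavily simplified sketch of delay-based estimation on top of transport-cc style feedback. The receiver only reports when each packet arrived; the sender compares that against when it sent the packet and decides on the bitrate itself. This is not the algorithm in libwebrtc or in the Stadia servers. The threshold, the back-off and probe factors and the 35 Mbps cap are made-up values for illustration.

```python
from dataclasses import dataclass


@dataclass
class PacketReport:
    seq: int            # transport-wide sequence number stamped by the sender
    send_ms: float      # send time, known locally by the sender
    arrival_ms: float   # arrival time, reported back by the receiver


def delay_gradients(reports: list[PacketReport]) -> list[float]:
    """Change in one-way delay between consecutive packets.

    Only differences are used, so the clock offset between the two sides cancels out.
    """
    ordered = sorted(reports, key=lambda r: r.seq)
    return [
        (cur.arrival_ms - prev.arrival_ms) - (cur.send_ms - prev.send_ms)
        for prev, cur in zip(ordered, ordered[1:])
    ]


def next_bitrate(current_bps: int, reports: list[PacketReport],
                 overuse_threshold_ms: float = 2.0,
                 max_bps: int = 35_000_000) -> int:
    """Toy rate control: back off when delay keeps growing, otherwise probe upwards."""
    gradients = delay_gradients(reports)
    if not gradients:
        return current_bps
    average = sum(gradients) / len(gradients)
    if average > overuse_threshold_ms:
        return int(current_bps * 0.85)              # queues are building up somewhere
    return min(max_bps, int(current_bps * 1.05))    # path looks clear, try a bit more
```

Because all the logic lives on the sender, the estimator can be tuned for 10-35 Mbps gaming streams on the server without waiting for a browser release.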

Lower latency: playout delay

Remember this diagram?

It can be found in my article titled With media delivery, you can optimize for quality or latency. Not both.

WebRTC is designed and built for low latency, but within the sub-second range, how would you sort the latency requirements of these 3 activities?

  1. Nailing a SpaceX rocket landing
  2. Playing a first-person shooter game (as old as I am, that means Doom or Quake for me)
  3. Having an online meeting with a potential customer

WebRTC’s main focus over the years has been online meetings. This means having 100 milliseconds or 200 milliseconds delay would be just fine.

With an online game? 100 milliseconds is the difference between winning and losing.

So Google tried to reduce latency even further with WebRTC by adding a concept of Playout Delay. The intent here is to let WebRTC know that the application and use case prefers playing out the media earlier and sacrificing even further in quality, versus waiting a bit for the benefit of maybe getting better quality.
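On the wire, this hint travels as the playout-delay RTP header extension. As far as I understand its published description, the payload is 3 bytes: a 12-bit minimum and a 12-bit maximum delay, each in units of 10 milliseconds. The sketch below packs and unpacks that payload under this assumption; the function names are mine, and the layout is worth checking against the current libwebrtc documentation before relying on it.

```python
GRANULARITY_MS = 10   # each unit in the extension stands for 10 ms (assumed layout)
MAX_FIELD = 0xFFF     # 12 bits per field, so at most 4095 units = 40950 ms


def encode_playout_delay(min_ms: int, max_ms: int) -> bytes:
    """Pack min/max playout delay into the assumed 3-byte payload."""
    if not 0 <= min_ms <= max_ms:
        raise ValueError("need 0 <= min_ms <= max_ms")
    min_units = min_ms // GRANULARITY_MS
    max_units = max_ms // GRANULARITY_MS
    if max_units > MAX_FIELD:
        raise ValueError("delay does not fit in 12 bits")
    return ((min_units << 12) | max_units).to_bytes(3, "big")


def decode_playout_delay(payload: bytes) -> tuple[int, int]:
    """Return (min_ms, max_ms) from a 3-byte payload."""
    value = int.from_bytes(payload, "big")
    return (value >> 12) * GRANULARITY_MS, (value & MAX_FIELD) * GRANULARITY_MS
```

A cloud gaming sender would ask for something like `encode_playout_delay(0, 0)`, meaning "render each frame as soon as you have it", while a meeting can leave the receiver room to buffer.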

Larger color space

Video conferencing and talking heads don’t need much. If you recall, with video compression what we’re after is to lose as much as we can out of the original video signal and then compress. The idea here is that whatever the eye won’t notice – we can make do without.

Apparently, for talking heads we can lose more of the “color” and still be happy versus doing something similar for an online game.

To make a point, if you’ve watched Game of Thrones at home, you may remember the botch in the last season, where some of the episodes ended up being too dark for television. That was due to compression done by service providers…

While different from the color space issue here, it goes to show that how you treat color in video encoding matters. And it differs from one scenario to another.

When it comes to games, a different treatment of color space was needed. Specifically, moving from SDR to HDR, adding an RTP header extension in the process to express that additional information.
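To give a feel for what "that additional information" is, here is a small sketch of a color description for an SDR stream and an HDR one. The numeric codes are the standard code points from ITU-T H.273 (1 for BT.709, 9 for BT.2020, 16 for the PQ transfer function, 18 for HLG). The `ColorDescription` class, the choice of 10 bits for HDR and the helper functions are my own illustration. This is not the exact byte layout of the header extension.

```python
from dataclasses import dataclass

# ITU-T H.273 transfer characteristics that indicate HDR content
TRANSFER_PQ = 16    # SMPTE ST 2084
TRANSFER_HLG = 18   # ARIB STD-B67


@dataclass(frozen=True)
class ColorDescription:
    primaries: int     # H.273 colour primaries
    transfer: int      # H.273 transfer characteristics
    matrix: int        # H.273 matrix coefficients
    full_range: bool
    bit_depth: int     # assumed here: 8 for SDR, 10 for HDR


# A typical video meeting: BT.709 all the way, 8 bits per sample
SDR_BT709 = ColorDescription(primaries=1, transfer=1, matrix=1,
                             full_range=False, bit_depth=8)

# An HDR game stream: BT.2020 primaries and matrix with the PQ transfer function
HDR_BT2020_PQ = ColorDescription(primaries=9, transfer=TRANSFER_PQ, matrix=9,
                                 full_range=False, bit_depth=10)


def is_hdr(desc: ColorDescription) -> bool:
    return desc.transfer in (TRANSFER_PQ, TRANSFER_HLG)


def levels_per_channel(desc: ColorDescription) -> int:
    """Distinct code values each color channel can take."""
    return 2 ** desc.bit_depth
```

Without this description, the decoder has to assume the SDR defaults, which is how an HDR stream ends up looking washed out or too dark on the player’s screen.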

Oh, and if you want to learn more about these changes (especially resolution and color space), then make sure to watch this Kranky Geek session by YouTube about the changes they had to make to support Stadia:

What’s in cloud gaming anyway?

Here’s the thing. Google Stadia is one end of the spectrum in gaming and in cloud gaming.

Throughout the years, I’ve seen quite a few other reasons and market targets for cloud gaming.

Types of cloud games

Here are the ones that come off the top of my head:

  • High end gaming. That’s the Google Stadia use case. Play a high end game anywhere you want on any kind of device. This reduces the reliance and need to upgrade your gaming hardware all the time
    • You’ll find NVIDIA, Amazon Luna and Microsoft xCloud focused on this domain
    • How popular or profitable this is remains questionable
  • Console gaming. PlayStation, Xbox, Switch. Whatever. Picking a game and playing without waiting to download and install is great. It also allows reducing/removing the hard drive from these devices (or shrinking them in size)
  • Mobile games. You can now sample mobile apps and games before downloading them, running them in the cloud. Other things here? You could play games of other users 🤔 using their account and the levels they reached instead of slaving your way there
  • Retro/emulated games. There’s a large and growing body of games that can’t be played on today’s machines because the hardware for them is too old. These can be emulated, and now also played remotely as cloud games. How about playing a PlayStation 2 game? Or an old and classic SEGA arcade game? I’m thinking Golden Axe

Improved gameplay

Why not even play these games with others remotely?

My son recently had a sit down with 4 other friends, all playing a TMNT game together on Xbox. It was great having them all over, but you could do it remotely as well. If the game doesn’t offer remote players, by pushing it to the cloud you can get that feature simply because all users immediately become remote players.

At this stage, you can even add a voice conference or a video call to the game between the players. Just to give them the level of collaboration they can get out of playing the likes of Fortnite. Granted, this requires more than just game rendering in the cloud, but it is possible and I do see it happen with some of the vendors in this space.

Beyond cloud gaming – virtual desktop, remote desktop and cloud rendering

Lower latencies. Bigger color space. Higher resolutions. Rendering in the cloud and consuming remotely.

All these aren’t specific to cloud gaming. They can easily be extended to virtual desktop and remote desktop scenarios.

You have a machine in the cloud – big or small or even a cluster. That “machine” handles computations and ends up rendering the result to a virtual display. You then grab that display and send it to a remote user.

One use case can just be a remote desktop a-la VNC. Here we’re actually trying to get connected from one machine to another, usually in a private and secure peer-to-peer fashion, which is different from what I am aiming for here.

Another, less talked about use case is doing things like Photoshop operations in the cloud. Poor sad people like me, who don’t have the latest Mac Pro with the shiny M2 Ultra chip, might just want to “rent” the compute power online for image or video editing jobs.

I might want to open a rendered 3D view of a sports car I’d like to buy, directly from the browser, having the ability to move my view around the car.

Or it might just be a simple VDI scenario, where the company (usually a large financial institution, but not only) would like the employees to work on Chromebook machines but have nothing installed or stored on them – all consumed by accessing the actual machine and data in their own corporate data center or secure cloud environment.

A good friend of mine asked me what PC to buy for himself. He needed it for work. He is a lawyer. My answer was the lowest end machine you can find would do the job. That saved him quite a lot of money I am guessing, and he wouldn’t even notice the difference for what he needs it for.

But what if he needs a bit more juice and power every once in a while? Can renting that in the cloud make a difference?

What about the need to use specialized software that is hard to install and configure? Or that requires a lot of collaboration on large amounts of data that need to be shared across the collaborators?

Taking the notion and capabilities of cloud gaming and applying them to non-gaming use cases can help us with multiple other requirements:

  1. CPU and memory requirements that can’t be met with a local machine easily
  2. The need to maintain privacy and corporate data in work from home environments
  3. Zero install environment, lowering maintenance costs

Do these have to happen with WebRTC? No

Can they happen with WebRTC? Yes

Would changing from proprietary VDI environments to open standard WebRTC in browsers improve things? Probably

Why use WebRTC in cloud gaming

Why even use WebRTC for cloud gaming or more general cloud rendering then?

With cloud gaming, we’re fine doing it from inside a dedicated app. So WebRTC isn’t really necessary. Or is it?

In one of our recent WebRTC Insights issues we’ve highlighted that Amazon Luna is dropping the dedicated apps in favor of the web (=WebRTC). From that article:

“We saw customers were spending significantly more time playing games on Luna using their web browsers than on native PC and Mac apps. When we see customers love something, we double down. We optimized the web browser experience with the full features and capabilities offered in Luna’s native desktop apps so customers now have the same exact Luna experience when using Luna on their web browsers.”

Browsers are still a popular enough alternative for many users. Are these your users too?

If you need or want web browser access for a cloud gaming / cloud rendering application, then WebRTC is the way to go. It is a slightly different opinion than the one I had with the future of live streaming, where I stated the opposite:

“The reason WebRTC is used at the moment is because it was the only game in town. Soon that will change with the adoption of solutions based on WebTransport+WebCodecs+WebAssembly where an alternative to WebRTC for live streaming in browsers will introduce itself.”

Why the difference? It is all about the latency we are willing to accommodate:

Your mileage may vary when it comes to the specific latency you’re aiming for, but in general – live streaming can live with slightly higher latency than our online meetings. So something other than WebRTC can cater for that better – we can fine tune and tweak it more.

Cloud gaming needs even lower latency than online meetings. And WebRTC can accommodate that. Using something else that is unproven yet (and suffers from performance and latency issues a bit at the moment) is the wrong approach. At least today.

Enter our WebRTC Protocols courses

Got a use case where you need to render remote machines using WebRTC? These require sitting at the cutting edge of WebRTC, or more accurately, at a slightly skewed angle versus what the general population does with WebRTC (including Google).

Taking upon yourself such a use case means you’ll need to rely more heavily on your own expertise and understanding of WebRTC.

Over a year ago I launched the Low-level WebRTC Protocols course with Philipp Hancke. We’re now recording our next course – Higher-level WebRTC Protocols.

Oh, and I’d like to thank Midjourney for releasing version 5.2 – awesome images

The post Cloud gaming, virtual desktops and WebRTC appeared first on BlogGeek.me.
