Here are the WebRTC predictions and trends you should expect in 2023. It is more of the same, but with nuanced differences.
Looking for 2024 predictions? Read here: My WebRTC predictions for 2024
As we’re starting 2023, it is time to look back and then into the future, to understand where we are and where we are headed with WebRTC. This year, things are getting somewhat trickier here:
Oh, and did I mention that I changed a lot in my own work-life? I am now Chief Product Officer at Spearline, dealing with the larger picture of testing and monitoring communication networks. Life is full of surprises 🥸
There’s lots to cover, so let’s start.
Before I dive into the predictions, it is important to know where we stand. We’ll do this by looking at 3 different layers:
Let’s start with the technology itself ➡️
We are well into the era of differentiation:
This started with Google unbundling WebRTC in the browser, starting to offer pieces of it as separate future W3C standards as well as opening up more access to lower levels of the stack. In the past year we’ve seen growing use of these capabilities outside of Google, both in experimentation and in production.
2021 brought with it background blurring and replacement in the browser to the masses.
In 2022 we’ve seen proprietary codecs and noise suppression finding a solid home in WebRTC applications and technologies using these capabilities. Representative commercial examples of this are Dolby Voice proprietary codec and Twilio’s Krisp partnership on noise cancellation.
If this is hinting at anything, it is that we’re going to see more of these moving forward, as vendors try to differentiate further. The only thing slowing this trend down is the current market recession.
The pandemic that has raised all boats is all but over.
China is opening up, with or without another COVID wave. Many have shifted to hybrid work. Others are now communicating via video sessions a lot more than they used to.
Zoom is seen as the poster child of the pandemic. If you overlay its stock price with WebRTC usage in Chrome, you get this interesting chart:
WebRTC is still used 3-4 times more than it was prior to the pandemic. That said, throughout 2022 we’ve seen a consistent decrease in the use of WebRTC. This is likely to continue into 2023.
My guess/prediction is that we will stay at around 3 times the use we had at the beginning of 2020.
libWebRTC is still king of the hill when it comes to WebRTC client-side implementations.
Nothing comes close to it.
libWebRTC is Google’s implementation of WebRTC, and the one used across all browsers today. A monoculture.
For most projects, using libWebRTC as a starting point for a non-browser implementation is the way to go. In some niche use cases, other solutions can and should be considered. The main alternative in such cases is probably Pion today.
2022 has been mostly a year of optimizations and polishing for the libWebRTC implementation, continuing on Google’s focus in 2021. 2023 will look no different.
👉 WebRTC Insights clients received an analysis of the contributors to the libWebRTC project throughout history as part of a recent issue tracker sent to them.
Let’s try a quick Q&A here on libWebRTC:
Is there a competitive alternative to libWebRTC in WebRTC?
The most popular WebRTC implementation out there is libWebRTC.
It is also the most dominant since it got embedded in all modern browsers.
libWebRTC is well maintained and is undergoing consistent improvements and optimizations. No other WebRTC stack is getting the same level of investment.
This is not expected to change in the foreseeable future.
Why is Google investing in libWebRTC?
This isn’t about Google Meet. Google is monetizing the web via ads delivered on search conducted in browsers and smartphones. By placing more of our activities in browsers and on the web, Google can monetize more interactions - indirectly.
Then there’s Google Meet/Workspace, competing with Microsoft Office on enterprise productivity.
Commoditizing communications is Google’s way of managing complementary technologies. Ben Thompson, in his latest analysis of AI and the Big Five, refers to Joel Spolsky’s Strategy Letter V, which both offers a great explanation for Google’s approach and is a good segue to our next section on open source:
Open source is not exempt from the laws of gravity or economics. [...] something is still going on which very few people in the open source world really understand: a lot of very large public companies, with responsibilities to maximize shareholder value, are investing a lot of money in supporting open source software, usually by paying large teams of programmers to work on it. And that’s what the principle of complements explains.
Once again: demand for a product increases when the price of its complements decreases. In general, a company’s strategic interest is going to be to get the price of their complements as low as possible. The lowest theoretically sustainable price would be the “commodity price” — the price that arises when you have a bunch of competitors offering indistinguishable goods. So:
Smart companies try to commoditize their products’ complements.
Not much has changed since my analysis a year ago on WebRTC trends in 2022, where I looked at WebRTC open source projects.
Unsurprisingly, Janus, Jitsi, mediasoup and Pion still retain most of their founders and key figures. These are teams/individuals who are personally and emotionally invested in these projects, which is a good thing.
The challenge is that besides Janus, none of them offer any official support or custom development. For the rest, companies need to rely on in-house development or external outsourcing vendors and freelancers.
As this state hasn’t changed for a good few years, not much is expected to change in 2023.
The main difference or question mark can be put on the projects that are now indirectly owned by a business whose focus might be elsewhere:
The CPaaS landscape is changing and shifting when it comes to WebRTC.
We started seeing these shifts a couple of years ago, but it seems that change is accelerating in this space - something that is different from what is happening with WebRTC open source.
The perceived leaders in WebRTC CPaaS are still Twilio, Vonage and Agora. I have a feeling that by the end of 2023 this will change.
Let’s review the who’s who of WebRTC in CPaaS.
No CPaaS list is complete without Twilio. I’ll obviously start with them.
Twilio is continuing their trend from last year of going after the Customer Experience Platform market.
One big change took place in 2022: Twilio announced it would focus on 4 pillars instead of spreading itself all over. This was conveyed in Jeff Lawson’s open letter laying off 11% of their workforce. These focus areas are:
No word about WebRTC. Definitely no video in here.
The opposite has happened - Twilio Live, announced in 2021, is being shut down:
Interestingly, its migration guide is recommending Mux, a vendor that just launched a WebRTC video offering as well. Should Twilio customers using Programmable Video also migrate that part to Mux? One wonders 🤔
Vonage has its hands full with Ericsson who acquired them.
Not much has changed on their platform besides the introduction of background blurring and replacement.
As the honeymoon between Vonage and Ericsson dissipates, along with the realization of a recession, it will be interesting to see what happens to the Vonage Video APIs - will the level of investment there remain high or will it shrink?
Agora’s stock has tanked since its peak:
Our information there is more limited than that of Zoom simply because the Agora IPO took place only in 2020.
It got into a recent mud fight with Zoom over the quality of experience that their respective platforms offer.
Zoom opted to go with the unbundled approach, using WebRTC only sparsely. For video, they are especially focused on building their own media stack replacing most of what WebRTC does. In the short term, such an approach isn’t too productive. Longer run, who knows?
Zoom and APIs and CPaaS is a long affair by now. One which hasn’t worked out well enough for Zoom. Their browser story wasn’t tight enough until recently. This got them to go head to head with competition and commission a performance report pitting their Zoom Video SDK versus Vonage Video API, Agora, Twilio Programmable Video and Amazon Chime SDK.
This specific post is telling:
IaaS gone video CPaaS. That was in 2020. Both Microsoft Azure and Amazon AWS introduced their own video APIs.
Microsoft had the better story: Azure Communication Services. It uses the same infrastructure as Microsoft Teams, and is able (in the longer run) to connect directly to Microsoft Teams calls.
The network effect and infrastructure were always in their favor. That said, it doesn’t come up enough in discussions I have with developers building WebRTC applications.
There’s a lot of untapped potential here.
I am starting to see the Amazon Chime SDK in more places. It seems that like Amazon Connect, after 3 years of being out there, it is getting the critical mass it needs to become “a thing” in the industry.
This is one to watch closely, especially if you are a video API vendor yourself…
There’s another IaaS vendor who is joining the party of Video APIs - Cloudflare.
Cloudflare started in 2021 with a managed TURN service. One that is still in private beta.
But in September 2022 they announced and launched two additional services:
Both are API offerings that are well-defined these days in the Video API or WebRTC CPaaS space.
Hopefully, they’ll move faster with these two than they had with their managed TURN service.
Mux, a vendor focused on video delivery via APIs, has joined the WebRTC market as well, offering their own Video APIs - Mux Real-Time Video. This is an interesting take, especially since their target audience is slightly different from that of developers who end up with CPaaS. It brings a fresh look and interpretation of the problem - just like the IaaS vendors and Zoom do.
The interesting part is that Twilio decided to refer their Twilio Live customers to Mux. If I were Mux, I’d mark every customer coming in from Twilio Live, making sure they get the best experience and support so that 6 months from now I can start talking to them about migrating away from Twilio Programmable Video.
Then there’s the lowcode/nocode trend and how it manifests itself in CPaaS. I’ve written an ebook about it - Lowcode & Nocode in Communication APIs (sponsored by Daily, a known CPaaS vendor). In the past two years we’ve seen more and more CPaaS vendors offering lowcode and nocode solutions on top of their video APIs.
We are seeing SaaS vendors heading into that specific market as well - for some reason, everyone thinks that CPaaS is a great business.
The notable examples here are Whereby, a meetings platform that started offering Whereby Embedded, and Digital Samba, who started from a webinars platform and is now offering Digital Samba Embedded.
This part of the market will continue to evolve, with CPaaS vendors and others offering ever higher layers of abstraction.
We’re done with the market overview. Time to move on to predictions.
I’ll start by looking at how I fared with my 2022 predictions of the upcoming trends…
This was a hit and miss thing (obviously).
There were three trends where I was spot-on.
#1 - Scale & performance
My bet at the time was that we would continue to see improvements in the scale and performance of WebRTC. This was definitely the case for 2022.
At the Kranky Geek event in November 2022, Google in their WebRTC annual update spent the time on quite a few items, but the first one of them was performance optimizations:
We will review this slide a few more times later on.
#2 - #newtech
This is the new technology trend, which was split a bit internally:
#4 - Live streaming
Live streaming continued to evolve in 2022:
This is where I got it wrong.
#3 - WebRTC infrastructure, hyperscaling and SD-WAN
Here, I thought we would still be pondering whether Anycast and SD-WAN are important to WebRTC.
And then Subspace got shut down, and with it, a lot of the effort to push this story forward. It is sad, because I do think that striving for lower latencies and clearer networks is the way to go. This setback will delay such attempts by a few years.
#5 - 2D to Metaverse
Extremes and experiments to counter Zoom fatigue. I don’t think that that many new alternatives and suggestions were made in 2022 that we haven’t seen before.
Cloud media processing
This is something I didn’t see coming. It can’t be considered a trend yet, but it is something to keep a close eye on.
The whole point of using SFUs in WebRTC is to reduce the compute costs of the infrastructure.
BUT…
Google started with doing noise suppression in the cloud for Google Meet a few years back. This means decoding and encoding audio in the cloud in an SFU architecture.
And now Google is doing the same for background replacement on low-end devices 😮
Is that a one-time transitional thing, or will others follow suit?
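To make the cost question concrete, here is a rough back-of-the-envelope sketch in TypeScript. It only counts how many streams a server has to decode, encode and route per meeting. The function names and the counting model are illustrative assumptions (every participant sends one stream to everyone else), not measurements from Google Meet or any real SFU.

```typescript
// Hypothetical model: media operations a server performs for one meeting.
interface ServerMediaLoad {
  decodes: number;  // streams the server has to decode
  encodes: number;  // streams the server has to encode
  forwards: number; // streams routed to receivers
}

// Each sender's stream is routed to every other participant.
function routedStreams(participants: number): number {
  return participants < 2 ? 0 : participants * (participants - 1);
}

// Plain SFU: packets are routed untouched, so no decoding or encoding.
function plainSfuLoad(participants: number): ServerMediaLoad {
  return { decodes: 0, encodes: 0, forwards: routedStreams(participants) };
}

// SFU with cloud processing (for example noise suppression): each sender's
// stream is decoded once, processed, re-encoded once, then routed as before.
function sfuWithCloudProcessingLoad(participants: number): ServerMediaLoad {
  return {
    decodes: participants,
    encodes: participants,
    forwards: routedStreams(participants),
  };
}
```

In this simplified model a 10-person meeting goes from zero decode/encode operations to 10 of each, which is exactly the kind of compute an SFU was supposed to avoid.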
Time to look at my predictions for 2023. This is where I think we will see the most focus in WebRTC this year, and how it will shape up.
In libWebRTC we will see more of the same, with a few nuances.
Google’s WebRTC library is mature. It has all the bells and whistles expected of it. Here’s where we will see Google taking libWebRTC:
libWebRTC will maintain its leading and dominant position as the WebRTC stack of choice for client-side development. And Google will take it wherever THEY need it.
WebAssembly will continue to be a driving force in 2023 when it comes to WebRTC.
It will be used for media processing and in relatively the same places we see it used and experimented with today - background replacement, noise suppression and proprietary codec implementations.
We will also see it enabling more vendors to leave the peer connection implementations in WebRTC and play around with media engines developed using WebAssembly and running on top of WebRTC data channels or WebTransport.
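As a minimal sketch of what running a custom media engine over data channels involves, the TypeScript below frames an already-encoded media frame with a small header so the receiver can reorder and time it. The 12-byte header layout, the `MediaPacket` type and the function names are hypothetical and only for illustration; this is not RTP and not what any specific vendor does.

```typescript
// Hypothetical 12-byte header (big-endian), invented for this example:
// bytes 0-3 sequence, bytes 4-7 timestamp in ms, byte 8 key frame flag,
// bytes 9-11 reserved.
const HEADER_BYTES = 12;

interface MediaPacket {
  sequence: number;    // 32-bit packet counter
  timestampMs: number; // 32-bit capture time in milliseconds
  isKeyFrame: boolean;
  payload: Uint8Array; // frame produced by the application's own encoder
}

// Serialize header + payload into one buffer that can be sent as a message.
function packetize(packet: MediaPacket): Uint8Array {
  const out = new Uint8Array(HEADER_BYTES + packet.payload.byteLength);
  const view = new DataView(out.buffer);
  view.setUint32(0, packet.sequence);
  view.setUint32(4, packet.timestampMs);
  view.setUint8(8, packet.isKeyFrame ? 1 : 0);
  out.set(packet.payload, HEADER_BYTES);
  return out;
}

// Parse a received message back into header fields and payload.
function depacketize(data: Uint8Array): MediaPacket {
  if (data.byteLength < HEADER_BYTES) {
    throw new Error("packet too short");
  }
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  return {
    sequence: view.getUint32(0),
    timestampMs: view.getUint32(4),
    isKeyFrame: view.getUint8(8) === 1,
    payload: data.subarray(HEADER_BYTES),
  };
}
```

In a browser, the output of `packetize` could be passed to `RTCDataChannel.send()`, typically on a channel created with `{ ordered: false, maxRetransmits: 0 }` so that late media is dropped instead of retransmitted. Everything the peer connection normally handles (jitter buffering, congestion control, loss recovery) then becomes the application's job.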
This one is a bit of an overreach, but one I am willing to make.
Lyra, Google’s ML-based voice codec, will find its way into WebRTC before AV1 will. This isn’t in terms of availability, but in terms of adoption and popularity of use.
AV1 takes up too much CPU power and memory. This makes it usable only in high-end devices or devices with newer hardware (which is almost non-existent still). We have a ways to go until AV1 can become a reality. Probably one or two more years.
Lyra is here. And it is improving in performance and quality. Microsoft’s Satin is breathing down Google’s neck. Something will have to happen here. And my bet is that this will happen in 2023.
The technology is most probably ready. The market is ready.
You can learn more about it from Philipp Hancke’s session about voice codecs in WebRTC at the recent Kranky Geek event.
You can say I am biased. So be it.
Observability was always a real challenge with WebRTC applications. Its nature, due to many reasons (one of them being encryption), makes it hard to monitor using legacy tools and methodologies.
What we will see in 2023 is more interest in observability. We have more products in the market that use WebRTC. Contact centers are moving to the cloud. Many of the bigger vendors are in the process of shifting focus from SIP to WebRTC in their current deployments, and not just as a feature in their checklist.
This will bring with it the need for better tools to understand and figure out how WebRTC sessions behave - both in pre-production and in production.
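As a small illustration of the raw material such tools work with, here is a sketch in TypeScript that turns two snapshots of receive-side statistics into per-interval quality numbers. The field names mirror those reported on `inbound-rtp` entries returned by `RTCPeerConnection.getStats()`; the `InboundRtpSnapshot` and `IntervalQuality` types and the `summarizeInterval` function are hypothetical names for this example, and a real monitoring tool does far more than this.

```typescript
// Subset of the fields found on "inbound-rtp" stats entries.
interface InboundRtpSnapshot {
  timestamp: number;       // milliseconds
  packetsReceived: number; // cumulative
  packetsLost: number;     // cumulative
  bytesReceived: number;   // cumulative
  jitter: number;          // seconds
}

interface IntervalQuality {
  lossPercent: number;
  bitrateKbps: number;
  jitterMs: number;
}

// The counters are cumulative, so quality over an interval is computed
// from the difference between two consecutive snapshots.
function summarizeInterval(
  prev: InboundRtpSnapshot,
  curr: InboundRtpSnapshot,
): IntervalQuality {
  const seconds = (curr.timestamp - prev.timestamp) / 1000;
  const received = curr.packetsReceived - prev.packetsReceived;
  // packetsLost may go down when late or duplicate packets arrive; clamp at 0.
  const lost = Math.max(0, curr.packetsLost - prev.packetsLost);
  const expected = received + lost;
  const bytes = curr.bytesReceived - prev.bytesReceived;
  return {
    lossPercent: expected > 0 ? (100 * lost) / expected : 0,
    bitrateKbps: seconds > 0 ? (8 * bytes) / seconds / 1000 : 0,
    jitterMs: curr.jitter * 1000,
  };
}
```

Numbers like these tell you that a session was bad. Explaining why it was bad is the harder part.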
And now it is time for some shameless self-promotion here -
Watch my session from Kranky Geek, where I discuss where observability of WebRTC statistics falls short (hint: troubleshooting)
👉 Don’t forget to check out the WebRTC products we have at Spearline
This is an easy one to make in 2023.
We’re in a recession. It will get better by December. It will get worse and stay with us. Whoever is correct in their estimate of what will happen a year from now, one thing is quite apparent:
Companies are closing their pockets, downsizing and keeping to their core focus.
WebRTC is part of it, and as a relatively new technology, it might be hurt more than others. I don’t think this will be the case, simply because we’re also in transition towards hybrid work due to the pandemic we faced. These two will negate each other a bit.
The end though will be house cleaning of the industry itself:
This in itself puts a strain on developers who need to choose which CPaaS vendor to use - picking the wrong one may leave them stranded with the need to switch (think Twilio Live). They will go to the bigger, better-known vendors. This will lead to a vicious cycle, since the smaller vendors may not have the time to grow quickly enough - potential customers will be less willing to risk using them.
Interesting times ahead.
2023 will shape up to be challenging.
On one hand, we have more of the same in a lot of areas. On the other hand, the current market state is causing a lot of instabilities that will cause some shifts in the market.
And that, without saying a word about generative AI and what that might mean to the market of WebRTC and communications moving forward.