A look at WebRTC trends 2022 and what is in store in the upcoming year, especially now, as the market is heating up and differentiation and proprietary are 🔥 again.
👉 If you are interested, then check out the updated WebRTC predictions for 2023
👉 Or go for the WebRTC predictions and trends for 2023
We started this year with my WebRTC trends for 2021, so it is time to conclude the year (stating that I was generally spot on), and look at what 2022 is bringing us. In many ways, 2022 is a continuation of what we had in 2021 with some interesting nuances.
My main worry is that a war is brewing. On one hand, Google is leading WebRTC, but probably not seeing enough value out of it as a big corporation. On the other hand, much of the rest of the industry is frustrated at what is taking place with the main WebRTC library - libwebrtc - that is maintained, controlled and owned by Google. This is leading to many different forks along with discussions and attempts to find a better structural solution to this big initiative called WebRTC.
A lot of this has trickled out throughout the year as part of the WebRTC Insights service that I am running along with Philipp Hancke.
I can ramble on in this overview, but it is best to just… start running with it.
Two years ago we shifted gears, moving from the Growth era in WebRTC to the WebRTC Differentiation era. I discussed that at length earlier this year, when I explained how WebRTC differentiation manifests itself.
It started with Google splitting up their WebRTC development efforts, making decisions on what to place in libwebrtc, their open source implementation of WebRTC, and what to implement outside of it. The verdict: any machine learning algorithm that can be kept outside of WebRTC will be.
Other large vendors understandably followed suit.
Have we reached peak WebRTC?
Philipp made me aware of the Chrome Platform Status website and the many statistics you can find there. It makes it possible to track how many page loads include certain API calls, with many of these relating to WebRTC. The one I selected for the diagram above is that of GetUserMediaPromise, showing how often web pages loaded in Chrome ask for permission to access a camera or a microphone - leading more often than not to a WebRTC session.
We’ve seen a huge increase in the use of WebRTC throughout the pandemic, and for the last half year things seem to have settled at roughly 4 times what they were prior to the pandemic. Whether this will last is a good question. Clubhouse, for example, seems to have plateaued since its strong debut.
No one really knows what the next 12 months are going to look like, and if Omicron or yet another strain of the virus will push us back to the safety of our homes and quarantine - or what things will look like when we find ourselves on the other end of this pandemic.
Google has a stranglehold on WebRTC - for better and for worse.
ALL web browsers today that support WebRTC do so via libwebrtc, which is Google’s implementation of WebRTC:
Google seems to have shifted to a kind of a maintenance mode with WebRTC. They have also changed their mindset and are focusing with libwebrtc on what’s good for Google. It all makes sense. For them...
After 10+ years of holding up the mantle for the whole industry, it is becoming tiresome, especially when there’s not enough to show for it internally. The shift was inevitable.
Google is doing what is good for Google with WebRTC
That means that if your use case falls within the realm of what Google does and needs, then you’re in good shape and good luck. And if you aren’t… well…
In the meantime, the industry around WebRTC has well-meaning people. Those who want to see WebRTC grow, flourish and thrive. They are trying to help, but helping is HARD:
A deadlock.
There has been a lot of open source built around WebRTC and in the recent two years that has accelerated as well - the pandemic and all.
What we’ve seen in these 10 years are a few distinct open source projects that have broken out from the pack, making themselves more popular than others. I know the list here is lacking and others are used as well - but assume that these are the ones I see the most in the market when it comes to open source (I am intentionally ignoring the VoIP/SIP open source projects such as FreeSWITCH and Asterisk here).
The illustration above shows my current thinking about the trends surrounding these top open source WebRTC technologies:
Then there’s Electron. A PC application framework built on top of the Chromium browser engine - Electron is popular with WebRTC apps as well.
Electron is a great starting point: you write your web app. Wrap it with Electron. And you’re done.
But in many ways, that’s just the beginning of your journey. Arnaud Budkiewicz of RingCentral spoke at the recent Kranky Geek about their journey:
Using Electron means surrendering to the Chromium+libwebrtc release cadence that Electron has opted for - or digging deeper and owning that technology stack as well.
Using CPaaS WebRTC solutions was never easy, and in 2022 it is going to be even more complicated. Why? Because the landscape is unclear.
Twilio is chasing CEP butterflies. I am all for it - though sadly it has nothing to do with WebRTC.
They have been slow to respond to the market changes when it comes to WebRTC, and it still feels like WebRTC is an afterthought to them.
Agora’s stock has been acting out after their successful IPO.
While their performance and traffic are going strong, there are market uncertainties there - peak WebRTC is one, as is the huge spike that is Clubhouse (using Agora). The Chinese government regulation is another. I am singling out Agora here because they are the only CPaaS vendor focused on RTC that is a public company.
On the positive side, we’ve seen the investment in Daily - $40M in series B.
The company is growing, focused on their WebRTC implementation for developers.
Vonage just got acquired by Ericsson. That leads us to this acquisitions chain when it comes to their WebRTC CPaaS capabilities:
TokBox → Telefonica → Vonage → Nexmo → Ericsson
We will see where this takes the Vonage API platform.
We still have newcomers to this market. Big and small. We’ve seen Microsoft and Amazon jump into CPaaS - and especially to where WebRTC is being used in CPaaS. Zoom is dabbling with an API for CPaaS lately as well.
But also newer players such as 100ms with an interesting concept to their APIs, enabling developers to offer hints of their use case, or doing more in the background for the developers than the “classic” vendor solutions.
The market is also growing and maturing in CPaaS. We’re starting to see higher level abstractions, offering the UI/UX along with the APIs themselves. These come in different shapes, sizes and names, but they are all geared towards making the lives of developers easier.
Which one should you be using?
Will the one you choose be there next year?
Is the vendor going to shift focus and bail on you?
Are the APIs and capabilities it is offering actually going to work?
Lots of questions. No easy answers.
After this long preamble, it is time to talk about the WebRTC trends in 2022.
The 5 biggest trends for WebRTC in 2022 are taking slightly different routes than we’ve seen before. Some focus on scale while others on new requirements and others still on new markets.
There’s a saying/quote in Hebrew - “you start as fast as you can, and then you continue to accelerate slowly”. This is where we’re at with WebRTC.
This is obvious, and a continuation of 2021. Scale still matters. A lot. This is going to stay strong as an initiative well into 2022.
In our Kranky Geek event of November 2021, Google shared the work they’ve done in the past year. Below is the slide presented around performance optimizations. As you can see, this is an ongoing effort with multiple tasks. A lot of this has been achieved, but more is being done.
These improvements aim at better scalability of a single session with multiple participants. The many bugs around hardware encoding and decoding that we have been tracking in the last couple of months as part of the WebRTC Insights show that this will continue well into 2022.
At the same time, we are seeing investments being made by many on the infrastructure level to scale their services.
What was the case in 2021 will be in 2022 as well.
There is a swath of new technologies that are just now starting to mature. They are enabling vendors to do more with WebRTC. At Kranky Geek, for example, we’ve spent considerable time on these technologies, seeing how various vendors are making initial use of them.
Interestingly, some of these technologies are already powering features that are considered table stakes in video meetings.
Probably the crown jewel of enablers in the web today.
WebAssembly speeds up performance of web code AND enables cross language compilation. For WebRTC, the main benefit here is the use of WebAssembly for machine learning tasks used for media manipulation. From noise suppression, through background replacement and funny hats, to video lighting. All these are enabled with WebAssembly today.
Expect more vendors to use this and expect more features to be enabled by this.
Not happy with WebRTC? There’s WebTransport & WebCodecs.
Together, they theoretically enable you to encode and decode media and send or receive it from a server.
The devil here is in the details, and while not favorable yet to replacing WebRTC, they do look promising. We’ve had Dolby and Intel share some of their insights on these at Kranky Geek.
What we are going to see is more vendors experimenting with these technologies as well as using them alongside and with WebRTC where it makes sense. I’ve pointed to this approach over a year ago, as part of the WebRTC unbundling process taking place.
With Google’s own enthusiasm about these, one wonders if they will lose interest in WebRTC a few years down the road.
Then there are new codecs.
AV1 has been around since 2018. Not exactly... obviously… some people have been pushing it as a solution for WebRTC since 2018. The truth of it is that at the end of 2021, AV1 is yet to be seen anywhere significant when it comes to WebRTC. Not because it isn’t good, but because it takes time to release a new codec to market - especially a video one.
Well, the wait is somewhat over. AV1 is coming to WebRTC and we will see use of it in 2022. It will still be limited, but it will finally be interesting and relevant.
A new ML-based voice codec (think Lyra) will take a wee bit longer. There’s no consensus yet as to which voice codec it should be. AV1 didn’t have that problem - we already knew AV1 would be next in line.
How you design and deploy WebRTC is changing. The usual mesh/mix/route alternatives are still there. Many go for hybrid approaches. Lately, focus and discussions have shifted to the hardware itself: where it is located and how exactly packets are routed.
Agora were probably the first to do this openly and at scale, marketing it as a better approach. In 2021 we’ve seen the likes of Subspace and Cloudflare announce managed TURN services with regional distributions of 100 or more data centers.
I’ve marked infrastructure as one of the challenges in my workshop in 2021. In 2022 this is going to become an even more interesting topic. Anycast is going to join the fray as a technology used by vendors.
What we still won’t have in 2022 is a definitive answer as to which one is preferable. Is there a real value differentiator and an observable improvement in quality when using more than 10 regions globally? Would it be worth the effort, especially with the large cloud vendors popping out new data centers every month or so?
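As a reference point for the mesh/mix/route alternatives mentioned above, here is a rough sketch of how many media streams a single participant sends and receives in each. It is a simplification under my own assumptions: one stream per participant, no simulcast or SVC layers, and no hybrid setups. The function and field names are made up for illustration and do not come from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamLoad:
    """Media streams one participant handles in a session (illustrative type)."""
    uplink: int    # streams the participant sends
    downlink: int  # streams the participant receives


def per_participant_load(topology: str, participants: int) -> StreamLoad:
    """Simplified stream counts per participant for mesh, mix and route."""
    if participants < 2:
        raise ValueError("a session needs at least 2 participants")
    others = participants - 1
    if topology == "mesh":
        # Peer-to-peer: send to and receive from every other participant.
        return StreamLoad(uplink=others, downlink=others)
    if topology == "mix":
        # Mixing server (MCU): send one stream, receive one composited stream.
        return StreamLoad(uplink=1, downlink=1)
    if topology == "route":
        # Routing server (SFU): send one stream, receive one per other participant.
        return StreamLoad(uplink=1, downlink=others)
    raise ValueError(f"unknown topology: {topology}")


# Compare the three alternatives for a 5-person session.
for name in ("mesh", "mix", "route"):
    print(name, per_participant_load(name, 5))
```

The numbers explain why mesh stops scaling quickly on the client side, while mix and route move the cost to servers, which is where the questions about hardware location and packet routing come in.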
Moving away from features and technologies to use cases.
Live streaming is here and WebRTC is how you do it.
There are other technologies, but none that works as fast as WebRTC and works in browsers.
People are getting more and more comfortable with video. Due to the pandemic, a lot of new ways of communicating at scale are here, done remotely. And people want to interact. Live. And in real time.
2 seconds latency might be nice, but sub-second is nicer.
What we will be seeing is more vendors turning towards WebRTC for that sub-second experience. There’s room for higher latencies - for many use cases. But when it comes to instantaneous, expect to see a lot more WebRTC. At least until WebTransport & WebCodecs mature enough 😀
Zoom fatigue? Boring gallery view and tiles?
Everyone is trying to rethink the communications of the future, and they don’t look like the talking heads we’ve grown up on in the last 20+ years.
The two extremes I am seeing?
We will see more of this in 2022. At the moment, there are so many different experiences being published that the most interesting thing to see will be which ones will stick and which will fade away.
As we head into 2022, it is also important to understand who are the main players and the main market forces. These are going to shape WebRTC moving forward.
The biggest tech vendors are the ones setting the pace and calling the shots with WebRTC. Each with its own angle to it.
You can add to this list Intel, who are now pushing the envelope on hardware encoding for WebRTC, something that was usually ignored by hardware vendors.
In 2022, these will be the shapers of WebRTC as we know it. They will decide if they listen to external feedback and pour it into their own product roadmaps or not - and that will end up affecting us all in the WebRTC ecosystem.
As I stated earlier, Twilio doesn’t really care about WebRTC. Not much anyway. WebRTC isn’t big money for Twilio, so they are focusing elsewhere. 👉 We do make use of Twilio’s video-js repo as a good source of bug reports (Twilio and Vonage are still ahead of most everyone else in that).
As the dominant CPaaS vendor that is a proxy for other vendors:
This isn’t the best of environments for those who want to use CPaaS, and to some extent, this isn’t productive for those who want CPaaS either.
It also dilutes the power that CPaaS vendors have (or want to have?) over the direction WebRTC is headed. It would have been great to have these vendors’ voices heard more, as they aggregate behind them thousands of companies, use cases and requirements. Part of it is why I think UCaaS is outpacing CPaaS in innovation.
Zoom doesn’t really use WebRTC, but it does affect everything there is around WebRTC:
Without being a part of the WebRTC ecosystem, Zoom is a big shaper of the WebRTC market.
Coopetition exists everywhere. The notion of competitors cooperating together is something we see a lot, especially in standardization organizations, where vendors are chugging it down, trying to get to an agreeable, better place for everyone (=lowest common denominator). We’ve seen it with the decision on mandatory-to-implement video codecs in WebRTC for example.
What we’re now seeing more is collaboration between companies directly - ones that compete in some ways and cooperate in others.
Microsoft improving screen sharing in Google’s libwebrtc (after deciding to adopt Chromium for Edge), Intel helping with hardware encoding of AV1, RingCentral and 8x8 pushing to get RED for Opus into libwebrtc, …, the list goes on.
We’ve come to a point where it is acknowledged that we can’t just sit and wait for things to “happen” on their own with WebRTC on the implementation side and there needs to be more proactivity and cooperation. Vendors need to start investing more and publicly in the baseline open source implementation and not only in their proprietary code.
This is wishful thinking most of the time, but I think we’re at an inflection point where this will need to happen more for the WebRTC community and ecosystem to take the next step in its evolution.
In January I’ll be conducting a workshop that covers these topics. The trends and what to do with them. It will offer actionable advice on what you should do in 2022 and it will be interactive in nature.
My WebRTC trends in 2021 workshop was well attended. Here is what Stefan Karapetkov of Twilio had to say about it:
I was looking for an update on the WebRTC market and technology trends, and the workshop provided exactly that.
The information was specific, very well organized, and delivered in an engaging and entertaining way.
The workshop was split into three sessions and gave me enough time to think about the material, do additional research, and prepare questions for the next session.
I left the workshop with a solid understanding of the WebRTC technology and, even more importantly, of the many technology tradeoffs that the WebRTC community made along the way.
I use this knowledge in my everyday interactions with colleagues and customers, and think that the workshop would be beneficial for anyone in a Video Product Management or Architecture role, even for Solution Engineers who specialize in Video.
The 2022 workshop is going to be just as structured and useful, with ample interactivity that will give you the opportunity to interrupt and ask questions relevant to you and your business.
This new workshop, WebRTC trends for 2022, will take place during January-February, in 3 consecutive sessions of 2 hours each.
Space is limited, so if you are interested, register sooner rather than later.
See you at the workshop.