Time for another controversial post.
You may have seen the comments in my WebRTC in WebKit post. Some of them refer to set top boxes using WebKit as a good reason for pushing WebRTC into them. I am not so sure it makes sense.
Let me tell you a story that is about 10 years old. From the time I was young and stupid (now I am just stupid).
At the time, I worked as a product manager. One of the products I managed was all about a client SDK that supported multimedia - voice and video calling. It had H.323 and SIP signaling protocols, along with its own souped up media engine (we used whatever we found to make it work - each customer being a different project).
As any product manager, I tried enhancing the scope of the product - making it fit more verticals. The set top box one seemed interesting and a no-brainer. It was also timely. Intel and TI both started pursuing us for adding our client to their chipset as reference code of sorts.
Then we went running after customers. Me, the young and clueless product manager with the sales team. Most of it took place in Asia. I came with the reference designs that used either TI or Intel. Customer after customer, the feedback was that we were using the wrong processors. TI and Intel are great companies, but no one was using them for set top boxes at the time. You had to use Broadcom or some other chipset vendor to be relevant at all.
The joy. I had two large chipset manufacturers working with me not because I had a superior offering but because they wanted to penetrate a market and needed a differentiating feature. Adding something that requires video encoding was their choice as the real set top box features couldn't do that (they still can't). So why not show off something no one else can do?
Why not indeed.
There are many reasons why a video chat in a set top box never made it to our homes. Here are the ones I had to deal with in the two years I was running around with clunky reference designs and talking to customer after customer.
The set top box is a commodity. The most important aspect of it is BOM - Bill of Material - the price of the components you place in it. If they are expensive - it will never be sold.
At the time, set top boxes were usually sold/owned by the cable company. It was the one buying these boxes in bulk and effectively giving them to customers under long term subscriptions. The box was a cost center and not a profit center.
Adding anything into it meant increasing the cost, and for that, there needed to be a very good reason.
The chipset of a set top box is optimized for decoding video in the codecs used by TV channels (MPEG-2 and H.264). It is also good at putting some kind of an image overlay on a running video (that would be the menu). Other than that, it is quite stupid.
Adding video encoding into it for the purpose of running a video call was rather expensive to do.
Dealing with a scenario of watching video while engaging in a video call at the same time makes that chip even more expensive.
So first, you had to explain to customers why the hell you wanted them to change their whole hardware design and the basic chipset they were using. Not easy.
Where do you place the camera?
Do you purchase it as an external peripheral and install it on your own (sure...)? Or do you get it as part of the set top box itself?
How far from the TV are you sitting? How are you going to set the angle of the camera?
What about lighting? When I watch TV, the lights are usually out. Should I now go and switch them on just to reach out to the remote and try to call someone?
How do you troubleshoot it? Who is offering the service? The set top box manufacturer or the cable company? A third party maybe?
A video call is a lean forward experience while watching TV is a lean back experience. The user experience is all different. The setting is different.
On a video call, I need to mind how I look. Am I in my pajamas? Easy to get away with it on a laptop, but harder on a TV screen. In my living room, will the camera capture all of my living room space? Along with the wife passing nearby?
Where do you capture voice? On the remote? In the set top box itself? Now handle echo cancellation on two separate devices, both trying to be as cheap as possible.
There is no incentive for a cable company to offer this capability. Maybe a little, but not much.
They are in the business of entertainment. They sell content. You don't pay for your Skype video calls, so why would you be willing to pay the cable company for its video calls?
Comcast might just pull it off. They recently added WebRTC into their set top box. Would that lead them towards offering a video chat service? Maybe. But this is going to be challenging as hell, so they will start questioning the value of it pretty fast.
There is this thing called Skype on TV. I haven't followed it in recent years, but in the past, it used a specialized camera with a built-in H.264 video encoder.
The integration with such cameras is somewhat painful, but it gets the job done and deals with some of the hardware issues mentioned above.
It usually comes as part of the TV itself. I am not sure how popular it is or how much use it gets.
Android TV and Apple TV are different.
They are designed for apps to some extent, so adding video calling as a third party should be possible.
To the best of my knowledge, neither Apple nor Google added FaceTime or Hangouts to their TV offerings. Which speaks volumes.
To top it off, Android TV is based on... Android. So no WebKit needed.
Apple TV is based on... Apple. Which gets us back to being dependent on Apple for WebKit.
Let's try to wrap things up here.
Technically speaking, we can get WebRTC to work on a TV. I did something similar a decade ago.
The challenges though aren't technical ones - they are market and user experience ones. Neither has changed in the last decade, which makes me cautious about this particular use case.
It is also why I am questioning the whole "WebRTC on WebKit" attempt, and the rationalization that this is good for set top boxes.
Want to put WebRTC on a set top box? Build it on top of Android and be done with it.
Not every use case makes sense for video calling, and WebRTC won't change that fact.
When you decide on your use case, you need to check its validity from many aspects - not only business ones and not only technical ones.