Media processing Archives • BlogGeek.me

Probing

Tsahi Levent-Levi — Sun, 16 Jul 2023 06:22:35 +0000

In WebRTC, probing is a mechanism used to check if more bandwidth is available.

Probing is one of the techniques used for bandwidth estimation.

Traditional bandwidth estimation mechanisms rely on packet loss and inter-frame delay. When these techniques are used to increase bandwidth, confidence in how far the estimate can grow is low, which leads to a slow increase in the bandwidth estimate even when more capacity is available.

Probing, on the other hand, is used to try and aggressively increase bandwidth estimation while keeping the estimate accurate. It is done by sending dedicated probing packets that can be disregarded without any degradation to media quality.

WebRTC uses this mechanism at the beginning of a session in order to reach bitrates of 2 Mbps or more within a second or two of starting the session. It also uses this technique throughout the session as needed.
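As a rough illustration of the idea, the sketch below measures how fast a short burst of probe packets was actually delivered and raises the estimate only when the burst shows more capacity. The names `ProbePacket`, `estimateProbeBitrate` and `applyProbeResult` are hypothetical, and the "bytes over arrival time span" formula is a simplifying assumption, not the logic of any specific WebRTC implementation.

```typescript
// One received probe packet: its size and its arrival time at the receiver.
interface ProbePacket {
  sizeBytes: number;
  arrivalMs: number;
}

// Estimate the delivered bitrate (bits per second) of one probe burst.
// Returns undefined when the burst is too small to measure.
function estimateProbeBitrate(cluster: ProbePacket[]): number | undefined {
  if (cluster.length < 2) return undefined;
  const sorted = cluster.slice().sort((a, b) => a.arrivalMs - b.arrivalMs);
  const times = sorted.map((p) => p.arrivalMs);
  const spanMs = Math.max(...times) - Math.min(...times);
  if (spanMs <= 0) return undefined;
  // The first packet only opens the measurement window, so its bytes are excluded.
  const bytes = sorted.slice(1).reduce((sum, p) => sum + p.sizeBytes, 0);
  return (bytes * 8 * 1000) / spanMs;
}

// Raise the current estimate only when the probe demonstrated more capacity.
function applyProbeResult(currentBps: number, probeBps: number | undefined): number {
  return probeBps !== undefined && probeBps > currentBps ? probeBps : currentBps;
}
```

Because the probe packets carry no media that the receiver depends on, losing some of them only makes the measurement less precise; it does not hurt what the user sees or hears.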

The post Probing appeared first on BlogGeek.me.

FID (Flow Identification)

Tsahi Levent-Levi — Sun, 25 Jun 2023 19:05:07 +0000

FID stands for Flow Identification. It is used in SDP to describe the relationship between the primary media SSRC and the retransmission SSRC.

You can find the definition and an example for it in RFC 4588 section 8.7.
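In SDP generated by WebRTC browsers, this relationship typically shows up as an `a=ssrc-group:FID <primary> <retransmission>` line. The sketch below pulls those pairs out of an SDP string. The function name `parseFidGroups` is hypothetical, and the parser assumes exactly two SSRCs per FID group.

```typescript
// Map each primary SSRC to its retransmission SSRC, based on
// "a=ssrc-group:FID <primary> <rtx>" lines found in an SDP string.
function parseFidGroups(sdp: string): Map<number, number> {
  const pairs = new Map<number, number>();
  for (const line of sdp.split(/\r?\n/)) {
    const match = /^a=ssrc-group:FID (\d+) (\d+)\s*$/.exec(line);
    if (match) {
      pairs.set(Number(match[1]), Number(match[2]));
    }
  }
  return pairs;
}
```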


RRID (Repaired RTP Stream ID)

Tsahi Levent-Levi — Sat, 24 Jun 2023 15:24:14 +0000

RRID denotes the Repaired RTP Stream ID in RTP.

It is defined and used as a header extension in RFC 8852 section 3.2.

RRID, along with MID and RID, is used to associate low-level RTP concepts such as the synchronization source (SSRC) with higher-level WebRTC objects such as RtpSender and RtpReceiver.

These identifiers are needed in order to demultiplex bundled RTP streams, and bundling is a common mechanism in WebRTC sessions.

RRID is used specifically to mark packets of a repair stream, such as retransmissions. Its value is the RID of the stream being repaired.

When RTP packets are received, WebRTC needs to decide to which object/stream to associate the incoming packet. This is done using roughly the following decision diagram:

All header extensions used by MID, RID and RRID have the same basic structure and contain an ASCII string with the MID/RID/RRID value in it.
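A minimal sketch of such a lookup is shown here, under an assumed order: MID (optionally combined with RID or RRID) first, then an SSRC learned from an earlier packet, then the payload type. The types `PacketIds` and `RoutingTables`, the function `routePacket` and the string receiver labels are all hypothetical; a real stack fills its tables from the SDP negotiation and handles many more cases.

```typescript
// Identification fields extracted from one incoming RTP packet.
interface PacketIds {
  ssrc: number;
  payloadType: number;
  mid?: string;
  rid?: string; // RTP Stream ID
  rrid?: string; // Repaired RTP Stream ID (names the RID being repaired)
}

// Hypothetical lookup tables; values are labels of receiving objects.
interface RoutingTables {
  byMidAndStreamId: Map<string, string>; // key is "mid/rid"
  byMid: Map<string, string>;
  bySsrc: Map<number, string>; // learned as packets arrive
  byPayloadType: Map<number, string>;
}

function routePacket(t: RoutingTables, p: PacketIds): string | undefined {
  let target: string | undefined;
  if (p.mid !== undefined) {
    // A repair packet is routed to the same receiver as the stream it repairs.
    const streamId = p.rid ?? p.rrid;
    if (streamId !== undefined) {
      target = t.byMidAndStreamId.get(`${p.mid}/${streamId}`);
    }
    if (target === undefined) target = t.byMid.get(p.mid);
  }
  if (target === undefined) target = t.bySsrc.get(p.ssrc);
  if (target === undefined) target = t.byPayloadType.get(p.payloadType);
  // Remember the binding so later packets without extensions still resolve.
  if (target !== undefined) t.bySsrc.set(p.ssrc, target);
  return target;
}
```

Remembering the SSRC binding matters because senders usually stop attaching the identifying header extensions once the stream is established.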


RID (RTP Stream ID)

Tsahi Levent-Levi — Sat, 24 Jun 2023 15:23:05 +0000

RID denotes the RTP Stream ID in RTP.

It is defined and used as a header extension in RFC 8852 section 3.1.

RID, along with MID and RRID, is used to associate low-level RTP concepts such as the synchronization source (SSRC) with higher-level WebRTC objects such as RtpSender and RtpReceiver.

These identifiers are needed in order to demultiplex bundled RTP streams, and bundling is a common mechanism in WebRTC sessions.

RID is used specifically to distinguish between different simulcast streams of the same video source.

When RTP packets are received, WebRTC needs to decide to which object/stream to associate the incoming packet. This is done using roughly the following decision diagram:

All header extensions used by MID, RID and RRID have the same basic structure and contain an ASCII string with the MID/RID/RRID value in it.
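To make the "ASCII string in a header extension" structure concrete, the sketch below decodes the elements of an RTP header extension block, assuming the one-byte header format of RFC 8285 (each element starts with a byte holding a 4-bit ID and a 4-bit "length minus one"). The function name `parseOneByteExtensions` is hypothetical, the input is assumed to be the element bytes that follow the 0xBEDE marker and length field, and every value is decoded as ASCII even though real packets also carry non-text extensions.

```typescript
// Decode one-byte-header extension elements into a map of id -> ASCII string.
function parseOneByteExtensions(elements: Uint8Array): Map<number, string> {
  const result = new Map<number, string>();
  let offset = 0;
  while (offset < elements.length) {
    const header = elements[offset];
    if (header === 0) {
      offset += 1; // padding byte between or after elements
      continue;
    }
    const id = header >> 4;
    if (id === 15) break; // reserved ID: stop parsing
    const length = (header & 0x0f) + 1; // stored as length minus one
    const start = offset + 1;
    if (start + length > elements.length) break; // truncated element
    let value = "";
    for (let i = start; i < start + length; i++) {
      value += String.fromCharCode(elements[i]);
    }
    result.set(id, value);
    offset = start + length;
  }
  return result;
}
```

Which numeric ID means MID, RID or RRID is not fixed. It is negotiated per session in the SDP through `a=extmap` lines, so the caller has to map IDs to meanings before using the result.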


MID (Media Identification)

Tsahi Levent-Levi — Sat, 24 Jun 2023 15:20:46 +0000

MID denotes the media identification tag in RTP.

It is defined and used as a header extension in RFC 8843 section 16.2.

MID, along with RID and RRID, is used to associate low-level RTP concepts such as the synchronization source (SSRC) with higher-level WebRTC objects such as RtpSender and RtpReceiver.

These identifiers are needed in order to demultiplex bundled RTP streams, and bundling is a common mechanism in WebRTC sessions.

When RTP packets are received, WebRTC needs to decide to which object/stream to associate the incoming packet. This is done using roughly the following decision diagram:

All header extensions used by MID, RID and RRID have the same basic structure and contain an ASCII string with the MID/RID/RRID value in it.


Insertable Streams

Tsahi Levent-Levi — Mon, 03 Apr 2023 06:28:39 +0000

Insertable Streams is a mechanism in WebRTC that gives the application access to the actual media just before it is sent over the network or just after it is received.

There are two types of Insertable Streams available in WebRTC:

WebRTC Encoded Transform (W3C draft) – provides access to encoded media, meaning the output of the encoder and the input to the decoder. This lets the application apply its own encryption locally
MediaStreamTrack API for Insertable Streams of Media (W3C draft), also known as “breakout box” – provides direct access to the raw stream, before it is encoded or after it is decoded, so that it can be manipulated for features like background blur

Such APIs were added to WebRTC in an effort to enable the support of more use cases and scenarios that were hard or impossible to implement without them.

The encoded transform alternative, which provides access to the encoded media, enables developers to add their own application-specific encryption scheme, offering end-to-end encryption.

The MediaStreamTrack API alternative offers access to the raw media prior to encoding it. This enables manipulations such as background blur or adding application-specific contextual data.


E2EE (End-to-End Encryption)

Tsahi Levent-Levi — Mon, 03 Apr 2023 05:04:55 +0000

E2EE stands for End-To-End Encryption.

In WebRTC, encryption is mandatory and is conducted hop-by-hop. This means that the media is encrypted between one WebRTC client and the next.

In practice:

TURN servers can’t look at the media, as they aren’t privy to the encryption keys used
Media servers such as an SFU or an MCU can look at the media, since they are considered another WebRTC client to the client sharing its media with them

To protect the media from media servers as well, E2EE technologies can be used.

For WebRTC, this is possible by using Insertable Streams. This technology allows the application to catch the encoded media frames right after they are encoded on the sender side and just before they are decoded on the receiver side.

The application at this point can implement a callback that will encrypt or decrypt the data outside of the scope and context of WebRTC, making it indiscernible to media servers and anyone else who has no access to the encryption keys used.

The keys themselves are negotiated outside of the scope of WebRTC.
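The sketch below shows the general shape of such a callback: it receives an encoded frame, replaces its payload and hands it back. `EncodedFrameLike`, `xorBuffer` and `makeFrameTransform` are hypothetical names, and the XOR "cipher" is only a placeholder that is NOT secure; a real application would use an authenticated cipher with keys it negotiated itself. The browser wiring that feeds frames into the callback is omitted.

```typescript
// Minimal shape of an encoded frame as seen by the application callback.
interface EncodedFrameLike {
  data: ArrayBuffer;
}

// PLACEHOLDER cipher: XOR with a repeating key. Not secure; it only stands in
// for the application-level encryption. The key is assumed to be non-empty.
function xorBuffer(input: ArrayBuffer, key: Uint8Array): ArrayBuffer {
  const src = new Uint8Array(input);
  const out = new ArrayBuffer(src.length);
  const dst = new Uint8Array(out);
  for (let i = 0; i < src.length; i++) {
    dst[i] = src[i] ^ key[i % key.length];
  }
  return out;
}

// Build the per-frame callback. XOR is symmetric, so the same callback shape
// serves the sender (encrypt) and the receiver (decrypt).
function makeFrameTransform(key: Uint8Array): (frame: EncodedFrameLike) => void {
  return (frame) => {
    frame.data = xorBuffer(frame.data, key);
  };
}
```

A media server in the middle still sees frames of the expected size arriving in the expected order, but without the key the payload inside them is unreadable.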

With E2EE, each packet gets encrypted twice:

First time using the application level encryption known only to the clients
Second time using SRTP in the standard mechanism WebRTC uses, making it readable by media servers

This separation of keys enables an organizational separation where one organization may know the E2EE key – the client side application level secret, and the other organization who is hosting and running the SFUs does not.

E2EE using Insertable Streams is applicable to SFUs, which only route the encoded media. It cannot be used with MCUs, since they need to decode the media in order to mix it.


BWA (Bandwidth Allocation)

Tsahi Levent-Levi — Mon, 03 Apr 2023 04:19:05 +0000

BWA stands for Bandwidth Allocation.

BWA is an important aspect when dealing with large group calls, where participants may receive more than a single incoming media stream. In such cases, the decision of what percentage of the estimated bandwidth to allocate per incoming media stream becomes important.

Since each participant is limited by the performance of the device, the display resolution and the download speeds, there is likely to be a need to curb the amount of data being sent. Reducing that amount needs to be done based on the priorities of the given scenario, which is what BWA is meant to achieve. Such priorities can be defined by dominant speaker identification, displayed resolution, phases of the moon or any other heuristic.

BWA will usually get its upper bound of total bitrate from a BWE (bandwidth estimation) algorithm.
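One possible allocation policy is sketched below, assuming each incoming stream offers a few discrete bitrates (for example simulcast layers) and carries a numeric priority. The first pass tries to keep every stream alive at its lowest layer, and the second pass spends what is left on upgrades, highest priority first. The names `IncomingStream` and `allocateBandwidth` are hypothetical and this heuristic is just one of many.

```typescript
interface IncomingStream {
  id: string;
  priority: number; // higher means more important, e.g. the dominant speaker
  layersBps: number[]; // bitrates this stream can be received at
}

// Returns the bitrate granted to each stream; 0 means the stream is paused.
function allocateBandwidth(streams: IncomingStream[], budgetBps: number): Map<string, number> {
  const allocation = new Map<string, number>();
  const ordered = streams.slice().sort((a, b) => b.priority - a.priority);
  let remaining = budgetBps;

  // Pass 1: give every stream its lowest layer, in priority order, if it fits.
  for (const s of ordered) {
    const lowest = s.layersBps.length > 0 ? Math.min(...s.layersBps) : 0;
    const granted = lowest > 0 && lowest <= remaining ? lowest : 0;
    allocation.set(s.id, granted);
    remaining -= granted;
  }

  // Pass 2: upgrade streams, in priority order, with whatever budget is left.
  for (const s of ordered) {
    const current = allocation.get(s.id) ?? 0;
    if (current === 0) continue; // stayed paused in pass 1
    let best = current;
    for (const layer of s.layersBps) {
      if (layer > best && layer - current <= remaining) best = layer;
    }
    allocation.set(s.id, best);
    remaining -= best - current;
  }
  return allocation;
}
```

The `budgetBps` argument is where the output of the bandwidth estimator would be plugged in.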


Packet pacing

li.anton13@gmail.com — Fri, 11 Mar 2022 18:36:32 +0000

Transmitting the many UDP packets that form a video frame in a single burst increases the chances of packet loss and increases jitter. For that reason, a “pacer” is typically used to spread the sending out a bit (but over no longer than the frame duration).

See also https://datatracker.ietf.org/doc/html/rfc8298#section-4.1.2.6 for more information.
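A minimal sketch of the idea follows: given the packet sizes of one frame and a pacing rate, it computes when each packet should leave, and it speeds up when the configured rate would stretch the frame beyond its duration. The function name `paceFrame` and its parameters are hypothetical, and `frameDurationMs` is assumed to be positive.

```typescript
// Compute the send-time offset (ms from the start of the frame) of each packet.
function paceFrame(
  packetSizesBytes: number[],
  pacingRateBps: number,
  frameDurationMs: number,
): number[] {
  const totalBits = packetSizesBytes.reduce((sum, size) => sum + size * 8, 0);
  // Lowest rate at which the whole frame still drains within one frame duration.
  const minRateBps = (totalBits * 1000) / frameDurationMs;
  const rateBps = Math.max(pacingRateBps, minRateBps);
  const offsets: number[] = [];
  let bitsQueuedBefore = 0;
  for (const size of packetSizesBytes) {
    // A packet leaves once everything queued ahead of it has been sent.
    offsets.push(rateBps > 0 ? (bitsQueuedBefore * 1000) / rateBps : 0);
    bitsQueuedBefore += size * 8;
  }
  return offsets;
}
```

For example, four 1250-byte packets paced at 2 Mbps are sent 5 ms apart instead of back to back.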

QUIC stands for Quick UDP Internet Connections.

QUIC is an experimental protocol by Google that is based on UDP and targeted at improving situations when you need multiple parallel sessions between two entities. This is the situation for virtually every web page on the internet, which usually requires more than a single resource file to be transmitted from the web server to the browser.

QUIC is connected to WebRTC through Google’s roadmap, in which Google announced its intention to experiment with QUIC as a replacement for SCTP as the data channel transport.

Bandwidth is the capacity available to receive and send data over a certain network connection.

There are a few important concepts around bandwidth:

Available bandwidth fluctuates dynamically over time
Sending (outgoing) and receiving (incoming) bandwidth are asymmetric in nature
In VoIP and WebRTC, our purpose is to estimate the available bandwidth as accurately as possible – the better the estimate, the better the media quality we are able to provide
The estimate we make determines the maximum bitrate that we can send or receive


P2P4121

li.anton13@gmail.com — Sun, 20 Feb 2022 07:10:33 +0000

P2P4121 stands for “Peer-to-peer for 1 to 1”.

The term was first introduced by the Jitsi team for a feature they added: a session starts as a peer-to-peer call and gets upgraded to a group SFU session if the number of participants in the meeting increases.

This uses far fewer media server resources and reduces the cost of delivering a communication service when the number of participants in a session is low (i.e., 2). When more people join, the session can be rerouted to a media server automatically and smoothly, with little or no interruption to the users.

There are multiple ways to implement such a feature, as well as to extend it to larger groups, such as:

Being able to downgrade back to P2P from an SFU call when the number of participants shrinks back
Keeping 3-way calls on a P2P mesh architecture and switching to an SFU only from 4 participants or more
Maintaining SFU ports or allocating them only when needed
Etc
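The core decision behind these variations can be sketched as a small policy function. `TopologyPolicy`, `chooseTopology` and `planTransition` are hypothetical names; setting `p2pMaxParticipants` to 2 gives the basic 1:1 behavior, while 3 keeps 3-way calls on a mesh.

```typescript
type Topology = "p2p" | "sfu";
type Transition = "stay" | "upgrade-to-sfu" | "downgrade-to-p2p";

interface TopologyPolicy {
  p2pMaxParticipants: number; // largest meeting size still served peer-to-peer
}

// Pick the topology a meeting of the given size should use.
function chooseTopology(participants: number, policy: TopologyPolicy): Topology {
  return participants <= policy.p2pMaxParticipants ? "p2p" : "sfu";
}

// Decide what to do when the participant count changes.
function planTransition(current: Topology, participants: number, policy: TopologyPolicy): Transition {
  const wanted = chooseTopology(participants, policy);
  if (wanted === current) return "stay";
  return wanted === "sfu" ? "upgrade-to-sfu" : "downgrade-to-p2p";
}
```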
