E2EE stands for End-To-End Encryption.
In WebRTC, encryption is mandatory and is applied hop-by-hop. This means that media is encrypted on each leg between one WebRTC client and the next, but not necessarily all the way from the original sender to the final receiver.
In practice:
- TURN servers can’t look at the media, as they aren’t privy to the encryption keys used
- Media servers such as an SFU or an MCU can look at the media, since each of them acts as just another WebRTC client toward the client sharing its media with them
To keep the media confidential from media servers as well, E2EE technologies can be used.
For WebRTC, this is possible by using Insertable Streams. This technology allows the application to intercept the encoded media frames just after they are encoded on the sender side and just before they are decoded on the receiver side.
The application at this point can implement a callback that will encrypt or decrypt the data outside of the scope and context of WebRTC, making it indiscernible to media servers and anyone else who has no access to the encryption keys used.
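A minimal sketch of what such a callback might do to a single encoded frame is shown here. Everything in it is an assumption made for illustration: the names `encryptFrame` and `decryptFrame` are hypothetical, AES-256-GCM is just one plausible cipher choice, and Node's `crypto` module is used so the sketch can run outside a browser. A random IV per frame is also a simplification; production schemes typically derive it from a frame counter.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_LENGTH = 12;  // bytes, the usual AES-GCM nonce size
const TAG_LENGTH = 16; // bytes, the AES-GCM authentication tag

// Encrypts one encoded frame payload with the application-level key.
// Output layout: IV | ciphertext | authentication tag.
function encryptFrame(key: Uint8Array, payload: Uint8Array): Uint8Array {
  const iv = randomBytes(IV_LENGTH);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(payload), cipher.final()]);
  return Buffer.concat([iv, ciphertext, cipher.getAuthTag()]);
}

// Reverses encryptFrame. Throws if the key is wrong or the data was modified.
function decryptFrame(key: Uint8Array, data: Uint8Array): Uint8Array {
  const iv = data.subarray(0, IV_LENGTH);
  const tag = data.subarray(data.length - TAG_LENGTH);
  const ciphertext = data.subarray(IV_LENGTH, data.length - TAG_LENGTH);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

// Usage: the sender encrypts after encoding, the receiver decrypts before decoding.
const frameKey = randomBytes(32); // assumed to be shared by the clients out of band
const encodedFrame = Buffer.from("encoded video frame");
const protectedFrame = encryptFrame(frameKey, encodedFrame);
const recoveredFrame = decryptFrame(frameKey, protectedFrame);
console.log(Buffer.from(recoveredFrame).toString());
```

In a browser, functions along these lines would run inside the transform that Insertable Streams exposes (for example through `RTCRtpScriptTransform`), operating on each encoded frame's data and using the Web Crypto API rather than Node's module.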
The keys themselves are negotiated outside of the scope of WebRTC.
With E2EE, each packet gets encrypted twice:
- First, using the application-level encryption, with keys known only to the clients
- Second, using SRTP, the standard mechanism WebRTC uses. Media servers can remove this outer layer in order to route the packet, but the media inside it is still encrypted
This separation of keys enables an organizational separation: one organization may know the E2EE key (the client-side, application-level secret), while the organization that hosts and runs the SFUs does not.
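The two layers and the resulting separation can be illustrated with a small simulation. This is only a sketch under stated assumptions: AES-256-GCM stands in for both layers (real WebRTC uses SRTP for the hop-by-hop layer), and the helper names `protect` and `unprotect`, as well as the three keys, are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// AES-256-GCM helper. Output layout: IV (12 bytes) | ciphertext | tag (16 bytes).
function protect(key: Uint8Array, plaintext: Uint8Array): Uint8Array {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([iv, body, cipher.getAuthTag()]);
}

// Reverses protect. Throws if the key does not match.
function unprotect(key: Uint8Array, data: Uint8Array): Uint8Array {
  const decipher = createDecipheriv("aes-256-gcm", key, data.subarray(0, 12));
  decipher.setAuthTag(data.subarray(data.length - 16));
  const body = data.subarray(12, data.length - 16);
  return Buffer.concat([decipher.update(body), decipher.final()]);
}

const e2eeKey = randomBytes(32);        // known to the two clients only
const senderHopKey = randomBytes(32);   // sender <-> SFU (stand-in for SRTP keys)
const receiverHopKey = randomBytes(32); // SFU <-> receiver (stand-in for SRTP keys)

// Sender: application-level encryption first, then the hop-by-hop layer.
const media = Buffer.from("encoded audio frame");
const onTheWire = protect(senderHopKey, protect(e2eeKey, media));

// SFU: removes the hop layer, finds only E2EE ciphertext, re-protects for the next hop.
const atSfu = unprotect(senderHopKey, onTheWire);
const sfuSeesPlaintext = Buffer.from(atSfu).includes(media);
const forwarded = protect(receiverHopKey, atSfu);

// Receiver: removes the hop layer, then the application-level layer.
const received = unprotect(e2eeKey, unprotect(receiverHopKey, forwarded));
console.log(sfuSeesPlaintext, Buffer.from(received).toString());
```

The SFU in this sketch holds both hop keys but never `e2eeKey`, which mirrors the case where the party operating the media servers is not the party that controls the application-level secret.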
E2EE using Insertable Streams is applicable to SFUs, which only route media. It cannot be used with MCUs, since an MCU must decode the media in order to mix it.