Hardware Acceleration is Real

June 20, 2016

Hardware acceleration for video codecs is almost mandatory – and the VP9 codec is getting a performance boost.

VP9 codec is getting a performance boost

There are three things that keep VP8 in the game when compared to H.264:

  1. It was the only video codec in Chrome for WebRTC in the last 5 years, giving it a head start in deployments
  2. H.264, while available in mobile chipsets, isn’t always accessible to the developer (and doesn’t always work as it should when it is accessible)
  3. VP8 and H.264 are rather old now, so software implementations of them are quite decent

With the VP9 codec, the main worry was that it would be left behind and not get the love and attention from chipset vendors – leading it to the same fate as VP8 – abysmal, if any, hardware acceleration support. It is probably why Google went to great lengths to get it running on YouTube so soon and is publicizing its stats all the time.

👉 Check out if VP9 codec is suitable for your WebRTC application

This worry is now rather behind us. Recent signs show some serious adoption from the companies that we should really care about:

#1 – ARM and VP9 acceleration

Mobile=ARM

Without checking stats, I’d say that 99% or more of all smartphones sold in the past 5 years are based on ARM.

If and when ARM decides to support a feature directly, that brings said feature very close to world domination in future smartphones.

Which is somewhat what happened last week – ARM announced its Mali Egil Video Processor with VP9 acceleration.

Here’s a deck they shared:

VP9 Codec slide share by Roger Barker

Being farther away from chipsets than I was 5 years ago, it is hard for me to say if this is an integral part of an ARM processor, but I believe that it isn’t. It is an add-on component that takes care of video processing that chipset vendors add next to their ARM core. They can source the design from ARM or other suppliers – or they can develop their own.

Not sure how popular the ARM alternative is for video processing, but they have the advantage of being the first alternative for any chipset vendor (hell – they already source the ARM core itself, so why not bundle?). Which also means every other vendor needs to match up to their feature set – and improve on it.

Now that VP9 encode/decode capabilities are front and center in the ARM Mali Egil, it has become a mandatory checkmark for everyone else as well.

#2 – Intel on the video codec

If ARM is the king of mobile, then Intel rules the desktop.

As with ARM, I haven’t been following up on Intel CPU acceleration lately. And as with ARM, it was Fippo who got my attention with this link here: the new Intel Media SDK.

For those who don’t know, Intel provides several interesting software packages that make direct use of its chipset capabilities, especially when it comes to optimizing different types of workloads. The Intel IPP and Media SDKs handle media-related processing, and are quite popular with low-level developers who need access to such facilities.

From the release page itself:

With this release we are happy to announce new full hardware accelerated support for HEVC and VP9.

  • So… HEVC (=H.265) has encode and decode while VP9 only has decode support.
  • Probably because HEVC has been in the works for a lot longer than VP9, but there’s hope still.

#3 – Alliance for Open Media

The Alliance for Open Media. I’ve published a recent update on the alliance.

Intel was there from the start. The recent additions include ARM, AMD and NVIDIA. Sadly, Qualcomm isn’t part of the alliance.

I am sure additional chipset vendors will be joining in the coming months – there seems to be a ramp up in memberships there.

The alliance is working on AV1, a video codec that is planned to succeed VP9 and have wide industry acceptance and adoption. We are already seeing discussions around HEVC vs AV1 in WebRTC.

While the alliance is about what comes after VP9, it is easy to see how these vendors may sway to using VP9 in the interim.

The Future – VP9 codec and beyond

The future is most definitely one of royalty free video codecs. We’ve got there with voice, now that we have OPUS (though Speex and SILK were there before to pave the way). We will get there with video as well.

Coding technologies need to be accessible and available to everyone – freely – if we are to achieve Benedict Evans’ latest claims: Video is the new HTML. But for that, I’ll need another post.

So… which of these video codecs should you use in your application? Here’s a free mini video course to help you decide.

FAQ on video codec acceleration

✅ Can VP9 work without hardware acceleration?

Yes it can, and it does.
Google uses VP9 in YouTube, Hangouts Meet and even Stadia. Google makes the decision of which codec to use based on the performance available in the machine in question, and at times, VP9 is preferred due to its better compression.

✅ Is hardware acceleration for video codecs important in WebRTC?

Yes and no.
WebRTC has managed to grow nicely with VP8 and virtually no mass hardware acceleration in devices. That said, vendors prefer hardware acceleration for its advantages (battery life, less heating of the CPU, etc.). This is why H.264 is the preferred codec for many vendors if/when they can use it. Remember, though, that using the H.264 video codec isn’t a guarantee that hardware acceleration is available.

✅ Does hardware acceleration for video codecs offer better performance than a software video codec implementation?

Not necessarily. Hardware acceleration means certain features of the codec cannot be further optimized the same way that they can with software – features such as simulcast, machine learning based encoding or SVC are usually hard or impossible to achieve with hardware acceleration.
This makes the question of which codec implementation to use – software or hardware – a harder one to answer.
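
To make that trade-off a bit more concrete, here is a minimal Python sketch of the decision. Everything in it is hypothetical: the EncoderOption type, the pick_encoder function and the capability flags are made up for illustration and are not measurements of any real encoder. It simply prefers a hardware implementation when one covers the features you need, and falls back to software otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderOption:
    """Hypothetical description of one encoder implementation on a device."""
    codec: str
    hardware: bool
    supports_simulcast: bool
    supports_svc: bool


def pick_encoder(options, need_simulcast=False, need_svc=False):
    """Return the first usable option, preferring hardware; None if nothing fits."""
    usable = [
        o for o in options
        if (o.supports_simulcast or not need_simulcast)
        and (o.supports_svc or not need_svc)
    ]
    if not usable:
        return None
    # sorted() is stable, so hardware options move to the front while the
    # caller's original preference order is kept within each group.
    return sorted(usable, key=lambda o: not o.hardware)[0]


# Illustrative capability flags only (assumed, not measured).
options = [
    EncoderOption("VP9", hardware=False, supports_simulcast=True, supports_svc=True),
    EncoderOption("H.264", hardware=True, supports_simulcast=False, supports_svc=False),
]

print(pick_encoder(options).codec)
print(pick_encoder(options, need_svc=True).codec)
```

With these assumed flags, a plain one-to-one call ends up on the hardware H.264 encoder, while a session that needs SVC falls back to software VP9.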

More about VP9 codec and WebRTC.


Comment​

  1. I’m curious, would it be easy to add support to any of the VAAPI drivers, or does VP9 need special hardware support? (It seems not, but then why isn’t it there yet?)

    1. Constantine,

      As far as my understanding goes, like any other video codec, you really do want hardware acceleration for it – and like any other video codec – such support is special at least to some extent.

    2. Hello there,

      Adding the reference encoders and decoders in VAAPI isn’t trivial, because Intel’s drivers (i915, and the intel-hybrid-driver used to expose hybrid decode capabilities for SKUs such as Skylake for VP9 and HEVC 10-bit) must support the texture format (VAAPI_VLDs) that the VAAPI stack mandates.

      It’s for this reason that some platforms, such as AMD’s Polaris, can do HEVC 10-bit decode in hardware via VAAPI: they can accept that texture format through their mesa-gallium driver implementation (set LIBVA_DRIVER_NAME to radeonsi).
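
For anyone scripting this, a small Python sketch of the LIBVA_DRIVER_NAME override mentioned above. The helper names vaapi_env and run_vainfo are hypothetical, and run_vainfo assumes the vainfo tool is installed and on the PATH.

```python
import os
import subprocess


def vaapi_env(driver, base=None):
    """Return a copy of the environment with LIBVA_DRIVER_NAME set to `driver`."""
    env = dict(os.environ if base is None else base)
    env["LIBVA_DRIVER_NAME"] = driver
    return env


def run_vainfo(driver):
    """Run vainfo with the given driver override and return its stdout.

    Assumes vainfo is installed; raises FileNotFoundError otherwise.
    """
    result = subprocess.run(
        ["vainfo"],
        env=vaapi_env(driver),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


# Build the environment without touching the current process.
print(vaapi_env("radeonsi", base={"PATH": "/usr/bin"}))
```

Copying the environment keeps the override scoped to the child process instead of changing it for the whole script.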

  2. As VAAPI drivers are by design driving specialized hardware, which implements decoding (and encoding) of a specific format “in metal”, you first need such hardware. And it is easy to support VP9 in VAAPI on such hardware. It really is supported.
    Of course, it is easily possible to have the codec working without specialized hardware, on the “general purpose” processing unit instead.

    It may also be interesting to have the codec working on GPU shaders …

    1. A small comment about having a hardware-accelerated video encode pipeline running on GPU shaders: Yes, it has been done before, in the age before SIP blocks such as NVIDIA’s NVENC, AMD’s VCE and Intel’s QuickSync.

      However, general purpose shaders are very inefficient at encoding, encumbered by programmability, power draw and video quality tuning. In the past, we’ve had ATI’s AVIVO and Nvidia’s NVCUVENC (CUVID)-based encoding (now deprecated in favor of NVENC), and some encoders such as x264 have even implemented an OpenCL-based lookahead system, which has somewhat diminishing returns on higher end hardware.

      For decoding, a hybrid approach has been used extensively, even on current generation hardware such as Intel’s Skylake utilizing a hybrid mode for HEVC 10-bit and VP9 8 and 10-bit codecs. And that comes with the caveats mentioned above: Increased power draw (at the baseline), and nearly unpredictable performance on varying hardware configurations. For instance, an Intel Iris Pro SKU is likely bundled with a faster CPU, resulting in better decode performance. The same cannot be said of another device form factor, such as a tablet, that utilizes a weaker, binned version of the same integrated GPU.

      And on programmability: Look at AMD’s VCE hybrid encode mode, which is rarely, if at all, used. There are tasks that are best left out of shader pipelines, and video decoding is one of them.

      Over time, we expect to see a wide range of hardware-based SIP blocks implementing support for up and coming codecs such as the Alliance for Open Media’s AV1, alongside the current VP9, VP8, HEVC and H.264 codecs. In fact, for AOM’s codecs to succeed, they’ll need on-launch hardware-based acceleration to enable mass adoption.

      The thing is, when we scale down silicon to enable the same media playback functionalities on mobile platforms as on faster mainstream PCs, the argument for hybrid-based approaches to video decode and encode fades out quickly. System integrators and SIP designers such as Cadence/Tensilica have stepped in to provide dedicated licensed SIP hardware blocks to AMD for both the VCE and the VCN (coming in Vega GPUs) for this very reason.

      And over time, you can expect to see cross-vendor licensing of SIPs on variant platforms for these applications. Intel, for instance, has used PowerVR SGX 535 graphics cores developed by Imagination Technologies under license for their GMA 500 GPUs, and such collaborations will only continue to tighten.

      Regards,

      Dennis.

  3. Hello guys,

    As of today, FFmpeg supports a VAAPI-based VP9 encoder when it is built with the --enable-vaapi option: https://gist.github.com/Brainiarc7/24de2edef08866c304080504877239a3

    However, you’ll need an Intel Kabylake-based Integrated GPU to take advantage of this feature.

    And now, with the new vp9_vaapi encoder, here’s what we get.

    Encoder options now available:

    ffmpeg -h vp9_vaapi

    Output:

    Encoder vp9_vaapi [VP9 (VAAPI)]:
    General capabilities: delay
    Threading capabilities: none
    Supported pixel formats: vaapi_vld
    vp9_vaapi AVOptions:
    -loop_filter_level E..V.... Loop filter level (from 0 to 63) (default 16)
    -loop_filter_sharpness E..V.... Loop filter sharpness (from 0 to 15) (default 4)
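
For reference, a minimal Python sketch of how an encode command using this encoder might be assembled. The render node path /dev/dri/renderD128 and the file names are assumptions and will differ per system, and vp9_vaapi_command is a hypothetical helper. The format=nv12,hwupload filter chain mirrors the nv12 surface upload seen in FFmpeg's VAAPI logs, and the two loop filter options and their ranges are the ones listed in the help output above.

```python
import shlex


def vp9_vaapi_command(src, dst, device="/dev/dri/renderD128",
                      loop_filter_level=16, loop_filter_sharpness=4):
    """Build an ffmpeg argument list for a VAAPI VP9 encode (does not run it)."""
    # Ranges taken from the encoder's help output.
    if not 0 <= loop_filter_level <= 63:
        raise ValueError("loop_filter_level must be between 0 and 63")
    if not 0 <= loop_filter_sharpness <= 15:
        raise ValueError("loop_filter_sharpness must be between 0 and 15")
    return [
        "ffmpeg",
        "-vaapi_device", device,          # assumed render node path
        "-i", src,
        "-vf", "format=nv12,hwupload",    # convert to nv12, upload to a VAAPI surface
        "-c:v", "vp9_vaapi",
        "-loop_filter_level", str(loop_filter_level),
        "-loop_filter_sharpness", str(loop_filter_sharpness),
        dst,
    ]


# Hypothetical file names; print the command instead of executing it.
cmd = vp9_vaapi_command("input.mkv", "output.webm")
print(shlex.join(cmd))
```

Passing the list to subprocess.run would launch the encode; building it separately keeps the range checks testable without ffmpeg installed.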

    What happens when you try to pull this off on unsupported hardware, say Skylake?

    See the sample output below:

    [Parsed_format_0 @ 0x42cb500] compat: called with args=[nv12]
    [Parsed_format_0 @ 0x42cb500] Setting 'pix_fmts' to value 'nv12'
    [Parsed_scale_vaapi_2 @ 0x42cc300] Setting 'w' to value '1920'
    [Parsed_scale_vaapi_2 @ 0x42cc300] Setting 'h' to value '1080'
    [graph 0 input from stream 0:0 @ 0x42cce00] Setting 'video_size' to value '3840x2026'
    [graph 0 input from stream 0:0 @ 0x42cce00] Setting 'pix_fmt' to value '0'
    [graph 0 input from stream 0:0 @ 0x42cce00] Setting 'time_base' to value '1/1000'
    [graph 0 input from stream 0:0 @ 0x42cce00] Setting 'pixel_aspect' to value '1/1'
    [graph 0 input from stream 0:0 @ 0x42cce00] Setting 'sws_param' to value 'flags=2'
    [graph 0 input from stream 0:0 @ 0x42cce00] Setting 'frame_rate' to value '24000/1001'
    [graph 0 input from stream 0:0 @ 0x42cce00] w:3840 h:2026 pixfmt:yuv420p tb:1/1000 fr:24000/1001 sar:1/1 sws_param:flags=2
    [format @ 0x42cba40] compat: called with args=[vaapi_vld]
    [format @ 0x42cba40] Setting 'pix_fmts' to value 'vaapi_vld'
    [auto_scaler_0 @ 0x42cd580] Setting 'flags' to value 'bicubic'
    [auto_scaler_0 @ 0x42cd580] w:iw h:ih flags:'bicubic' interl:0
    [Parsed_format_0 @ 0x42cb500] auto-inserting filter 'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the filter 'Parsed_format_0'
    [AVFilterGraph @ 0x42ca360] query_formats: 6 queried, 4 merged, 1 already done, 0 delayed
    [auto_scaler_0 @ 0x42cd580] w:3840 h:2026 fmt:yuv420p sar:1/1 -> w:3840 h:2026 fmt:nv12 sar:1/1 flags:0x4
    [hwupload @ 0x42cbcc0] Surface format is nv12.
    [AVHWFramesContext @ 0x42ccbc0] Created surface 0x4000000.
    [AVHWFramesContext @ 0x42ccbc0] Direct mapping possible.
    [AVHWFramesContext @ 0x42c3e40] Created surface 0x4000001.
    [AVHWFramesContext @ 0x42c3e40] Direct mapping possible.
    [AVHWFramesContext @ 0x42c3e40] Created surface 0x4000002.
    [AVHWFramesContext @ 0x42c3e40] Created surface 0x4000003.
    [AVHWFramesContext @ 0x42c3e40] Created surface 0x4000004.
    [AVHWFramesContext @ 0x42c3e40] Created surface 0x4000005.
    [AVHWFramesContext @ 0x42c3e40] Created surface 0x4000006.
    [AVHWFramesContext @ 0x42c3e40] Created surface 0x4000007.
    [AVHWFramesContext @ 0x42c3e40] Created surface 0x4000008.
    [AVHWFramesContext @ 0x42c3e40] Created surface 0x4000009.
    [AVHWFramesContext @ 0x42c3e40] Created surface 0x400000a.
    [vp9_vaapi @ 0x409da40] Encoding entrypoint not found (19 / 6).
    Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
    [AVIOContext @ 0x40fdac0] Statistics: 0 seeks, 0 writeouts
    [aac @ 0x40fcb00] Qavg: -nan
    [AVIOContext @ 0x409f820] Statistics: 32768 bytes read, 0 seeks
    Conversion failed!

    The interesting bit is the error about the VP9 encoding entrypoint being absent on this particular platform, as confirmed by vainfo’s output:

    libva info: VA-API version 0.40.0
    libva info: va_getDriverName() returns 0
    libva info: Trying to open /usr/local/lib/dri/i965_drv_video.so
    libva info: Found init function __vaDriverInit_0_40
    libva info: va_openDriver() returns 0
    vainfo: VA-API version: 0.40 (libva 1.7.3)
    vainfo: Driver version: Intel i965 driver for Intel(R) Skylake - 1.8.4.pre1 (glk-alpha-71-gc3110dc)
    vainfo: Supported profile and entrypoints
    VAProfileMPEG2Simple : VAEntrypointVLD
    VAProfileMPEG2Simple : VAEntrypointEncSlice
    VAProfileMPEG2Main : VAEntrypointVLD
    VAProfileMPEG2Main : VAEntrypointEncSlice
    VAProfileH264ConstrainedBaseline: VAEntrypointVLD
    VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
    VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
    VAProfileH264Main : VAEntrypointVLD
    VAProfileH264Main : VAEntrypointEncSlice
    VAProfileH264Main : VAEntrypointEncSliceLP
    VAProfileH264High : VAEntrypointVLD
    VAProfileH264High : VAEntrypointEncSlice
    VAProfileH264High : VAEntrypointEncSliceLP
    VAProfileH264MultiviewHigh : VAEntrypointVLD
    VAProfileH264MultiviewHigh : VAEntrypointEncSlice
    VAProfileH264StereoHigh : VAEntrypointVLD
    VAProfileH264StereoHigh : VAEntrypointEncSlice
    VAProfileVC1Simple : VAEntrypointVLD
    VAProfileVC1Main : VAEntrypointVLD
    VAProfileVC1Advanced : VAEntrypointVLD
    VAProfileNone : VAEntrypointVideoProc
    VAProfileJPEGBaseline : VAEntrypointVLD
    VAProfileJPEGBaseline : VAEntrypointEncPicture
    VAProfileVP8Version0_3 : VAEntrypointVLD
    VAProfileVP8Version0_3 : VAEntrypointEncSlice
    VAProfileHEVCMain : VAEntrypointVLD
    VAProfileHEVCMain : VAEntrypointEncSlice
    VAProfileVP9Profile0 : VAEntrypointVLD

    The VLD (for Variable Length Decode) entry point for VP9 profile 0 is as far as Skylake goes in terms of VP9 hardware acceleration.

    Those with Kabylake test beds: run these encode tests and report back 🙂
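
If you want to check this programmatically rather than by eye, here is a rough Python sketch that parses vainfo-style output and reports whether a profile has an encode entrypoint. The function names parse_vainfo and can_encode are hypothetical, and treating any entrypoint whose name starts with VAEntrypointEnc as "can encode" is an assumption based on the names seen in the listing above (EncSlice, EncSliceLP, EncPicture).

```python
from collections import defaultdict


def parse_vainfo(text):
    """Map each VAProfile name to the set of entrypoints listed for it."""
    profiles = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        # Profile lines look like "VAProfileH264Main : VAEntrypointVLD".
        if not line.startswith("VAProfile") or ":" not in line:
            continue
        profile, entrypoint = (part.strip() for part in line.split(":", 1))
        profiles[profile].add(entrypoint)
    return dict(profiles)


def can_encode(profiles, profile):
    """True if the profile lists an entrypoint whose name starts with VAEntrypointEnc."""
    return any(ep.startswith("VAEntrypointEnc") for ep in profiles.get(profile, ()))


# A few lines copied from the Skylake listing above.
SAMPLE = """\
vainfo: Supported profile and entrypoints
VAProfileVP8Version0_3 : VAEntrypointVLD
VAProfileVP8Version0_3 : VAEntrypointEncSlice
VAProfileHEVCMain : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointEncSlice
VAProfileVP9Profile0 : VAEntrypointVLD
"""

profiles = parse_vainfo(SAMPLE)
for name in ("VAProfileVP8Version0_3", "VAProfileHEVCMain", "VAProfileVP9Profile0"):
    print(name, "encode" if can_encode(profiles, name) else "decode only")
```

On the sample lines, VP8 and HEVC Main report an encode entrypoint while VP9 profile 0 lists only VLD, which matches the failure in the FFmpeg log.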

  4. I am ultra confused, and this (older) article provides the best information on my question…

    Do you have any new data on the VP9 software decoding vs. hardware decoding?

    Google Stadia at higher resolutions uses VP9 encoding, and my CPU (Intel, with older hardware decoding) is at its limit… So I want to buy a new CPU, so I don’t have to close everything else I am using in order to play Stadia without lag.

    And I don’t know if I should go for an Intel Core i5-9600K with Intel® UHD Graphics 630 on board (VP9 hardware acceleration) or, for around the same price (with mainboard), the Ryzen 5 5600 (~70% better in benchmarks!)… So: more power for software decoding, or hardware that supports hardware acceleration… 🙂
