Comments on: Hardware Acceleration is Real https://bloggeek.me/vp9-hardware-acceleration/ The leading authority on WebRTC Fri, 30 Oct 2020 12:24:23 +0000 By: Björn https://bloggeek.me/vp9-hardware-acceleration/#comment-122063 Thu, 04 Jun 2020 22:20:48 +0000 https://bloggeek.me/?p=10400#comment-122063 I am thoroughly confused, and this (older) article provides the best information on my question…

Do you have any new data on the VP9 software decoding vs. hardware decoding?

Google Stadia at higher resolutions uses VP9 encoding, and my CPU (Intel, with older hardware decoding) is at its limit… So I want to buy a new CPU, so that I don’t have to close everything else I am using in order to play Stadia without lag.

And I don’t know whether I should go for an Intel Core i5-9600K with on-board Intel® UHD Graphics 630 (VP9 hardware acceleration) or, for around the same price (with mainboard), the Ryzen 5 5600 (~70% better in benchmarks!)… So: more raw power for software decoding, or hardware that supports hardware acceleration… 🙂

By: Dennis Mungai https://bloggeek.me/vp9-hardware-acceleration/#comment-118558 Tue, 04 Jul 2017 00:54:54 +0000 https://bloggeek.me/?p=10400#comment-118558 In reply to minus.

A small comment about having a hardware-accelerated video encode pipeline running on GPU shaders: Yes, it has been done before, in the age before SIP blocks such as NVIDIA’s NVENC, AMD’s VCE and Intel’s QuickSync.

However, general-purpose shaders are very inefficient at encoding: they are encumbered by their programmability, power draw and video quality tuning. In the past, we’ve had ATI’s AVIVO and NVIDIA’s NVCUVENC (CUVID)-based encoding (now deprecated in favor of NVENC), and some encoders such as x264 have even implemented an OpenCL-based lookahead system, which has somewhat diminishing returns on higher-end hardware.

For decoding, a hybrid approach has been used extensively, even on current-generation hardware: Intel’s Skylake, for example, uses a hybrid mode for HEVC 10-bit and for 8-bit and 10-bit VP9. That comes with the caveats mentioned above: increased power draw (at the baseline) and nearly unpredictable performance across hardware configurations. For instance, an Intel Iris Pro SKU is likely bundled with a faster CPU, resulting in better decode performance. The same cannot be said of another device form factor, such as a tablet, that uses a weaker, binned version of the same integrated GPU.

And on programmability: look at AMD’s VCE hybrid encode mode, which is rarely, if ever, used. There are tasks that are best left out of shader pipelines, and video decoding is one of them.

Over time, we expect to see a wide range of hardware-based SIP blocks implementing support for up-and-coming codecs such as the Alliance for Open Media’s AV1, alongside the current VP9, VP8, HEVC and H.264 codecs. In fact, for AOM’s codecs to succeed, they’ll need hardware-based acceleration at launch to enable mass adoption.

The thing is, when we scale down silicon to enable the same media playback functionality on mobile platforms as on faster mainstream PCs, the argument for hybrid approaches to video decode and encode fades quickly. System integrators and SIP designers such as Cadence/Tensilica have stepped in to provide dedicated, licensed SIP hardware blocks to AMD for both the VCE and the VCN (coming in Vega GPUs) for this very reason.

And over time, you can expect to see cross-vendor licensing of SIPs on various platforms for these applications. Intel, for instance, has used PowerVR SGX 535 graphics cores developed by Imagination Technologies under license for their GMA 500 GPUs, and such collaborations will only continue to tighten.

Regards,

Dennis.

By: Tsahi Levent-Levi https://bloggeek.me/vp9-hardware-acceleration/#comment-118557 Mon, 26 Jun 2017 12:01:23 +0000 https://bloggeek.me/?p=10400#comment-118557 In reply to Dennis Mungai.

Dennis – thanks for sharing. I really like where this is headed.

By: Dennis Mungai https://bloggeek.me/vp9-hardware-acceleration/#comment-118556 Mon, 26 Jun 2017 10:55:58 +0000 https://bloggeek.me/?p=10400#comment-118556 Hello guys,

As of today, FFmpeg supports a VAAPI-based VP9 encoder when it is built with the --enable-vaapi option: https://gist.github.com/Brainiarc7/24de2edef08866c304080504877239a3

However, you’ll need an Intel Kaby Lake-based integrated GPU to take advantage of this feature.

And now, with the new vp9_vaapi encoder, here’s what we get.

Encoder options now available:

ffmpeg -h vp9_vaapi

Output:

Encoder vp9_vaapi [VP9 (VAAPI)]:
General capabilities: delay
Threading capabilities: none
Supported pixel formats: vaapi_vld
vp9_vaapi AVOptions:
-loop_filter_level E..V.... Loop filter level (from 0 to 63) (default 16)
-loop_filter_sharpness E..V.... Loop filter sharpness (from 0 to 15) (default 4)
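
For readers who want to try the encoder, here is a minimal sketch of how such an encode command could be assembled. It is not taken from the gist: the filter chain (format=nv12, hwupload, scale_vaapi) simply mirrors the filter names visible in the log further down, while the render node path, the file names and the bitrate are assumptions chosen for illustration.

```python
from typing import List


def vp9_vaapi_command(src: str, dst: str,
                      device: str = "/dev/dri/renderD128",  # assumed render node
                      width: int = 1920, height: int = 1080,
                      bitrate: str = "4M") -> List[str]:    # assumed target bitrate
    """Build an ffmpeg argument list for a VAAPI VP9 encode.

    The frames are converted to nv12 in software, uploaded to the GPU,
    scaled there, and handed to the vp9_vaapi encoder.
    """
    filters = f"format=nv12,hwupload,scale_vaapi=w={width}:h={height}"
    return [
        "ffmpeg",
        "-vaapi_device", device,   # open the VAAPI device used by hwupload
        "-i", src,
        "-vf", filters,
        "-c:v", "vp9_vaapi",
        "-b:v", bitrate,
        dst,
    ]


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    print(" ".join(vp9_vaapi_command("input.mkv", "output.mkv")))
```

Passing the returned list to subprocess.run would start the encode; on hardware without a VP9 encode entrypoint it is expected to fail at encoder initialization.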

What happens when you try to pull this off on unsupported hardware, say Skylake?

See the sample output below:

[Parsed_format_0 @ 0x42cb500] compat: called with args=[nv12]
[Parsed_format_0 @ 0x42cb500] Setting 'pix_fmts' to value 'nv12'
[Parsed_scale_vaapi_2 @ 0x42cc300] Setting 'w' to value '1920'
[Parsed_scale_vaapi_2 @ 0x42cc300] Setting 'h' to value '1080'
[graph 0 input from stream 0:0 @ 0x42cce00] Setting 'video_size' to value '3840x2026'
[graph 0 input from stream 0:0 @ 0x42cce00] Setting 'pix_fmt' to value '0'
[graph 0 input from stream 0:0 @ 0x42cce00] Setting 'time_base' to value '1/1000'
[graph 0 input from stream 0:0 @ 0x42cce00] Setting 'pixel_aspect' to value '1/1'
[graph 0 input from stream 0:0 @ 0x42cce00] Setting 'sws_param' to value 'flags=2'
[graph 0 input from stream 0:0 @ 0x42cce00] Setting 'frame_rate' to value '24000/1001'
[graph 0 input from stream 0:0 @ 0x42cce00] w:3840 h:2026 pixfmt:yuv420p tb:1/1000 fr:24000/1001 sar:1/1 sws_param:flags=2
[format @ 0x42cba40] compat: called with args=[vaapi_vld]
[format @ 0x42cba40] Setting 'pix_fmts' to value 'vaapi_vld'
[auto_scaler_0 @ 0x42cd580] Setting 'flags' to value 'bicubic'
[auto_scaler_0 @ 0x42cd580] w:iw h:ih flags:'bicubic' interl:0
[Parsed_format_0 @ 0x42cb500] auto-inserting filter 'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the filter 'Parsed_format_0'
[AVFilterGraph @ 0x42ca360] query_formats: 6 queried, 4 merged, 1 already done, 0 delayed
[auto_scaler_0 @ 0x42cd580] w:3840 h:2026 fmt:yuv420p sar:1/1 -> w:3840 h:2026 fmt:nv12 sar:1/1 flags:0x4
[hwupload @ 0x42cbcc0] Surface format is nv12.
[AVHWFramesContext @ 0x42ccbc0] Created surface 0x4000000.
[AVHWFramesContext @ 0x42ccbc0] Direct mapping possible.
[AVHWFramesContext @ 0x42c3e40] Created surface 0x4000001.
[AVHWFramesContext @ 0x42c3e40] Direct mapping possible.
[AVHWFramesContext @ 0x42c3e40] Created surface 0x4000002.
[AVHWFramesContext @ 0x42c3e40] Created surface 0x4000003.
[AVHWFramesContext @ 0x42c3e40] Created surface 0x4000004.
[AVHWFramesContext @ 0x42c3e40] Created surface 0x4000005.
[AVHWFramesContext @ 0x42c3e40] Created surface 0x4000006.
[AVHWFramesContext @ 0x42c3e40] Created surface 0x4000007.
[AVHWFramesContext @ 0x42c3e40] Created surface 0x4000008.
[AVHWFramesContext @ 0x42c3e40] Created surface 0x4000009.
[AVHWFramesContext @ 0x42c3e40] Created surface 0x400000a.
[vp9_vaapi @ 0x409da40] Encoding entrypoint not found (19 / 6).
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
[AVIOContext @ 0x40fdac0] Statistics: 0 seeks, 0 writeouts
[aac @ 0x40fcb00] Qavg: -nan
[AVIOContext @ 0x409f820] Statistics: 32768 bytes read, 0 seeks
Conversion failed!

The interesting bit is the “Encoding entrypoint not found” error: VP9 encoding is absent on this particular platform, as confirmed by vainfo’s output:

libva info: VA-API version 0.40.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/local/lib/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_0_40
libva info: va_openDriver() returns 0
vainfo: VA-API version: 0.40 (libva 1.7.3)
vainfo: Driver version: Intel i965 driver for Intel(R) Skylake - 1.8.4.pre1 (glk-alpha-71-gc3110dc)
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Simple : VAEntrypointEncSlice
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointEncSlice
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
VAProfileH264Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointEncSlice
VAProfileH264Main : VAEntrypointEncSliceLP
VAProfileH264High : VAEntrypointVLD
VAProfileH264High : VAEntrypointEncSlice
VAProfileH264High : VAEntrypointEncSliceLP
VAProfileH264MultiviewHigh : VAEntrypointVLD
VAProfileH264MultiviewHigh : VAEntrypointEncSlice
VAProfileH264StereoHigh : VAEntrypointVLD
VAProfileH264StereoHigh : VAEntrypointEncSlice
VAProfileVC1Simple : VAEntrypointVLD
VAProfileVC1Main : VAEntrypointVLD
VAProfileVC1Advanced : VAEntrypointVLD
VAProfileNone : VAEntrypointVideoProc
VAProfileJPEGBaseline : VAEntrypointVLD
VAProfileJPEGBaseline : VAEntrypointEncPicture
VAProfileVP8Version0_3 : VAEntrypointVLD
VAProfileVP8Version0_3 : VAEntrypointEncSlice
VAProfileHEVCMain : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointEncSlice
VAProfileVP9Profile0 : VAEntrypointVLD

The VLD (Variable Length Decode) entrypoint for VP9 profile 0 is as far as Skylake goes in terms of VP9 hardware acceleration.
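
The check done by eye above can also be scripted. The following is a minimal sketch, assuming vainfo output in the "profile : entrypoint" layout shown above; the function names parse_vainfo and can_encode are hypothetical helpers, and the sample text is a shortened copy of the Skylake listing.

```python
from collections import defaultdict
from typing import Dict, Set


def parse_vainfo(text: str) -> Dict[str, Set[str]]:
    """Map each VAProfile name to the set of entrypoints listed for it."""
    caps: Dict[str, Set[str]] = defaultdict(set)
    for line in text.splitlines():
        profile, sep, entrypoint = line.partition(":")
        profile, entrypoint = profile.strip(), entrypoint.strip()
        # Skip banner lines such as "libva info: ..." or "vainfo: ...".
        if sep and profile.startswith("VAProfile") and entrypoint.startswith("VAEntrypoint"):
            caps[profile].add(entrypoint)
    return dict(caps)


def can_encode(caps: Dict[str, Set[str]], profile: str) -> bool:
    """True if any encode entrypoint (EncSlice, EncSliceLP, EncPicture) is listed."""
    return any(e.startswith("VAEntrypointEnc") for e in caps.get(profile, ()))


# Shortened copy of the Skylake listing above.
SAMPLE = """\
vainfo: Supported profile and entrypoints
VAProfileVP8Version0_3 : VAEntrypointVLD
VAProfileVP8Version0_3 : VAEntrypointEncSlice
VAProfileHEVCMain : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointEncSlice
VAProfileVP9Profile0 : VAEntrypointVLD
"""

if __name__ == "__main__":
    caps = parse_vainfo(SAMPLE)
    for name in sorted(caps):
        print(name, "encode" if can_encode(caps, name) else "decode only")
```

On this sample, VP8 and HEVC Main report an encode entrypoint, while VP9 profile 0 lists only VLD, which matches the encoder failure in the log.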

Those of you with Kaby Lake test beds: run these encode tests and report back 🙂

By: Dennis Mungai https://bloggeek.me/vp9-hardware-acceleration/#comment-118555 Sat, 03 Jun 2017 16:58:24 +0000 https://bloggeek.me/?p=10400#comment-118555 In reply to Constantine.

Hello there,

Adding the reference encoders and decoders in VAAPI isn’t trivial, because Intel’s drivers (i915, and the intel-hybrid-driver used to expose hybrid decode capabilities for SKUs such as Skylake for VP9 and HEVC 10-bit) must support the texture format (VAAPI_VLDs) that the VAAPI stack mandates.

It’s for this reason that some platforms, such as AMD’s Polaris, can do HEVC 10-bit decode in hardware via VAAPI: they can accept that texture format through their mesa-gallium driver implementation (set LIBVA_DRIVER_NAME to radeonsi).
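
As an illustration of that last point, here is a minimal sketch of selecting the driver through the LIBVA_DRIVER_NAME environment variable before querying the stack with vainfo. The helper names libva_env and run_vainfo are hypothetical, and the sketch assumes vainfo is installed and on the PATH; it returns None otherwise.

```python
import os
import shutil
import subprocess
from typing import Dict, Optional


def libva_env(driver: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with LIBVA_DRIVER_NAME set to `driver`."""
    env = dict(os.environ if base is None else base)
    env["LIBVA_DRIVER_NAME"] = driver
    return env


def run_vainfo(driver: str) -> Optional[str]:
    """Run vainfo with the given driver selected; None if vainfo is not installed."""
    if shutil.which("vainfo") is None:
        return None
    result = subprocess.run(["vainfo"], env=libva_env(driver),
                            capture_output=True, text=True)
    # Banner and capability lines may be split across the two streams.
    return result.stdout + result.stderr


if __name__ == "__main__":
    output = run_vainfo("radeonsi")
    print(output if output is not None else "vainfo not found")
```

The same environment can be passed to an ffmpeg subprocess so that the decode or encode runs against the chosen driver.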

By: Tsahi Levent-Levi https://bloggeek.me/vp9-hardware-acceleration/#comment-118553 Tue, 04 Oct 2016 04:16:50 +0000 https://bloggeek.me/?p=10400#comment-118553 In reply to minus.

Thanks.

If it is that simple, then why hasn’t it been done so far?

By: minus https://bloggeek.me/vp9-hardware-acceleration/#comment-118552 Tue, 04 Oct 2016 00:41:34 +0000 https://bloggeek.me/?p=10400#comment-118552 VAAPI drivers are, by design, driving specialized hardware that implements decoding (and encoding) of a specific format “in metal”, so you first need such hardware. It is easy to support VP9 in VAAPI on such hardware; in fact, it is already supported.
Of course, it is also quite possible to have the codec run without specialized hardware, on the “general purpose” processing unit instead.

It may also be interesting to have the codec run on GPU shaders …

By: Tsahi Levent-Levi https://bloggeek.me/vp9-hardware-acceleration/#comment-118551 Mon, 26 Sep 2016 19:22:49 +0000 https://bloggeek.me/?p=10400#comment-118551 In reply to Constantine.

Constantine,

As far as my understanding goes, like any other video codec, you really do want hardware acceleration for it – and like any other video codec – such support is special at least to some extent.

By: Constantine https://bloggeek.me/vp9-hardware-acceleration/#comment-118550 Mon, 26 Sep 2016 14:22:06 +0000 https://bloggeek.me/?p=10400#comment-118550 I’m curious: would it be easy to add VP9 support to any of the VAAPI drivers, or does VP9 need special hardware support? (It seems not, but then why is it not there yet?)
