diff mbox series

[FFmpeg-devel,31/92] Vulkan patchset part 1 - common code changes

Message ID NQTold9--3-9@lynne.ee
State New
Headers show
Series None | expand

Commit Message

Lynne March 14, 2023, 6:33 a.m. UTC
The attached patchset is all the common code changes that my Vulkan patchset needs.

In total lines of code, this part has 425 additions and 131 deletions.
Most of that is additions to HEVC parsing. Excluding them, the patchset is
200 lines of code added, which is manageable.

Apart from the parser changes, the following other changes have been
made to the API:

AVHWAccel.free_frame_priv exists due to Vulkan's way of using VkImageView
objects to wrap VkImage objects, which we need to free once they're no longer
in use. Every other API uses the direct objects in decoding, but with Vulkan,
they have to be represented by other objects.
We also use it to free the slice offsets buffer.

AVHWAccel.flush exists due to Vulkan keeping decoder state, despite being
stateless in theory. The decoder has to be notified of flushes in order to reset
decoding slots and other data it needs, such as motion vectors and reference
lists for AV1. Otherwise, inferring whether a flush has happened can be codec
dependent, and hacky.

hwaccel_params_buf exists due to Vulkan's way of compiling SPS/PPS data
into objects, making updating expensive. The change allows for hardware
to only upload new parameters if they have been changed.
It's insignificant for H264 and AV1, but HEVC's structures can reach 114
megabytes of data that has to be uploaded, for a specially crafted input,
which is enough to DDOS an ingest.
The data is set and managed by the hwaccel, but does need to be synchronized
between different decoding threads, which this patch performs.

Finally, the HWACCEL_CAP_THREAD_SAFE flag is added due to Vulkan being
actually threadsafe, and requiring no serialization. It does work and it
does actually make a difference, on average, it can increase performance
by 20% for an average B-frame using HEVC stream, depending on the
number of threads and the number of decode queues.
While hardware decoders are fast in general, certain vendors such as AMD
can choke up while playing 8k video, and this patch can significantly help
increase throughput.

In context, the changes can be viewed here:
https://github.com/cyanreg/FFmpeg/tree/vulkan

The rest of the whole patchset is either rewrites, filter code, or
the actual hardware accel code.

The patchset will not be pushed standalone, but as part of the greater
Vulkan patchset.

31 patches attached.

Comments

James Almer March 14, 2023, 11:51 a.m. UTC | #1
On 3/14/2023 3:33 AM, Lynne wrote:
> The attached patchset is all the common code changes that my Vulkan patchset needs.
> 
> In total lines of code, this part has 425 additions and 131 deletions.
> Most of that is additions to HEVC parsing. Excluding them, the patchset is
> 200 lines of code added, which is manageable.
> 
> Apart from the parser changes, the following other changes have been
> made to the API:
> 
> AVHWAccel.free_frame_priv exists due to Vulkan's way of using VkImageView
> objects to wrap VkImage objects, which we need to free once they're no longer
> in use. Every other API uses the direct objects in decoding, but with Vulkan,
> they have to be represented by other objects.
> We also use it to free the slice offsets buffer.
> 
> AVHWAccel.flush exists due to Vulkan keeping decoder state, despite being
> stateless in theory. The decoder has to be notified of flushes in order to reset
> decoding slots and other data it needs, such as motion vectors and reference
> lists for AV1. Otherwise, inferring whether a flush has happened can be codec
> dependent, and hacky.
> 
> hwaccel_params_buf exists due to Vulkan's way of compiling SPS/PPS data
> into objects, making updating expensive. The change allows for hardware
> to only upload new parameters if they have been changed.
> It's insignificant for H264 and AV1, but HEVC's structures can reach 114
> megabytes of data that has to be uploaded, for a specially crafted input,
> which is enough to DDOS an ingest.
> The data is set and managed by the hwaccel, but does need to be synchronized
> between different decoding threads, which this patch performs.
> 
> Finally, the HWACCEL_CAP_THREAD_SAFE flag is added due to Vulkan being
> actually threadsafe, and requiring no serialization. It does work and it
> does actually make a difference, on average, it can increase performance
> by 20% for an average B-frame using HEVC stream, depending on the
> number of threads and the number of decode queues.
> While hardware decoders are fast in general, certain vendors such as AMD
> can choke up while playing 8k video, and this patch can significantly help
> increase throughput.
> 
> In context, the changes can be viewed here:
> https://github.com/cyanreg/FFmpeg/tree/vulkan
> 
> The rest of the whole patchset is either rewrites, filter code, or
> the actual hardware accel code.
> 
> The patchset will not be pushed standalone, but as part of the greater
> Vulkan patchset.
> 
> 31 patches attached.

[...]

> diff --git a/libavcodec/av1dec.c b/libavcodec/av1dec.c
> index a80e37e33f..5a3c51e94a 100644
> --- a/libavcodec/av1dec.c
> +++ b/libavcodec/av1dec.c
> @@ -591,6 +591,8 @@ static void av1_frame_unref(AVCodecContext *avctx, AV1Frame *f)
>      f->spatial_id = f->temporal_id = 0;
>      memset(f->skip_mode_frame_idx, 0,
>             2 * sizeof(uint8_t));
> +    memset(f->ref_order_hint, 0,
> +           7 * sizeof(uint8_t));
>      memset(&f->film_grain, 0, sizeof(f->film_grain));
>      f->coded_lossless = 0;
>  }
> @@ -633,6 +635,9 @@ static int av1_frame_ref(AVCodecContext *avctx, AV1Frame *dst, const AV1Frame *s
>      memcpy(dst->skip_mode_frame_idx,
>             src->skip_mode_frame_idx,
>             2 * sizeof(uint8_t));
> +    memcpy(dst->ref_order_hint,
> +           src->ref_order_hint,
> +           7 * sizeof(uint8_t));
>      memcpy(&dst->film_grain,
>             &src->film_grain,
>             sizeof(dst->film_grain));
> @@ -1267,6 +1272,10 @@ static int av1_decode_frame(AVCodecContext *avctx, AVFrame *frame,
>              s->cur_frame.spatial_id  = header->spatial_id;
>              s->cur_frame.temporal_id = header->temporal_id;
>  
> +            for (int i = 0; i < 7; i++)
> +                s->cur_frame.ref_order_hint[i] =
> +                s->raw_frame_header->ref_order_hint[s->raw_frame_header->ref_frame_idx[i]];

Why do you need this in cur_frame? It's not a derived value, and the 
AV1RawFrameHeader struct is accessible in all AVHWaccel callbacks.

And i think you should be looking at 
s->ref[s->raw_frame_header->ref_frame_idx[i]].raw_frame_header->order_hint 
instead, too, which is the decoder state vs the raw values in the 
current frame header (Although they should match in theory).

> +
>              if (avctx->hwaccel && s->cur_frame.f->buf[0]) {
>                  ret = avctx->hwaccel->start_frame(avctx, unit->data,
Lynne March 15, 2023, 7:41 p.m. UTC | #2
Mar 14, 2023, 12:51 by jamrial@gmail.com:

> On 3/14/2023 3:33 AM, Lynne wrote:
>
>>  +            for (int i = 0; i < 7; i++)
>> +                s->cur_frame.ref_order_hint[i] =
>> +                s->raw_frame_header->ref_order_hint[s->raw_frame_header->ref_frame_idx[i]];
>>
>
> Why do you need this in cur_frame? It's not a derived value, and the AV1RawFrameHeader struct is accessible in all AVHWaccel callbacks.
>
> And i think you should be looking at s->ref[s->raw_frame_header->ref_frame_idx[i]].raw_frame_header->order_hint instead, too, which is the decoder state vs the raw values in the current frame header (Although they should match in theory).
>
>> +
>>  if (avctx->hwaccel && s->cur_frame.f->buf[0]) {
>>  ret = avctx->hwaccel->start_frame(avctx, unit->data,
>>

Fixed, removed the commit altogether and just used the value directly.
Thanks for the review
diff mbox series

Patch

From 26136684812c3e94a35dea25cb5edfe32601a518 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Sat, 25 Feb 2023 09:36:58 +0100
Subject: [PATCH 31/92] lsws: add in/out support for the new 12-bit 2-plane 422
 and 444 pixfmts

---
 libswscale/input.c               |  8 ++++++++
 libswscale/utils.c               |  4 ++++
 tests/ref/fate/sws-pixdesc-query | 26 ++++++++++++++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/libswscale/input.c b/libswscale/input.c
index d5676062a2..41795c636e 100644
--- a/libswscale/input.c
+++ b/libswscale/input.c
@@ -1452,9 +1452,13 @@  av_cold void ff_sws_init_input_funcs(SwsContext *c)
         c->chrToYV12 = p010BEToUV_c;
         break;
     case AV_PIX_FMT_P012LE:
+    case AV_PIX_FMT_P212LE:
+    case AV_PIX_FMT_P412LE:
         c->chrToYV12 = p012LEToUV_c;
         break;
     case AV_PIX_FMT_P012BE:
+    case AV_PIX_FMT_P212BE:
+    case AV_PIX_FMT_P412BE:
         c->chrToYV12 = p012BEToUV_c;
         break;
     case AV_PIX_FMT_P016LE:
@@ -1944,9 +1948,13 @@  av_cold void ff_sws_init_input_funcs(SwsContext *c)
         c->lumToYV12 = p010BEToY_c;
         break;
     case AV_PIX_FMT_P012LE:
+    case AV_PIX_FMT_P212LE:
+    case AV_PIX_FMT_P412LE:
         c->lumToYV12 = p012LEToY_c;
         break;
     case AV_PIX_FMT_P012BE:
+    case AV_PIX_FMT_P212BE:
+    case AV_PIX_FMT_P412BE:
         c->lumToYV12 = p012BEToY_c;
         break;
     case AV_PIX_FMT_GRAYF32LE:
diff --git a/libswscale/utils.c b/libswscale/utils.c
index 925c536bf1..a3a7a40750 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -248,8 +248,12 @@  static const FormatEntry format_entries[] = {
     [AV_PIX_FMT_X2BGR10LE]   = { 1, 1 },
     [AV_PIX_FMT_P210BE]      = { 1, 1 },
     [AV_PIX_FMT_P210LE]      = { 1, 1 },
+    [AV_PIX_FMT_P212BE]      = { 1, 1 },
+    [AV_PIX_FMT_P212LE]      = { 1, 1 },
     [AV_PIX_FMT_P410BE]      = { 1, 1 },
     [AV_PIX_FMT_P410LE]      = { 1, 1 },
+    [AV_PIX_FMT_P412BE]      = { 1, 1 },
+    [AV_PIX_FMT_P412LE]      = { 1, 1 },
     [AV_PIX_FMT_P216BE]      = { 1, 1 },
     [AV_PIX_FMT_P216LE]      = { 1, 1 },
     [AV_PIX_FMT_P416BE]      = { 1, 1 },
diff --git a/tests/ref/fate/sws-pixdesc-query b/tests/ref/fate/sws-pixdesc-query
index 14156a383c..fd7f2aefc0 100644
--- a/tests/ref/fate/sws-pixdesc-query
+++ b/tests/ref/fate/sws-pixdesc-query
@@ -67,8 +67,12 @@  isNBPS:
   p012le
   p210be
   p210le
+  p212be
+  p212le
   p410be
   p410le
+  p412be
+  p412le
   x2bgr10be
   x2bgr10le
   x2rgb10be
@@ -160,8 +164,10 @@  isBE:
   p012be
   p016be
   p210be
+  p212be
   p216be
   p410be
+  p412be
   p416be
   rgb444be
   rgb48be
@@ -226,10 +232,14 @@  isYUV:
   p016le
   p210be
   p210le
+  p212be
+  p212le
   p216be
   p216le
   p410be
   p410le
+  p412be
+  p412le
   p416be
   p416le
   uyvy422
@@ -338,10 +348,14 @@  isPlanarYUV:
   p016le
   p210be
   p210le
+  p212be
+  p212le
   p216be
   p216le
   p410be
   p410le
+  p412be
+  p412le
   p416be
   p416le
   yuv410p
@@ -431,10 +445,14 @@  isSemiPlanarYUV:
   p016le
   p210be
   p210le
+  p212be
+  p212le
   p216be
   p216le
   p410be
   p410le
+  p412be
+  p412le
   p416be
   p416le
 
@@ -853,10 +871,14 @@  Planar:
   p016le
   p210be
   p210le
+  p212be
+  p212le
   p216be
   p216le
   p410be
   p410le
+  p412be
+  p412le
   p416be
   p416le
   yuv410p
@@ -1029,8 +1051,12 @@  DataInHighBits:
   p012le
   p210be
   p210le
+  p212be
+  p212le
   p410be
   p410le
+  p412be
+  p412le
   xv36be
   xv36le
   xyz12be
-- 
2.39.2