diff mbox series

[FFmpeg-devel,31/92] Vulkan patchset part 1 - common code changes

Message ID NQTold9--3-9@lynne.ee
State New
Headers show
Series None | expand

Commit Message

Lynne March 14, 2023, 6:33 a.m. UTC
The attached patchset is all the common code changes that my Vulkan patchset needs.

In total lines of code, this part has 425 additions and 131 deletions.
Most of that is additions to HEVC parsing. Excluding them, the patchset is
200 lines of code added, which is manageable.

Apart from the parser changes, the following other changes have been
made to the API:

AVHWAccel.free_frame_priv exists due to Vulkan's way of using VkImageView
objects to wrap VkImage objects, which we need to free once they're no longer
in use. Every other API uses the direct objects in decoding, but with Vulkan,
they have to be represented by other objects.
We also use it to free the slice offsets buffer.

AVHWAccel.flush exists due to Vulkan keeping decoder state, despite being
stateless in theory. The decoder has to be notified of flushes in order to reset
decoding slots and other data it needs, such as motion vectors and reference
lists for AV1. Otherwise, inferring whether a flush has happened can be codec
dependent, and hacky.

hwaccel_params_buf exists due to Vulkan's way of compiling SPS/PPS data
into objects, making updating expensive. The change allows for hardware
to only upload new parameters if they have been changed.
It's insignificant for H264 and AV1, but HEVC's structures can reach 114
megabytes of data that has to be uploaded, for a specially crafted input,
which is enough to DDOS an ingest.
The data is set and managed by the hwaccel, but does need to be synchronized
between different decoding threads, which this patch performs.

Finally, the HWACCEL_CAP_THREAD_SAFE flag is added due to Vulkan being
actually threadsafe, and requiring no serialization. It does work and it
does actually make a difference, on average, it can increase performance
by 20% for an average B-frame using HEVC stream, depending on the
number of threads and the number of decode queues.
While hardware decoders are fast in general, certain vendors such as AMD
can choke up while playing 8k video, and this patch can significantly help
increase throughput.

In context, the changes can be viewed here:
https://github.com/cyanreg/FFmpeg/tree/vulkan

The rest of the whole patchset is either rewrites, filter code, or
the actual hardware accel code.

The patchset will not be pushed standalone, but as part of the greater
Vulkan patchset.

31 patches attached.

Comments

James Almer March 14, 2023, 11:51 a.m. UTC | #1
On 3/14/2023 3:33 AM, Lynne wrote:
> The attached patchset is all the common code changes that my Vulkan patchset needs.
> 
> In total lines of code, this part has 425 additions and 131 deletions.
> Most of that is additions to HEVC parsing. Excluding them, the patchset is
> 200 lines of code added, which is manageable.
> 
> Apart from the parser changes, the following other changes have been
> made to the API:
> 
> AVHWAccel.free_frame_priv exists due to Vulkan's way of using VkImageView
> objects to wrap VkImage objects, which we need to free once they're no longer
> in use. Every other API uses the direct objects in decoding, but with Vulkan,
> they have to be represented by other objects.
> We also use it to free the slice offsets buffer.
> 
> AVHWAccel.flush exists due to Vulkan keeping decoder state, despite being
> stateless in theory. The decoder has to be notified of flushes in order to reset
> decoding slots and other data it needs, such as motion vectors and reference
> lists for AV1. Otherwise, inferring whether a flush has happened can be codec
> dependent, and hacky.
> 
> hwaccel_params_buf exists due to Vulkan's way of compiling SPS/PPS data
> into objects, making updating expensive. The change allows for hardware
> to only upload new parameters if they have been changed.
> It's insignificant for H264 and AV1, but HEVC's structures can reach 114
> megabytes of data that has to be uploaded, for a specially crafted input,
> which is enough to DDOS an ingest.
> The data is set and managed by the hwaccel, but does need to be synchronized
> between different decoding threads, which this patch performs.
> 
> Finally, the HWACCEL_CAP_THREAD_SAFE flag is added due to Vulkan being
> actually threadsafe, and requiring no serialization. It does work and it
> does actually make a difference, on average, it can increase performance
> by 20% for an average B-frame using HEVC stream, depending on the
> number of threads and the number of decode queues.
> While hardware decoders are fast in general, certain vendors such as AMD
> can choke up while playing 8k video, and this patch can significantly help
> increase throughput.
> 
> In context, the changes can be viewed here:
> https://github.com/cyanreg/FFmpeg/tree/vulkan
> 
> The rest of the whole patchset is either rewrites, filter code, or
> the actual hardware accel code.
> 
> The patchset will not be pushed standalone, but as part of the greater
> Vulkan patchset.
> 
> 31 patches attached.

[...]

> diff --git a/libavcodec/av1dec.c b/libavcodec/av1dec.c
> index a80e37e33f..5a3c51e94a 100644
> --- a/libavcodec/av1dec.c
> +++ b/libavcodec/av1dec.c
> @@ -591,6 +591,8 @@ static void av1_frame_unref(AVCodecContext *avctx, AV1Frame *f)
>      f->spatial_id = f->temporal_id = 0;
>      memset(f->skip_mode_frame_idx, 0,
>             2 * sizeof(uint8_t));
> +    memset(f->ref_order_hint, 0,
> +           7 * sizeof(uint8_t));
>      memset(&f->film_grain, 0, sizeof(f->film_grain));
>      f->coded_lossless = 0;
>  }
> @@ -633,6 +635,9 @@ static int av1_frame_ref(AVCodecContext *avctx, AV1Frame *dst, const AV1Frame *s
>      memcpy(dst->skip_mode_frame_idx,
>             src->skip_mode_frame_idx,
>             2 * sizeof(uint8_t));
> +    memcpy(dst->ref_order_hint,
> +           src->ref_order_hint,
> +           7 * sizeof(uint8_t));
>      memcpy(&dst->film_grain,
>             &src->film_grain,
>             sizeof(dst->film_grain));
> @@ -1267,6 +1272,10 @@ static int av1_decode_frame(AVCodecContext *avctx, AVFrame *frame,
>              s->cur_frame.spatial_id  = header->spatial_id;
>              s->cur_frame.temporal_id = header->temporal_id;
>  
> +            for (int i = 0; i < 7; i++)
> +                s->cur_frame.ref_order_hint[i] =
> +                s->raw_frame_header->ref_order_hint[s->raw_frame_header->ref_frame_idx[i]];

Why do you need this in cur_frame? It's not a derived value, and the 
AV1RawFrameHeader struct is accessible in all AVHWaccel callbacks.

And i think you should be looking at 
s->ref[s->raw_frame_header->ref_frame_idx[i]].raw_frame_header->order_hint 
instead, too, which is the decoder state vs the raw values in the 
current frame header (Although they should match in theory).

> +
>              if (avctx->hwaccel && s->cur_frame.f->buf[0]) {
>                  ret = avctx->hwaccel->start_frame(avctx, unit->data,
Lynne March 15, 2023, 7:41 p.m. UTC | #2
Mar 14, 2023, 12:51 by jamrial@gmail.com:

> On 3/14/2023 3:33 AM, Lynne wrote:
>
>>  +            for (int i = 0; i < 7; i++)
>> +                s->cur_frame.ref_order_hint[i] =
>> +                s->raw_frame_header->ref_order_hint[s->raw_frame_header->ref_frame_idx[i]];
>>
>
> Why do you need this in cur_frame? It's not a derived value, and the AV1RawFrameHeader struct is accessible in all AVHWaccel callbacks.
>
> And i think you should be looking at s->ref[s->raw_frame_header->ref_frame_idx[i]].raw_frame_header->order_hint instead, too, which is the decoder state vs the raw values in the current frame header (Although they should match in theory).
>
>> +
>>  if (avctx->hwaccel && s->cur_frame.f->buf[0]) {
>>  ret = avctx->hwaccel->start_frame(avctx, unit->data,
>>

Fixed, removed the commit altogether and just used the value directly.
Thanks for the review
Anton Khirnov April 16, 2023, 7:21 p.m. UTC | #3
Quoting Lynne (2023-03-14 07:33:43)
> From 56ebb0bc2fd4b856196e6436b4c18af8e4d1a86a Mon Sep 17 00:00:00 2001
> From: Lynne <dev@lynne.ee>
> Date: Thu, 10 Mar 2022 18:08:53 +0100
> Subject: [PATCH 05/92] h264_parser: expose idr_pic_id
> 
> Vulkan needs it.
> ---
>  libavcodec/h264_parse.h  | 1 +
>  libavcodec/h264_parser.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/libavcodec/h264_parse.h b/libavcodec/h264_parse.h
> index 4ee863df66..4ba0add4f2 100644
> --- a/libavcodec/h264_parse.h
> +++ b/libavcodec/h264_parse.h
> @@ -85,6 +85,7 @@ typedef struct H264POCContext {
>      int delta_poc_bottom;
>      int delta_poc[2];
>      int frame_num;
> +    int idr_pic_id;
>      int prev_poc_msb;           ///< poc_msb of the last reference pic for POC type 0
>      int prev_poc_lsb;           ///< poc_lsb of the last reference pic for POC type 0
>      int frame_num_offset;       ///< for POC type 2
> diff --git a/libavcodec/h264_parser.c b/libavcodec/h264_parser.c
> index 46134a1c48..1c330484c1 100644
> --- a/libavcodec/h264_parser.c
> +++ b/libavcodec/h264_parser.c
> @@ -432,7 +432,7 @@ static inline int parse_nal_units(AVCodecParserContext *s,
>              }
>  
>              if (nal.type == H264_NAL_IDR_SLICE)
> -                get_ue_golomb_long(&nal.gb); /* idr_pic_id */
> +                p->poc.idr_pic_id = get_ue_golomb_long(&nal.gb); /* idr_pic_id */
>              if (sps->poc_type == 0) {
>                  p->poc.poc_lsb = get_bits(&nal.gb, sps->log2_max_poc_lsb);

This change cannot possibly have any effect on any Vulkan code.

I told you this in the first review round, then we discussed it on IRC
with a detailed explanation why. And look, the same patch is here again
with no changes and no explanation. I sure hope my other comments
weren't treated the same way.
Anton Khirnov April 16, 2023, 7:26 p.m. UTC | #4
Quoting Lynne (2023-03-14 07:33:43)
> From 3ff54622533627493643c8e4a5e982fc99a7df47 Mon Sep 17 00:00:00 2001
> From: Lynne <dev@lynne.ee>
> Date: Wed, 22 Feb 2023 22:30:36 +0100
> Subject: [PATCH 21/92] av1dec: expose ref_order_hint
> 
> Commit by Dave Airlie

???
Anton Khirnov April 16, 2023, 8:38 p.m. UTC | #5
Quoting Lynne (2023-03-14 07:33:43)
> From 61f412eea3fbcb1e2a8625796760c0e24fa3fb83 Mon Sep 17 00:00:00 2001
> From: Lynne <dev@lynne.ee>
> Date: Tue, 17 Jan 2023 05:01:45 +0100
> Subject: [PATCH 27/92] h264dec: add hwaccel_params_buf
> 
> diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c
> index 0aee724c0d..6559593195 100644
> --- a/libavcodec/h264dec.c
> +++ b/libavcodec/h264dec.c
> @@ -341,6 +341,7 @@ static av_cold int h264_decode_end(AVCodecContext *avctx)
>      H264Context *h = avctx->priv_data;
>      int i;
>  
> +    av_buffer_unref(&h->hwaccel_params_buf);
>      ff_h264_remove_all_refs(h);
>      ff_h264_free_tables(h);
>  
> @@ -470,6 +471,7 @@ static void h264_decode_flush(AVCodecContext *avctx)
>  
>      ff_h264_flush_change(h);
>      ff_h264_sei_uninit(&h->sei);
> +    av_buffer_unref(&h->hwaccel_params_buf);
>  
>      for (i = 0; i < H264_MAX_PICTURE_COUNT; i++)
>          ff_h264_unref_picture(h, &h->DPB[i]);
> @@ -669,6 +671,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size)
>              avpriv_request_sample(avctx, "data partitioning");
>              break;
>          case H264_NAL_SEI:
> +            av_buffer_unref(&h->hwaccel_params_buf);
>              if (h->setup_finished) {
>                  avpriv_request_sample(avctx, "Late SEI");
>                  break;
> @@ -682,6 +685,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size)
>              break;
>          case H264_NAL_SPS: {
>              GetBitContext tmp_gb = nal->gb;
> +            av_buffer_unref(&h->hwaccel_params_buf);
>              if (avctx->hwaccel && avctx->hwaccel->decode_params) {
>                  ret = avctx->hwaccel->decode_params(avctx,
>                                                      nal->type,

Removed on SEI and SPS? That seems bizarre and highly arbitrary. Why
specifically those two?
Anton Khirnov April 17, 2023, 9:07 a.m. UTC | #6
Quoting Anton Khirnov (2023-04-16 22:38:16)
> Quoting Lynne (2023-03-14 07:33:43)
> > From 61f412eea3fbcb1e2a8625796760c0e24fa3fb83 Mon Sep 17 00:00:00 2001
> > From: Lynne <dev@lynne.ee>
> > Date: Tue, 17 Jan 2023 05:01:45 +0100
> > Subject: [PATCH 27/92] h264dec: add hwaccel_params_buf
> > 
> > diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c
> > index 0aee724c0d..6559593195 100644
> > --- a/libavcodec/h264dec.c
> > +++ b/libavcodec/h264dec.c
> > @@ -341,6 +341,7 @@ static av_cold int h264_decode_end(AVCodecContext *avctx)
> >      H264Context *h = avctx->priv_data;
> >      int i;
> >  
> > +    av_buffer_unref(&h->hwaccel_params_buf);
> >      ff_h264_remove_all_refs(h);
> >      ff_h264_free_tables(h);
> >  
> > @@ -470,6 +471,7 @@ static void h264_decode_flush(AVCodecContext *avctx)
> >  
> >      ff_h264_flush_change(h);
> >      ff_h264_sei_uninit(&h->sei);
> > +    av_buffer_unref(&h->hwaccel_params_buf);
> >  
> >      for (i = 0; i < H264_MAX_PICTURE_COUNT; i++)
> >          ff_h264_unref_picture(h, &h->DPB[i]);
> > @@ -669,6 +671,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size)
> >              avpriv_request_sample(avctx, "data partitioning");
> >              break;
> >          case H264_NAL_SEI:
> > +            av_buffer_unref(&h->hwaccel_params_buf);
> >              if (h->setup_finished) {
> >                  avpriv_request_sample(avctx, "Late SEI");
> >                  break;
> > @@ -682,6 +685,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size)
> >              break;
> >          case H264_NAL_SPS: {
> >              GetBitContext tmp_gb = nal->gb;
> > +            av_buffer_unref(&h->hwaccel_params_buf);
> >              if (avctx->hwaccel && avctx->hwaccel->decode_params) {
> >                  ret = avctx->hwaccel->decode_params(avctx,
> >                                                      nal->type,
> 
> Removed on SEI and SPS? That seems bizarre and highly arbitrary. Why
> specifically those two?

Found the explanation in the cover letter - I assume SEI is a mistake
and should really be PPS. The points where it is reset still seem highly
Vulkan-specific, so I think it'd be better for the decoder to set a flag
on seeing new parameter sets. Vulkan could would then check and clear
the flag as needed.
Anton Khirnov April 17, 2023, 9:25 a.m. UTC | #7
Quoting Lynne (2023-03-14 07:33:43)
> From 6e2dfc44e50798264eb16bc2dcabfdbf2fbac2d7 Mon Sep 17 00:00:00 2001
> From: Lynne <dev@lynne.ee>
> Date: Thu, 15 Dec 2022 01:06:52 +0100
> Subject: [PATCH 24/92] hwconfig: add a new HWACCEL_CAP_THREAD_SAFE for
>  threadsafe hwaccels
> 
> Vulkan is fully threadsafe and stateless, so we can benefit from this.
> ---
>  libavcodec/hwconfig.h      | 1 +
>  libavcodec/pthread_frame.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/libavcodec/hwconfig.h b/libavcodec/hwconfig.h
> index 721424912c..e6b78f0160 100644
> --- a/libavcodec/hwconfig.h
> +++ b/libavcodec/hwconfig.h
> @@ -24,6 +24,7 @@
>  
>  
>  #define HWACCEL_CAP_ASYNC_SAFE      (1 << 0)
> +#define HWACCEL_CAP_THREAD_SAFE     (1 << 1)
>  
>  
>  typedef struct AVCodecHWConfigInternal {
> diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
> index 74864e19c5..c096287233 100644
> --- a/libavcodec/pthread_frame.c
> +++ b/libavcodec/pthread_frame.c
> @@ -204,7 +204,7 @@ static attribute_align_arg void *frame_worker_thread(void *arg)
>  
>          /* if the previous thread uses hwaccel then we take the lock to ensure
>           * the threads don't run concurrently */
> -        if (avctx->hwaccel) {
> +        if (avctx->hwaccel && !(avctx->hwaccel->caps_internal & HWACCEL_CAP_THREAD_SAFE)) {

A major problem here is that hwaccel_priv_data exists in only a single
instance shared across all the worker threads. If you want frame-threaded
hwaccels, you'll need to add some mechanism for synchronizing it between
threads.

I really wonder how does your code survive all the races involved.
Anton Khirnov April 17, 2023, 9:46 a.m. UTC | #8
Quoting Lynne (2023-03-14 07:33:43)
> From e37330cf6a1f768c733ca6d0b23b9e5847a5fe4f Mon Sep 17 00:00:00 2001
> From: Lynne <dev@lynne.ee>
> Date: Thu, 10 Mar 2022 18:03:05 +0100
> Subject: [PATCH 25/92] avcodec: add AVHWAccel.free_frame_priv callback
> diff --git a/libavcodec/decode.c b/libavcodec/decode.c
> index d1ba7f167f..c4630ce275 100644
> --- a/libavcodec/decode.c
> +++ b/libavcodec/decode.c
> @@ -1685,3 +1685,22 @@ int ff_copy_palette(void *dst, const AVPacket *src, void *logctx)
>      }
>      return 0;
>  }
> +
> +AVBufferRef *ff_alloc_hwaccel_frame_priv_data(AVCodecContext *avctx,

Sane-endian naming please. ff_hwaccel_frame_priv_alloc().

> +                                              const AVHWAccel *hwaccel)
> +{
> +    AVBufferRef *ref;
> +    uint8_t *data = av_mallocz(hwaccel->frame_priv_data_size);
> +    if (!data)
> +        return NULL;
> +
> +    ref = av_buffer_create(data, hwaccel->frame_priv_data_size,
> +                           (void (*)(void *, uint8_t *))hwaccel->free_frame_priv,

I believe the cast is UB and you should convert the arguments in the
function itself, or add a trivial wrapper here.
Lynne April 19, 2023, 2:39 p.m. UTC | #9
Apr 17, 2023, 11:25 by anton@khirnov.net:

> Quoting Lynne (2023-03-14 07:33:43)
>
>> From 6e2dfc44e50798264eb16bc2dcabfdbf2fbac2d7 Mon Sep 17 00:00:00 2001
>> From: Lynne <dev@lynne.ee>
>> Date: Thu, 15 Dec 2022 01:06:52 +0100
>> Subject: [PATCH 24/92] hwconfig: add a new HWACCEL_CAP_THREAD_SAFE for
>>  threadsafe hwaccels
>>
>> Vulkan is fully threadsafe and stateless, so we can benefit from this.
>> ---
>>  libavcodec/hwconfig.h      | 1 +
>>  libavcodec/pthread_frame.c | 2 +-
>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/libavcodec/hwconfig.h b/libavcodec/hwconfig.h
>> index 721424912c..e6b78f0160 100644
>> --- a/libavcodec/hwconfig.h
>> +++ b/libavcodec/hwconfig.h
>> @@ -24,6 +24,7 @@
>>  
>>  
>>  #define HWACCEL_CAP_ASYNC_SAFE      (1 << 0)
>> +#define HWACCEL_CAP_THREAD_SAFE     (1 << 1)
>>  
>>  
>>  typedef struct AVCodecHWConfigInternal {
>> diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
>> index 74864e19c5..c096287233 100644
>> --- a/libavcodec/pthread_frame.c
>> +++ b/libavcodec/pthread_frame.c
>> @@ -204,7 +204,7 @@ static attribute_align_arg void *frame_worker_thread(void *arg)
>>  
>>  /* if the previous thread uses hwaccel then we take the lock to ensure
>>  * the threads don't run concurrently */
>> -        if (avctx->hwaccel) {
>> +        if (avctx->hwaccel && !(avctx->hwaccel->caps_internal & HWACCEL_CAP_THREAD_SAFE)) {
>>
>
> A major problem here is that hwaccel_priv_data exists in only a single
> instance shared across all the worker threads. If you want frame-threaded
> hwaccels, you'll need to add some mechanism for synchronizing it between
> threads.
>
> I really wonder how does your code survive all the races involved.
>

It works because hwaccel_priv_data is read-only. No need to synchronize
anything, the contexts Vulkan creates are defined as driver-synchronized,
and all the info a frame needs is provided when decoding it, rather than
kept as a state.
So I don't see a problem here.
Lynne April 19, 2023, 2:47 p.m. UTC | #10
Apr 17, 2023, 11:07 by anton@khirnov.net:

> Quoting Anton Khirnov (2023-04-16 22:38:16)
>
>> Quoting Lynne (2023-03-14 07:33:43)
>> > From 61f412eea3fbcb1e2a8625796760c0e24fa3fb83 Mon Sep 17 00:00:00 2001
>> > From: Lynne <dev@lynne.ee>
>> > Date: Tue, 17 Jan 2023 05:01:45 +0100
>> > Subject: [PATCH 27/92] h264dec: add hwaccel_params_buf
>> > 
>> > diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c
>> > index 0aee724c0d..6559593195 100644
>> > --- a/libavcodec/h264dec.c
>> > +++ b/libavcodec/h264dec.c
>> > @@ -341,6 +341,7 @@ static av_cold int h264_decode_end(AVCodecContext *avctx)
>> >      H264Context *h = avctx->priv_data;
>> >      int i;
>> > 
>> > +    av_buffer_unref(&h->hwaccel_params_buf);
>> >      ff_h264_remove_all_refs(h);
>> >      ff_h264_free_tables(h);
>> > 
>> > @@ -470,6 +471,7 @@ static void h264_decode_flush(AVCodecContext *avctx)
>> > 
>> >      ff_h264_flush_change(h);
>> >      ff_h264_sei_uninit(&h->sei);
>> > +    av_buffer_unref(&h->hwaccel_params_buf);
>> > 
>> >      for (i = 0; i < H264_MAX_PICTURE_COUNT; i++)
>> >          ff_h264_unref_picture(h, &h->DPB[i]);
>> > @@ -669,6 +671,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size)
>> >              avpriv_request_sample(avctx, "data partitioning");
>> >              break;
>> >          case H264_NAL_SEI:
>> > +            av_buffer_unref(&h->hwaccel_params_buf);
>> >              if (h->setup_finished) {
>> >                  avpriv_request_sample(avctx, "Late SEI");
>> >                  break;
>> > @@ -682,6 +685,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size)
>> >              break;
>> >          case H264_NAL_SPS: {
>> >              GetBitContext tmp_gb = nal->gb;
>> > +            av_buffer_unref(&h->hwaccel_params_buf);
>> >              if (avctx->hwaccel && avctx->hwaccel->decode_params) {
>> >                  ret = avctx->hwaccel->decode_params(avctx,
>> >                                                      nal->type,
>>
>> Removed on SEI and SPS? That seems bizarre and highly arbitrary. Why
>> specifically those two?
>>
>
> Found the explanation in the cover letter - I assume SEI is a mistake
> and should really be PPS. The points where it is reset still seem highly
> Vulkan-specific, so I think it'd be better for the decoder to set a flag
> on seeing new parameter sets. Vulkan could would then check and clear
> the flag as needed.
>

It was a typo, changed to remove it on SPS and PPS.
We still have to store the compiled bufferref somewhere, and synchronize
it between frames, and having it in the frame context itself solves
those issues.
Lynne April 19, 2023, 2:49 p.m. UTC | #11
Apr 17, 2023, 11:46 by anton@khirnov.net:

> Quoting Lynne (2023-03-14 07:33:43)
>
>> From e37330cf6a1f768c733ca6d0b23b9e5847a5fe4f Mon Sep 17 00:00:00 2001
>> From: Lynne <dev@lynne.ee>
>> Date: Thu, 10 Mar 2022 18:03:05 +0100
>> Subject: [PATCH 25/92] avcodec: add AVHWAccel.free_frame_priv callback
>> diff --git a/libavcodec/decode.c b/libavcodec/decode.c
>> index d1ba7f167f..c4630ce275 100644
>> --- a/libavcodec/decode.c
>> +++ b/libavcodec/decode.c
>> @@ -1685,3 +1685,22 @@ int ff_copy_palette(void *dst, const AVPacket *src, void *logctx)
>>  }
>>  return 0;
>>  }
>> +
>> +AVBufferRef *ff_alloc_hwaccel_frame_priv_data(AVCodecContext *avctx,
>>
>
> Sane-endian naming please. ff_hwaccel_frame_priv_alloc().
>

Done


>> +                                              const AVHWAccel *hwaccel)
>> +{
>> +    AVBufferRef *ref;
>> +    uint8_t *data = av_mallocz(hwaccel->frame_priv_data_size);
>> +    if (!data)
>> +        return NULL;
>> +
>> +    ref = av_buffer_create(data, hwaccel->frame_priv_data_size,
>> +                           (void (*)(void *, uint8_t *))hwaccel->free_frame_priv,
>>
>
> I believe the cast is UB and you should convert the arguments in the
> function itself, or add a trivial wrapper here.
>

I converted the arguments of the function.
Anton Khirnov April 22, 2023, 12:38 p.m. UTC | #12
Quoting Lynne (2023-04-19 16:47:11)
> Apr 17, 2023, 11:07 by anton@khirnov.net:
> 
> > Quoting Anton Khirnov (2023-04-16 22:38:16)
> >
> >> Quoting Lynne (2023-03-14 07:33:43)
> >> > From 61f412eea3fbcb1e2a8625796760c0e24fa3fb83 Mon Sep 17 00:00:00 2001
> >> > From: Lynne <dev@lynne.ee>
> >> > Date: Tue, 17 Jan 2023 05:01:45 +0100
> >> > Subject: [PATCH 27/92] h264dec: add hwaccel_params_buf
> >> > 
> >> > diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c
> >> > index 0aee724c0d..6559593195 100644
> >> > --- a/libavcodec/h264dec.c
> >> > +++ b/libavcodec/h264dec.c
> >> > @@ -341,6 +341,7 @@ static av_cold int h264_decode_end(AVCodecContext *avctx)
> >> >      H264Context *h = avctx->priv_data;
> >> >      int i;
> >> > 
> >> > +    av_buffer_unref(&h->hwaccel_params_buf);
> >> >      ff_h264_remove_all_refs(h);
> >> >      ff_h264_free_tables(h);
> >> > 
> >> > @@ -470,6 +471,7 @@ static void h264_decode_flush(AVCodecContext *avctx)
> >> > 
> >> >      ff_h264_flush_change(h);
> >> >      ff_h264_sei_uninit(&h->sei);
> >> > +    av_buffer_unref(&h->hwaccel_params_buf);
> >> > 
> >> >      for (i = 0; i < H264_MAX_PICTURE_COUNT; i++)
> >> >          ff_h264_unref_picture(h, &h->DPB[i]);
> >> > @@ -669,6 +671,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size)
> >> >              avpriv_request_sample(avctx, "data partitioning");
> >> >              break;
> >> >          case H264_NAL_SEI:
> >> > +            av_buffer_unref(&h->hwaccel_params_buf);
> >> >              if (h->setup_finished) {
> >> >                  avpriv_request_sample(avctx, "Late SEI");
> >> >                  break;
> >> > @@ -682,6 +685,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size)
> >> >              break;
> >> >          case H264_NAL_SPS: {
> >> >              GetBitContext tmp_gb = nal->gb;
> >> > +            av_buffer_unref(&h->hwaccel_params_buf);
> >> >              if (avctx->hwaccel && avctx->hwaccel->decode_params) {
> >> >                  ret = avctx->hwaccel->decode_params(avctx,
> >> >                                                      nal->type,
> >>
> >> Removed on SEI and SPS? That seems bizarre and highly arbitrary. Why
> >> specifically those two?
> >>
> >
> > Found the explanation in the cover letter - I assume SEI is a mistake
> > and should really be PPS. The points where it is reset still seem highly
> > Vulkan-specific, so I think it'd be better for the decoder to set a flag
> > on seeing new parameter sets. Vulkan could would then check and clear
> > the flag as needed.
> >
> 
> It was a typo, changed to remove it on SPS and PPS.
> We still have to store the compiled bufferref somewhere, and synchronize
> it between frames, and having it in the frame context itself solves
> those issues.

Except the way it is handled makes no sense for anything other than
vulkan.

Having per-thread hwaccel contexts with a callback for propagating the
data would solve this more cleanly.
Anton Khirnov April 22, 2023, 12:41 p.m. UTC | #13
Quoting Lynne (2023-04-19 16:39:28)
> Apr 17, 2023, 11:25 by anton@khirnov.net:
> 
> > Quoting Lynne (2023-03-14 07:33:43)
> >
> >> From 6e2dfc44e50798264eb16bc2dcabfdbf2fbac2d7 Mon Sep 17 00:00:00 2001
> >> From: Lynne <dev@lynne.ee>
> >> Date: Thu, 15 Dec 2022 01:06:52 +0100
> >> Subject: [PATCH 24/92] hwconfig: add a new HWACCEL_CAP_THREAD_SAFE for
> >>  threadsafe hwaccels
> >>
> >> Vulkan is fully threadsafe and stateless, so we can benefit from this.
> >> ---
> >>  libavcodec/hwconfig.h      | 1 +
> >>  libavcodec/pthread_frame.c | 2 +-
> >>  2 files changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/libavcodec/hwconfig.h b/libavcodec/hwconfig.h
> >> index 721424912c..e6b78f0160 100644
> >> --- a/libavcodec/hwconfig.h
> >> +++ b/libavcodec/hwconfig.h
> >> @@ -24,6 +24,7 @@
> >>  
> >>  
> >>  #define HWACCEL_CAP_ASYNC_SAFE      (1 << 0)
> >> +#define HWACCEL_CAP_THREAD_SAFE     (1 << 1)
> >>  
> >>  
> >>  typedef struct AVCodecHWConfigInternal {
> >> diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
> >> index 74864e19c5..c096287233 100644
> >> --- a/libavcodec/pthread_frame.c
> >> +++ b/libavcodec/pthread_frame.c
> >> @@ -204,7 +204,7 @@ static attribute_align_arg void *frame_worker_thread(void *arg)
> >>  
> >>  /* if the previous thread uses hwaccel then we take the lock to ensure
> >>  * the threads don't run concurrently */
> >> -        if (avctx->hwaccel) {
> >> +        if (avctx->hwaccel && !(avctx->hwaccel->caps_internal & HWACCEL_CAP_THREAD_SAFE)) {
> >>
> >
> > A major problem here is that hwaccel_priv_data exists in only a single
> > instance shared across all the worker threads. If you want frame-threaded
> > hwaccels, you'll need to add some mechanism for synchronizing it between
> > threads.
> >
> > I really wonder how does your code survive all the races involved.
> >
> 
> It works because hwaccel_priv_data is read-only. No need to synchronize
> anything, the contexts Vulkan creates are defined as driver-synchronized,
> and all the info a frame needs is provided when decoding it, rather than
> kept as a state.
> So I don't see a problem here.

From a quick glance, at least vk_av1_free_frame_priv() modifies this
context, which seems highly unsafe.

This really seems like a disaster waiting to happen, there should be
some stronger guarantees that nobody will modify this context when they
shouldn't.
Lynne April 24, 2023, 8:26 a.m. UTC | #14
Apr 22, 2023, 14:42 by anton@khirnov.net:

> Quoting Lynne (2023-04-19 16:39:28)
>
>> Apr 17, 2023, 11:25 by anton@khirnov.net:
>>
>> > Quoting Lynne (2023-03-14 07:33:43)
>> >
>> >> From 6e2dfc44e50798264eb16bc2dcabfdbf2fbac2d7 Mon Sep 17 00:00:00 2001
>> >> From: Lynne <dev@lynne.ee>
>> >> Date: Thu, 15 Dec 2022 01:06:52 +0100
>> >> Subject: [PATCH 24/92] hwconfig: add a new HWACCEL_CAP_THREAD_SAFE for
>> >>  threadsafe hwaccels
>> >>
>> >> Vulkan is fully threadsafe and stateless, so we can benefit from this.
>> >> ---
>> >>  libavcodec/hwconfig.h      | 1 +
>> >>  libavcodec/pthread_frame.c | 2 +-
>> >>  2 files changed, 2 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/libavcodec/hwconfig.h b/libavcodec/hwconfig.h
>> >> index 721424912c..e6b78f0160 100644
>> >> --- a/libavcodec/hwconfig.h
>> >> +++ b/libavcodec/hwconfig.h
>> >> @@ -24,6 +24,7 @@
>> >> 
>> >> 
>> >>  #define HWACCEL_CAP_ASYNC_SAFE      (1 << 0)
>> >> +#define HWACCEL_CAP_THREAD_SAFE     (1 << 1)
>> >> 
>> >> 
>> >>  typedef struct AVCodecHWConfigInternal {
>> >> diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
>> >> index 74864e19c5..c096287233 100644
>> >> --- a/libavcodec/pthread_frame.c
>> >> +++ b/libavcodec/pthread_frame.c
>> >> @@ -204,7 +204,7 @@ static attribute_align_arg void *frame_worker_thread(void *arg)
>> >> 
>> >>  /* if the previous thread uses hwaccel then we take the lock to ensure
>> >>  * the threads don't run concurrently */
>> >> -        if (avctx->hwaccel) {
>> >> +        if (avctx->hwaccel && !(avctx->hwaccel->caps_internal & HWACCEL_CAP_THREAD_SAFE)) {
>> >>
>> >
>> > A major problem here is that hwaccel_priv_data exists in only a single
>> > instance shared across all the worker threads. If you want frame-threaded
>> > hwaccels, you'll need to add some mechanism for synchronizing it between
>> > threads.
>> >
>> > I really wonder how does your code survive all the races involved.
>> >
>>
>> It works because hwaccel_priv_data is read-only. No need to synchronize
>> anything, the contexts Vulkan creates are defined as driver-synchronized,
>> and all the info a frame needs is provided when decoding it, rather than
>> kept as a state.
>> So I don't see a problem here.
>>
>
> From a quick glance, at least vk_av1_free_frame_priv() modifies this
> context, which seems highly unsafe.
>

Our AV1 decoder isn't threaded yet. I've removed the threadsafe flag from the
hwaccel, Dave is currently looking into whether it's possible to remove the
frame id.


> This really seems like a disaster waiting to happen, there should be
> some stronger guarantees that nobody will modify this context when they
> shouldn't.
>

It's an opt-in flag, that's more than enough.
Lynne April 24, 2023, 8:27 a.m. UTC | #15
Apr 22, 2023, 14:38 by anton@khirnov.net:

> Quoting Lynne (2023-04-19 16:47:11)
>
>> Apr 17, 2023, 11:07 by anton@khirnov.net:
>>
>> > Quoting Anton Khirnov (2023-04-16 22:38:16)
>> >
>> >> Quoting Lynne (2023-03-14 07:33:43)
>> >> > From 61f412eea3fbcb1e2a8625796760c0e24fa3fb83 Mon Sep 17 00:00:00 2001
>> >> > From: Lynne <dev@lynne.ee>
>> >> > Date: Tue, 17 Jan 2023 05:01:45 +0100
>> >> > Subject: [PATCH 27/92] h264dec: add hwaccel_params_buf
>> >> > 
>> >> > diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c
>> >> > index 0aee724c0d..6559593195 100644
>> >> > --- a/libavcodec/h264dec.c
>> >> > +++ b/libavcodec/h264dec.c
>> >> > @@ -341,6 +341,7 @@ static av_cold int h264_decode_end(AVCodecContext *avctx)
>> >> >      H264Context *h = avctx->priv_data;
>> >> >      int i;
>> >> > 
>> >> > +    av_buffer_unref(&h->hwaccel_params_buf);
>> >> >      ff_h264_remove_all_refs(h);
>> >> >      ff_h264_free_tables(h);
>> >> > 
>> >> > @@ -470,6 +471,7 @@ static void h264_decode_flush(AVCodecContext *avctx)
>> >> > 
>> >> >      ff_h264_flush_change(h);
>> >> >      ff_h264_sei_uninit(&h->sei);
>> >> > +    av_buffer_unref(&h->hwaccel_params_buf);
>> >> > 
>> >> >      for (i = 0; i < H264_MAX_PICTURE_COUNT; i++)
>> >> >          ff_h264_unref_picture(h, &h->DPB[i]);
>> >> > @@ -669,6 +671,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size)
>> >> >              avpriv_request_sample(avctx, "data partitioning");
>> >> >              break;
>> >> >          case H264_NAL_SEI:
>> >> > +            av_buffer_unref(&h->hwaccel_params_buf);
>> >> >              if (h->setup_finished) {
>> >> >                  avpriv_request_sample(avctx, "Late SEI");
>> >> >                  break;
>> >> > @@ -682,6 +685,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size)
>> >> >              break;
>> >> >          case H264_NAL_SPS: {
>> >> >              GetBitContext tmp_gb = nal->gb;
>> >> > +            av_buffer_unref(&h->hwaccel_params_buf);
>> >> >              if (avctx->hwaccel && avctx->hwaccel->decode_params) {
>> >> >                  ret = avctx->hwaccel->decode_params(avctx,
>> >> >                                                      nal->type,
>> >>
>> >> Removed on SEI and SPS? That seems bizarre and highly arbitrary. Why
>> >> specifically those two?
>> >>
>> >
>> > Found the explanation in the cover letter - I assume SEI is a mistake
>> > and should really be PPS. The points where it is reset still seem highly
>> > Vulkan-specific, so I think it'd be better for the decoder to set a flag
>> > on seeing new parameter sets. Vulkan could would then check and clear
>> > the flag as needed.
>> >
>>
>> It was a typo, changed to remove it on SPS and PPS.
>> We still have to store the compiled bufferref somewhere, and synchronize
>> it between frames, and having it in the frame context itself solves
>> those issues.
>>
>
> Except the way it is handled makes no sense for anything other than
> vulkan.
>
> Having per-thread hwaccel contexts with a callback for propagating the
> data would solve this more cleanly.
>

How would that look?
diff mbox series

Patch

From 26136684812c3e94a35dea25cb5edfe32601a518 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Sat, 25 Feb 2023 09:36:58 +0100
Subject: [PATCH 31/92] lsws: add in/out support for the new 12-bit 2-plane 422
 and 444 pixfmts

---
 libswscale/input.c               |  8 ++++++++
 libswscale/utils.c               |  4 ++++
 tests/ref/fate/sws-pixdesc-query | 26 ++++++++++++++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/libswscale/input.c b/libswscale/input.c
index d5676062a2..41795c636e 100644
--- a/libswscale/input.c
+++ b/libswscale/input.c
@@ -1452,9 +1452,13 @@  av_cold void ff_sws_init_input_funcs(SwsContext *c)
         c->chrToYV12 = p010BEToUV_c;
         break;
     case AV_PIX_FMT_P012LE:
+    case AV_PIX_FMT_P212LE:
+    case AV_PIX_FMT_P412LE:
         c->chrToYV12 = p012LEToUV_c;
         break;
     case AV_PIX_FMT_P012BE:
+    case AV_PIX_FMT_P212BE:
+    case AV_PIX_FMT_P412BE:
         c->chrToYV12 = p012BEToUV_c;
         break;
     case AV_PIX_FMT_P016LE:
@@ -1944,9 +1948,13 @@  av_cold void ff_sws_init_input_funcs(SwsContext *c)
         c->lumToYV12 = p010BEToY_c;
         break;
     case AV_PIX_FMT_P012LE:
+    case AV_PIX_FMT_P212LE:
+    case AV_PIX_FMT_P412LE:
         c->lumToYV12 = p012LEToY_c;
         break;
     case AV_PIX_FMT_P012BE:
+    case AV_PIX_FMT_P212BE:
+    case AV_PIX_FMT_P412BE:
         c->lumToYV12 = p012BEToY_c;
         break;
     case AV_PIX_FMT_GRAYF32LE:
diff --git a/libswscale/utils.c b/libswscale/utils.c
index 925c536bf1..a3a7a40750 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -248,8 +248,12 @@  static const FormatEntry format_entries[] = {
     [AV_PIX_FMT_X2BGR10LE]   = { 1, 1 },
     [AV_PIX_FMT_P210BE]      = { 1, 1 },
     [AV_PIX_FMT_P210LE]      = { 1, 1 },
+    [AV_PIX_FMT_P212BE]      = { 1, 1 },
+    [AV_PIX_FMT_P212LE]      = { 1, 1 },
     [AV_PIX_FMT_P410BE]      = { 1, 1 },
     [AV_PIX_FMT_P410LE]      = { 1, 1 },
+    [AV_PIX_FMT_P412BE]      = { 1, 1 },
+    [AV_PIX_FMT_P412LE]      = { 1, 1 },
     [AV_PIX_FMT_P216BE]      = { 1, 1 },
     [AV_PIX_FMT_P216LE]      = { 1, 1 },
     [AV_PIX_FMT_P416BE]      = { 1, 1 },
diff --git a/tests/ref/fate/sws-pixdesc-query b/tests/ref/fate/sws-pixdesc-query
index 14156a383c..fd7f2aefc0 100644
--- a/tests/ref/fate/sws-pixdesc-query
+++ b/tests/ref/fate/sws-pixdesc-query
@@ -67,8 +67,12 @@  isNBPS:
   p012le
   p210be
   p210le
+  p212be
+  p212le
   p410be
   p410le
+  p412be
+  p412le
   x2bgr10be
   x2bgr10le
   x2rgb10be
@@ -160,8 +164,10 @@  isBE:
   p012be
   p016be
   p210be
+  p212be
   p216be
   p410be
+  p412be
   p416be
   rgb444be
   rgb48be
@@ -226,10 +232,14 @@  isYUV:
   p016le
   p210be
   p210le
+  p212be
+  p212le
   p216be
   p216le
   p410be
   p410le
+  p412be
+  p412le
   p416be
   p416le
   uyvy422
@@ -338,10 +348,14 @@  isPlanarYUV:
   p016le
   p210be
   p210le
+  p212be
+  p212le
   p216be
   p216le
   p410be
   p410le
+  p412be
+  p412le
   p416be
   p416le
   yuv410p
@@ -431,10 +445,14 @@  isSemiPlanarYUV:
   p016le
   p210be
   p210le
+  p212be
+  p212le
   p216be
   p216le
   p410be
   p410le
+  p412be
+  p412le
   p416be
   p416le
 
@@ -853,10 +871,14 @@  Planar:
   p016le
   p210be
   p210le
+  p212be
+  p212le
   p216be
   p216le
   p410be
   p410le
+  p412be
+  p412le
   p416be
   p416le
   yuv410p
@@ -1029,8 +1051,12 @@  DataInHighBits:
   p012le
   p210be
   p210le
+  p212be
+  p212le
   p410be
   p410le
+  p412be
+  p412le
   xv36be
   xv36le
   xyz12be
-- 
2.39.2