Message ID | NQTold9--3-9@lynne.ee |
---|---|
State | New |
Headers | show |
Series | None | expand |
On 3/14/2023 3:33 AM, Lynne wrote: > The attached patchset is all the common code changes that my Vulkan patchset needs. > > In total lines of code, this part has 425 additions and 131 deletions. > Most of that is additions to HEVC parsing. Excluding them, the patchset is > 200 lines of code added, which is manageable. > > Apart from the parser changes, the following other changes have been > made to the API: > > AVHWAccel.free_frame_priv exists due to Vulkan's way of using VkImageView > objects to wrap VkImage objects, which we need to free once they're no longer > in use. Every other API uses the direct objects in decoding, but with Vulkan, > they have to be represented by other objects. > We also use it to free the slice offsets buffer. > > AVHWAccel.flush exists due to Vulkan keeping decoder state, despite being > stateless in theory. The decoder has to be notified of flushes in order to reset > decoding slots and other data it needs, such as motion vectors and reference > lists for AV1. Otherwise, inferring whether a flush has happened can be codec > dependent, and hacky. > > hwaccel_params_buf exists due to Vulkan's way of compiling SPS/PPS data > into objects, making updating expensive. The change allows for hardware > to only upload new parameters if they have been changed. > It's insignificant for H264 and AV1, but HEVC's structures can reach 114 > megabytes of data that has to be uploaded, for a specially crafted input, > which is enough to DDOS an ingest. > The data is set and managed by the hwaccel, but does need to be synchronized > between different decoding threads, which this patch performs. > > Finally, the HWACCEL_CAP_THREAD_SAFE flag is added due to Vulkan being > actually threadsafe, and requiring no serialization. It does work and it > does actually make a difference, on average, it can increase performance > by 20% for an average B-frame using HEVC stream, depending on the > number of threads and the number of decode queues. > While hardware decoders are fast in general, certain vendors such as AMD > can choke up while playing 8k video, and this patch can significantly help > increase throughput. > > In context, the changes can be viewed here: > https://github.com/cyanreg/FFmpeg/tree/vulkan > > The rest of the whole patchset is either rewrites, filter code, or > the actual hardware accel code. > > The patchset will not be pushed standalone, but as part of the greater > Vulkan patchset. > > 31 patches attached. [...] > diff --git a/libavcodec/av1dec.c b/libavcodec/av1dec.c > index a80e37e33f..5a3c51e94a 100644 > --- a/libavcodec/av1dec.c > +++ b/libavcodec/av1dec.c > @@ -591,6 +591,8 @@ static void av1_frame_unref(AVCodecContext *avctx, AV1Frame *f) > f->spatial_id = f->temporal_id = 0; > memset(f->skip_mode_frame_idx, 0, > 2 * sizeof(uint8_t)); > + memset(f->ref_order_hint, 0, > + 7 * sizeof(uint8_t)); > memset(&f->film_grain, 0, sizeof(f->film_grain)); > f->coded_lossless = 0; > } > @@ -633,6 +635,9 @@ static int av1_frame_ref(AVCodecContext *avctx, AV1Frame *dst, const AV1Frame *s > memcpy(dst->skip_mode_frame_idx, > src->skip_mode_frame_idx, > 2 * sizeof(uint8_t)); > + memcpy(dst->ref_order_hint, > + src->ref_order_hint, > + 7 * sizeof(uint8_t)); > memcpy(&dst->film_grain, > &src->film_grain, > sizeof(dst->film_grain)); > @@ -1267,6 +1272,10 @@ static int av1_decode_frame(AVCodecContext *avctx, AVFrame *frame, > s->cur_frame.spatial_id = header->spatial_id; > s->cur_frame.temporal_id = header->temporal_id; > > + for (int i = 0; i < 7; i++) > + s->cur_frame.ref_order_hint[i] = > + s->raw_frame_header->ref_order_hint[s->raw_frame_header->ref_frame_idx[i]]; Why do you need this in cur_frame? It's not a derived value, and the AV1RawFrameHeader struct is accessible in all AVHWaccel callbacks. And i think you should be looking at s->ref[s->raw_frame_header->ref_frame_idx[i]].raw_frame_header->order_hint instead, too, which is the decoder state vs the raw values in the current frame header (Although they should match in theory). > + > if (avctx->hwaccel && s->cur_frame.f->buf[0]) { > ret = avctx->hwaccel->start_frame(avctx, unit->data,
Mar 14, 2023, 12:51 by jamrial@gmail.com: > On 3/14/2023 3:33 AM, Lynne wrote: > >> + for (int i = 0; i < 7; i++) >> + s->cur_frame.ref_order_hint[i] = >> + s->raw_frame_header->ref_order_hint[s->raw_frame_header->ref_frame_idx[i]]; >> > > Why do you need this in cur_frame? It's not a derived value, and the AV1RawFrameHeader struct is accessible in all AVHWaccel callbacks. > > And i think you should be looking at s->ref[s->raw_frame_header->ref_frame_idx[i]].raw_frame_header->order_hint instead, too, which is the decoder state vs the raw values in the current frame header (Although they should match in theory). > >> + >> if (avctx->hwaccel && s->cur_frame.f->buf[0]) { >> ret = avctx->hwaccel->start_frame(avctx, unit->data, >> Fixed, removed the commit altogether and just used the value directly. Thanks for the review
Quoting Lynne (2023-03-14 07:33:43) > From 56ebb0bc2fd4b856196e6436b4c18af8e4d1a86a Mon Sep 17 00:00:00 2001 > From: Lynne <dev@lynne.ee> > Date: Thu, 10 Mar 2022 18:08:53 +0100 > Subject: [PATCH 05/92] h264_parser: expose idr_pic_id > > Vulkan needs it. > --- > libavcodec/h264_parse.h | 1 + > libavcodec/h264_parser.c | 2 +- > 2 files changed, 2 insertions(+), 1 deletion(-) > > diff --git a/libavcodec/h264_parse.h b/libavcodec/h264_parse.h > index 4ee863df66..4ba0add4f2 100644 > --- a/libavcodec/h264_parse.h > +++ b/libavcodec/h264_parse.h > @@ -85,6 +85,7 @@ typedef struct H264POCContext { > int delta_poc_bottom; > int delta_poc[2]; > int frame_num; > + int idr_pic_id; > int prev_poc_msb; ///< poc_msb of the last reference pic for POC type 0 > int prev_poc_lsb; ///< poc_lsb of the last reference pic for POC type 0 > int frame_num_offset; ///< for POC type 2 > diff --git a/libavcodec/h264_parser.c b/libavcodec/h264_parser.c > index 46134a1c48..1c330484c1 100644 > --- a/libavcodec/h264_parser.c > +++ b/libavcodec/h264_parser.c > @@ -432,7 +432,7 @@ static inline int parse_nal_units(AVCodecParserContext *s, > } > > if (nal.type == H264_NAL_IDR_SLICE) > - get_ue_golomb_long(&nal.gb); /* idr_pic_id */ > + p->poc.idr_pic_id = get_ue_golomb_long(&nal.gb); /* idr_pic_id */ > if (sps->poc_type == 0) { > p->poc.poc_lsb = get_bits(&nal.gb, sps->log2_max_poc_lsb); This change cannot possibly have any effect on any Vulkan code. I told you this in the first review round, then we discussed it on IRC with a detailed explanation why. And look, the same patch is here again with no changes and no explanation. I sure hope my other comments weren't treated the same way.
Quoting Lynne (2023-03-14 07:33:43) > From 3ff54622533627493643c8e4a5e982fc99a7df47 Mon Sep 17 00:00:00 2001 > From: Lynne <dev@lynne.ee> > Date: Wed, 22 Feb 2023 22:30:36 +0100 > Subject: [PATCH 21/92] av1dec: expose ref_order_hint > > Commit by Dave Airlie ???
Quoting Lynne (2023-03-14 07:33:43) > From 61f412eea3fbcb1e2a8625796760c0e24fa3fb83 Mon Sep 17 00:00:00 2001 > From: Lynne <dev@lynne.ee> > Date: Tue, 17 Jan 2023 05:01:45 +0100 > Subject: [PATCH 27/92] h264dec: add hwaccel_params_buf > > diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c > index 0aee724c0d..6559593195 100644 > --- a/libavcodec/h264dec.c > +++ b/libavcodec/h264dec.c > @@ -341,6 +341,7 @@ static av_cold int h264_decode_end(AVCodecContext *avctx) > H264Context *h = avctx->priv_data; > int i; > > + av_buffer_unref(&h->hwaccel_params_buf); > ff_h264_remove_all_refs(h); > ff_h264_free_tables(h); > > @@ -470,6 +471,7 @@ static void h264_decode_flush(AVCodecContext *avctx) > > ff_h264_flush_change(h); > ff_h264_sei_uninit(&h->sei); > + av_buffer_unref(&h->hwaccel_params_buf); > > for (i = 0; i < H264_MAX_PICTURE_COUNT; i++) > ff_h264_unref_picture(h, &h->DPB[i]); > @@ -669,6 +671,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size) > avpriv_request_sample(avctx, "data partitioning"); > break; > case H264_NAL_SEI: > + av_buffer_unref(&h->hwaccel_params_buf); > if (h->setup_finished) { > avpriv_request_sample(avctx, "Late SEI"); > break; > @@ -682,6 +685,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size) > break; > case H264_NAL_SPS: { > GetBitContext tmp_gb = nal->gb; > + av_buffer_unref(&h->hwaccel_params_buf); > if (avctx->hwaccel && avctx->hwaccel->decode_params) { > ret = avctx->hwaccel->decode_params(avctx, > nal->type, Removed on SEI and SPS? That seems bizarre and highly arbitrary. Why specifically those two?
Quoting Anton Khirnov (2023-04-16 22:38:16) > Quoting Lynne (2023-03-14 07:33:43) > > From 61f412eea3fbcb1e2a8625796760c0e24fa3fb83 Mon Sep 17 00:00:00 2001 > > From: Lynne <dev@lynne.ee> > > Date: Tue, 17 Jan 2023 05:01:45 +0100 > > Subject: [PATCH 27/92] h264dec: add hwaccel_params_buf > > > > diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c > > index 0aee724c0d..6559593195 100644 > > --- a/libavcodec/h264dec.c > > +++ b/libavcodec/h264dec.c > > @@ -341,6 +341,7 @@ static av_cold int h264_decode_end(AVCodecContext *avctx) > > H264Context *h = avctx->priv_data; > > int i; > > > > + av_buffer_unref(&h->hwaccel_params_buf); > > ff_h264_remove_all_refs(h); > > ff_h264_free_tables(h); > > > > @@ -470,6 +471,7 @@ static void h264_decode_flush(AVCodecContext *avctx) > > > > ff_h264_flush_change(h); > > ff_h264_sei_uninit(&h->sei); > > + av_buffer_unref(&h->hwaccel_params_buf); > > > > for (i = 0; i < H264_MAX_PICTURE_COUNT; i++) > > ff_h264_unref_picture(h, &h->DPB[i]); > > @@ -669,6 +671,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size) > > avpriv_request_sample(avctx, "data partitioning"); > > break; > > case H264_NAL_SEI: > > + av_buffer_unref(&h->hwaccel_params_buf); > > if (h->setup_finished) { > > avpriv_request_sample(avctx, "Late SEI"); > > break; > > @@ -682,6 +685,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size) > > break; > > case H264_NAL_SPS: { > > GetBitContext tmp_gb = nal->gb; > > + av_buffer_unref(&h->hwaccel_params_buf); > > if (avctx->hwaccel && avctx->hwaccel->decode_params) { > > ret = avctx->hwaccel->decode_params(avctx, > > nal->type, > > Removed on SEI and SPS? That seems bizarre and highly arbitrary. Why > specifically those two? Found the explanation in the cover letter - I assume SEI is a mistake and should really be PPS. The points where it is reset still seem highly Vulkan-specific, so I think it'd be better for the decoder to set a flag on seeing new parameter sets. Vulkan could would then check and clear the flag as needed.
Quoting Lynne (2023-03-14 07:33:43) > From 6e2dfc44e50798264eb16bc2dcabfdbf2fbac2d7 Mon Sep 17 00:00:00 2001 > From: Lynne <dev@lynne.ee> > Date: Thu, 15 Dec 2022 01:06:52 +0100 > Subject: [PATCH 24/92] hwconfig: add a new HWACCEL_CAP_THREAD_SAFE for > threadsafe hwaccels > > Vulkan is fully threadsafe and stateless, so we can benefit from this. > --- > libavcodec/hwconfig.h | 1 + > libavcodec/pthread_frame.c | 2 +- > 2 files changed, 2 insertions(+), 1 deletion(-) > > diff --git a/libavcodec/hwconfig.h b/libavcodec/hwconfig.h > index 721424912c..e6b78f0160 100644 > --- a/libavcodec/hwconfig.h > +++ b/libavcodec/hwconfig.h > @@ -24,6 +24,7 @@ > > > #define HWACCEL_CAP_ASYNC_SAFE (1 << 0) > +#define HWACCEL_CAP_THREAD_SAFE (1 << 1) > > > typedef struct AVCodecHWConfigInternal { > diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c > index 74864e19c5..c096287233 100644 > --- a/libavcodec/pthread_frame.c > +++ b/libavcodec/pthread_frame.c > @@ -204,7 +204,7 @@ static attribute_align_arg void *frame_worker_thread(void *arg) > > /* if the previous thread uses hwaccel then we take the lock to ensure > * the threads don't run concurrently */ > - if (avctx->hwaccel) { > + if (avctx->hwaccel && !(avctx->hwaccel->caps_internal & HWACCEL_CAP_THREAD_SAFE)) { A major problem here is that hwaccel_priv_data exists in only a single instance shared across all the worker threads. If you want frame-threaded hwaccels, you'll need to add some mechanism for synchronizing it between threads. I really wonder how does your code survive all the races involved.
Quoting Lynne (2023-03-14 07:33:43) > From e37330cf6a1f768c733ca6d0b23b9e5847a5fe4f Mon Sep 17 00:00:00 2001 > From: Lynne <dev@lynne.ee> > Date: Thu, 10 Mar 2022 18:03:05 +0100 > Subject: [PATCH 25/92] avcodec: add AVHWAccel.free_frame_priv callback > diff --git a/libavcodec/decode.c b/libavcodec/decode.c > index d1ba7f167f..c4630ce275 100644 > --- a/libavcodec/decode.c > +++ b/libavcodec/decode.c > @@ -1685,3 +1685,22 @@ int ff_copy_palette(void *dst, const AVPacket *src, void *logctx) > } > return 0; > } > + > +AVBufferRef *ff_alloc_hwaccel_frame_priv_data(AVCodecContext *avctx, Sane-endian naming please. ff_hwaccel_frame_priv_alloc(). > + const AVHWAccel *hwaccel) > +{ > + AVBufferRef *ref; > + uint8_t *data = av_mallocz(hwaccel->frame_priv_data_size); > + if (!data) > + return NULL; > + > + ref = av_buffer_create(data, hwaccel->frame_priv_data_size, > + (void (*)(void *, uint8_t *))hwaccel->free_frame_priv, I believe the cast is UB and you should convert the arguments in the function itself, or add a trivial wrapper here.
Apr 17, 2023, 11:25 by anton@khirnov.net: > Quoting Lynne (2023-03-14 07:33:43) > >> From 6e2dfc44e50798264eb16bc2dcabfdbf2fbac2d7 Mon Sep 17 00:00:00 2001 >> From: Lynne <dev@lynne.ee> >> Date: Thu, 15 Dec 2022 01:06:52 +0100 >> Subject: [PATCH 24/92] hwconfig: add a new HWACCEL_CAP_THREAD_SAFE for >> threadsafe hwaccels >> >> Vulkan is fully threadsafe and stateless, so we can benefit from this. >> --- >> libavcodec/hwconfig.h | 1 + >> libavcodec/pthread_frame.c | 2 +- >> 2 files changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/libavcodec/hwconfig.h b/libavcodec/hwconfig.h >> index 721424912c..e6b78f0160 100644 >> --- a/libavcodec/hwconfig.h >> +++ b/libavcodec/hwconfig.h >> @@ -24,6 +24,7 @@ >> >> >> #define HWACCEL_CAP_ASYNC_SAFE (1 << 0) >> +#define HWACCEL_CAP_THREAD_SAFE (1 << 1) >> >> >> typedef struct AVCodecHWConfigInternal { >> diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c >> index 74864e19c5..c096287233 100644 >> --- a/libavcodec/pthread_frame.c >> +++ b/libavcodec/pthread_frame.c >> @@ -204,7 +204,7 @@ static attribute_align_arg void *frame_worker_thread(void *arg) >> >> /* if the previous thread uses hwaccel then we take the lock to ensure >> * the threads don't run concurrently */ >> - if (avctx->hwaccel) { >> + if (avctx->hwaccel && !(avctx->hwaccel->caps_internal & HWACCEL_CAP_THREAD_SAFE)) { >> > > A major problem here is that hwaccel_priv_data exists in only a single > instance shared across all the worker threads. If you want frame-threaded > hwaccels, you'll need to add some mechanism for synchronizing it between > threads. > > I really wonder how does your code survive all the races involved. > It works because hwaccel_priv_data is read-only. No need to synchronize anything, the contexts Vulkan creates are defined as driver-synchronized, and all the info a frame needs is provided when decoding it, rather than kept as a state. So I don't see a problem here.
Apr 17, 2023, 11:07 by anton@khirnov.net: > Quoting Anton Khirnov (2023-04-16 22:38:16) > >> Quoting Lynne (2023-03-14 07:33:43) >> > From 61f412eea3fbcb1e2a8625796760c0e24fa3fb83 Mon Sep 17 00:00:00 2001 >> > From: Lynne <dev@lynne.ee> >> > Date: Tue, 17 Jan 2023 05:01:45 +0100 >> > Subject: [PATCH 27/92] h264dec: add hwaccel_params_buf >> > >> > diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c >> > index 0aee724c0d..6559593195 100644 >> > --- a/libavcodec/h264dec.c >> > +++ b/libavcodec/h264dec.c >> > @@ -341,6 +341,7 @@ static av_cold int h264_decode_end(AVCodecContext *avctx) >> > H264Context *h = avctx->priv_data; >> > int i; >> > >> > + av_buffer_unref(&h->hwaccel_params_buf); >> > ff_h264_remove_all_refs(h); >> > ff_h264_free_tables(h); >> > >> > @@ -470,6 +471,7 @@ static void h264_decode_flush(AVCodecContext *avctx) >> > >> > ff_h264_flush_change(h); >> > ff_h264_sei_uninit(&h->sei); >> > + av_buffer_unref(&h->hwaccel_params_buf); >> > >> > for (i = 0; i < H264_MAX_PICTURE_COUNT; i++) >> > ff_h264_unref_picture(h, &h->DPB[i]); >> > @@ -669,6 +671,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size) >> > avpriv_request_sample(avctx, "data partitioning"); >> > break; >> > case H264_NAL_SEI: >> > + av_buffer_unref(&h->hwaccel_params_buf); >> > if (h->setup_finished) { >> > avpriv_request_sample(avctx, "Late SEI"); >> > break; >> > @@ -682,6 +685,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size) >> > break; >> > case H264_NAL_SPS: { >> > GetBitContext tmp_gb = nal->gb; >> > + av_buffer_unref(&h->hwaccel_params_buf); >> > if (avctx->hwaccel && avctx->hwaccel->decode_params) { >> > ret = avctx->hwaccel->decode_params(avctx, >> > nal->type, >> >> Removed on SEI and SPS? That seems bizarre and highly arbitrary. Why >> specifically those two? >> > > Found the explanation in the cover letter - I assume SEI is a mistake > and should really be PPS. The points where it is reset still seem highly > Vulkan-specific, so I think it'd be better for the decoder to set a flag > on seeing new parameter sets. Vulkan could would then check and clear > the flag as needed. > It was a typo, changed to remove it on SPS and PPS. We still have to store the compiled bufferref somewhere, and synchronize it between frames, and having it in the frame context itself solves those issues.
Apr 17, 2023, 11:46 by anton@khirnov.net: > Quoting Lynne (2023-03-14 07:33:43) > >> From e37330cf6a1f768c733ca6d0b23b9e5847a5fe4f Mon Sep 17 00:00:00 2001 >> From: Lynne <dev@lynne.ee> >> Date: Thu, 10 Mar 2022 18:03:05 +0100 >> Subject: [PATCH 25/92] avcodec: add AVHWAccel.free_frame_priv callback >> diff --git a/libavcodec/decode.c b/libavcodec/decode.c >> index d1ba7f167f..c4630ce275 100644 >> --- a/libavcodec/decode.c >> +++ b/libavcodec/decode.c >> @@ -1685,3 +1685,22 @@ int ff_copy_palette(void *dst, const AVPacket *src, void *logctx) >> } >> return 0; >> } >> + >> +AVBufferRef *ff_alloc_hwaccel_frame_priv_data(AVCodecContext *avctx, >> > > Sane-endian naming please. ff_hwaccel_frame_priv_alloc(). > Done >> + const AVHWAccel *hwaccel) >> +{ >> + AVBufferRef *ref; >> + uint8_t *data = av_mallocz(hwaccel->frame_priv_data_size); >> + if (!data) >> + return NULL; >> + >> + ref = av_buffer_create(data, hwaccel->frame_priv_data_size, >> + (void (*)(void *, uint8_t *))hwaccel->free_frame_priv, >> > > I believe the cast is UB and you should convert the arguments in the > function itself, or add a trivial wrapper here. > I converted the arguments of the function.
Quoting Lynne (2023-04-19 16:47:11) > Apr 17, 2023, 11:07 by anton@khirnov.net: > > > Quoting Anton Khirnov (2023-04-16 22:38:16) > > > >> Quoting Lynne (2023-03-14 07:33:43) > >> > From 61f412eea3fbcb1e2a8625796760c0e24fa3fb83 Mon Sep 17 00:00:00 2001 > >> > From: Lynne <dev@lynne.ee> > >> > Date: Tue, 17 Jan 2023 05:01:45 +0100 > >> > Subject: [PATCH 27/92] h264dec: add hwaccel_params_buf > >> > > >> > diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c > >> > index 0aee724c0d..6559593195 100644 > >> > --- a/libavcodec/h264dec.c > >> > +++ b/libavcodec/h264dec.c > >> > @@ -341,6 +341,7 @@ static av_cold int h264_decode_end(AVCodecContext *avctx) > >> > H264Context *h = avctx->priv_data; > >> > int i; > >> > > >> > + av_buffer_unref(&h->hwaccel_params_buf); > >> > ff_h264_remove_all_refs(h); > >> > ff_h264_free_tables(h); > >> > > >> > @@ -470,6 +471,7 @@ static void h264_decode_flush(AVCodecContext *avctx) > >> > > >> > ff_h264_flush_change(h); > >> > ff_h264_sei_uninit(&h->sei); > >> > + av_buffer_unref(&h->hwaccel_params_buf); > >> > > >> > for (i = 0; i < H264_MAX_PICTURE_COUNT; i++) > >> > ff_h264_unref_picture(h, &h->DPB[i]); > >> > @@ -669,6 +671,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size) > >> > avpriv_request_sample(avctx, "data partitioning"); > >> > break; > >> > case H264_NAL_SEI: > >> > + av_buffer_unref(&h->hwaccel_params_buf); > >> > if (h->setup_finished) { > >> > avpriv_request_sample(avctx, "Late SEI"); > >> > break; > >> > @@ -682,6 +685,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size) > >> > break; > >> > case H264_NAL_SPS: { > >> > GetBitContext tmp_gb = nal->gb; > >> > + av_buffer_unref(&h->hwaccel_params_buf); > >> > if (avctx->hwaccel && avctx->hwaccel->decode_params) { > >> > ret = avctx->hwaccel->decode_params(avctx, > >> > nal->type, > >> > >> Removed on SEI and SPS? That seems bizarre and highly arbitrary. Why > >> specifically those two? > >> > > > > Found the explanation in the cover letter - I assume SEI is a mistake > > and should really be PPS. The points where it is reset still seem highly > > Vulkan-specific, so I think it'd be better for the decoder to set a flag > > on seeing new parameter sets. Vulkan could would then check and clear > > the flag as needed. > > > > It was a typo, changed to remove it on SPS and PPS. > We still have to store the compiled bufferref somewhere, and synchronize > it between frames, and having it in the frame context itself solves > those issues. Except the way it is handled makes no sense for anything other than vulkan. Having per-thread hwaccel contexts with a callback for propagating the data would solve this more cleanly.
Quoting Lynne (2023-04-19 16:39:28) > Apr 17, 2023, 11:25 by anton@khirnov.net: > > > Quoting Lynne (2023-03-14 07:33:43) > > > >> From 6e2dfc44e50798264eb16bc2dcabfdbf2fbac2d7 Mon Sep 17 00:00:00 2001 > >> From: Lynne <dev@lynne.ee> > >> Date: Thu, 15 Dec 2022 01:06:52 +0100 > >> Subject: [PATCH 24/92] hwconfig: add a new HWACCEL_CAP_THREAD_SAFE for > >> threadsafe hwaccels > >> > >> Vulkan is fully threadsafe and stateless, so we can benefit from this. > >> --- > >> libavcodec/hwconfig.h | 1 + > >> libavcodec/pthread_frame.c | 2 +- > >> 2 files changed, 2 insertions(+), 1 deletion(-) > >> > >> diff --git a/libavcodec/hwconfig.h b/libavcodec/hwconfig.h > >> index 721424912c..e6b78f0160 100644 > >> --- a/libavcodec/hwconfig.h > >> +++ b/libavcodec/hwconfig.h > >> @@ -24,6 +24,7 @@ > >> > >> > >> #define HWACCEL_CAP_ASYNC_SAFE (1 << 0) > >> +#define HWACCEL_CAP_THREAD_SAFE (1 << 1) > >> > >> > >> typedef struct AVCodecHWConfigInternal { > >> diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c > >> index 74864e19c5..c096287233 100644 > >> --- a/libavcodec/pthread_frame.c > >> +++ b/libavcodec/pthread_frame.c > >> @@ -204,7 +204,7 @@ static attribute_align_arg void *frame_worker_thread(void *arg) > >> > >> /* if the previous thread uses hwaccel then we take the lock to ensure > >> * the threads don't run concurrently */ > >> - if (avctx->hwaccel) { > >> + if (avctx->hwaccel && !(avctx->hwaccel->caps_internal & HWACCEL_CAP_THREAD_SAFE)) { > >> > > > > A major problem here is that hwaccel_priv_data exists in only a single > > instance shared across all the worker threads. If you want frame-threaded > > hwaccels, you'll need to add some mechanism for synchronizing it between > > threads. > > > > I really wonder how does your code survive all the races involved. > > > > It works because hwaccel_priv_data is read-only. No need to synchronize > anything, the contexts Vulkan creates are defined as driver-synchronized, > and all the info a frame needs is provided when decoding it, rather than > kept as a state. > So I don't see a problem here. From a quick glance, at least vk_av1_free_frame_priv() modifies this context, which seems highly unsafe. This really seems like a disaster waiting to happen, there should be some stronger guarantees that nobody will modify this context when they shouldn't.
Apr 22, 2023, 14:42 by anton@khirnov.net: > Quoting Lynne (2023-04-19 16:39:28) > >> Apr 17, 2023, 11:25 by anton@khirnov.net: >> >> > Quoting Lynne (2023-03-14 07:33:43) >> > >> >> From 6e2dfc44e50798264eb16bc2dcabfdbf2fbac2d7 Mon Sep 17 00:00:00 2001 >> >> From: Lynne <dev@lynne.ee> >> >> Date: Thu, 15 Dec 2022 01:06:52 +0100 >> >> Subject: [PATCH 24/92] hwconfig: add a new HWACCEL_CAP_THREAD_SAFE for >> >> threadsafe hwaccels >> >> >> >> Vulkan is fully threadsafe and stateless, so we can benefit from this. >> >> --- >> >> libavcodec/hwconfig.h | 1 + >> >> libavcodec/pthread_frame.c | 2 +- >> >> 2 files changed, 2 insertions(+), 1 deletion(-) >> >> >> >> diff --git a/libavcodec/hwconfig.h b/libavcodec/hwconfig.h >> >> index 721424912c..e6b78f0160 100644 >> >> --- a/libavcodec/hwconfig.h >> >> +++ b/libavcodec/hwconfig.h >> >> @@ -24,6 +24,7 @@ >> >> >> >> >> >> #define HWACCEL_CAP_ASYNC_SAFE (1 << 0) >> >> +#define HWACCEL_CAP_THREAD_SAFE (1 << 1) >> >> >> >> >> >> typedef struct AVCodecHWConfigInternal { >> >> diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c >> >> index 74864e19c5..c096287233 100644 >> >> --- a/libavcodec/pthread_frame.c >> >> +++ b/libavcodec/pthread_frame.c >> >> @@ -204,7 +204,7 @@ static attribute_align_arg void *frame_worker_thread(void *arg) >> >> >> >> /* if the previous thread uses hwaccel then we take the lock to ensure >> >> * the threads don't run concurrently */ >> >> - if (avctx->hwaccel) { >> >> + if (avctx->hwaccel && !(avctx->hwaccel->caps_internal & HWACCEL_CAP_THREAD_SAFE)) { >> >> >> > >> > A major problem here is that hwaccel_priv_data exists in only a single >> > instance shared across all the worker threads. If you want frame-threaded >> > hwaccels, you'll need to add some mechanism for synchronizing it between >> > threads. >> > >> > I really wonder how does your code survive all the races involved. >> > >> >> It works because hwaccel_priv_data is read-only. No need to synchronize >> anything, the contexts Vulkan creates are defined as driver-synchronized, >> and all the info a frame needs is provided when decoding it, rather than >> kept as a state. >> So I don't see a problem here. >> > > From a quick glance, at least vk_av1_free_frame_priv() modifies this > context, which seems highly unsafe. > Our AV1 decoder isn't threaded yet. I've removed the threadsafe flag from the hwaccel, Dave is currently looking into whether it's possible to remove the frame id. > This really seems like a disaster waiting to happen, there should be > some stronger guarantees that nobody will modify this context when they > shouldn't. > It's an opt-in flag, that's more than enough.
Apr 22, 2023, 14:38 by anton@khirnov.net: > Quoting Lynne (2023-04-19 16:47:11) > >> Apr 17, 2023, 11:07 by anton@khirnov.net: >> >> > Quoting Anton Khirnov (2023-04-16 22:38:16) >> > >> >> Quoting Lynne (2023-03-14 07:33:43) >> >> > From 61f412eea3fbcb1e2a8625796760c0e24fa3fb83 Mon Sep 17 00:00:00 2001 >> >> > From: Lynne <dev@lynne.ee> >> >> > Date: Tue, 17 Jan 2023 05:01:45 +0100 >> >> > Subject: [PATCH 27/92] h264dec: add hwaccel_params_buf >> >> > >> >> > diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c >> >> > index 0aee724c0d..6559593195 100644 >> >> > --- a/libavcodec/h264dec.c >> >> > +++ b/libavcodec/h264dec.c >> >> > @@ -341,6 +341,7 @@ static av_cold int h264_decode_end(AVCodecContext *avctx) >> >> > H264Context *h = avctx->priv_data; >> >> > int i; >> >> > >> >> > + av_buffer_unref(&h->hwaccel_params_buf); >> >> > ff_h264_remove_all_refs(h); >> >> > ff_h264_free_tables(h); >> >> > >> >> > @@ -470,6 +471,7 @@ static void h264_decode_flush(AVCodecContext *avctx) >> >> > >> >> > ff_h264_flush_change(h); >> >> > ff_h264_sei_uninit(&h->sei); >> >> > + av_buffer_unref(&h->hwaccel_params_buf); >> >> > >> >> > for (i = 0; i < H264_MAX_PICTURE_COUNT; i++) >> >> > ff_h264_unref_picture(h, &h->DPB[i]); >> >> > @@ -669,6 +671,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size) >> >> > avpriv_request_sample(avctx, "data partitioning"); >> >> > break; >> >> > case H264_NAL_SEI: >> >> > + av_buffer_unref(&h->hwaccel_params_buf); >> >> > if (h->setup_finished) { >> >> > avpriv_request_sample(avctx, "Late SEI"); >> >> > break; >> >> > @@ -682,6 +685,7 @@ static int decode_nal_units(H264Context *h, const uint8_t *buf, int buf_size) >> >> > break; >> >> > case H264_NAL_SPS: { >> >> > GetBitContext tmp_gb = nal->gb; >> >> > + av_buffer_unref(&h->hwaccel_params_buf); >> >> > if (avctx->hwaccel && avctx->hwaccel->decode_params) { >> >> > ret = avctx->hwaccel->decode_params(avctx, >> >> > nal->type, >> >> >> >> Removed on SEI and SPS? That seems bizarre and highly arbitrary. Why >> >> specifically those two? >> >> >> > >> > Found the explanation in the cover letter - I assume SEI is a mistake >> > and should really be PPS. The points where it is reset still seem highly >> > Vulkan-specific, so I think it'd be better for the decoder to set a flag >> > on seeing new parameter sets. Vulkan could would then check and clear >> > the flag as needed. >> > >> >> It was a typo, changed to remove it on SPS and PPS. >> We still have to store the compiled bufferref somewhere, and synchronize >> it between frames, and having it in the frame context itself solves >> those issues. >> > > Except the way it is handled makes no sense for anything other than > vulkan. > > Having per-thread hwaccel contexts with a callback for propagating the > data would solve this more cleanly. > How would that look?
From 26136684812c3e94a35dea25cb5edfe32601a518 Mon Sep 17 00:00:00 2001 From: Lynne <dev@lynne.ee> Date: Sat, 25 Feb 2023 09:36:58 +0100 Subject: [PATCH 31/92] lsws: add in/out support for the new 12-bit 2-plane 422 and 444 pixfmts --- libswscale/input.c | 8 ++++++++ libswscale/utils.c | 4 ++++ tests/ref/fate/sws-pixdesc-query | 26 ++++++++++++++++++++++++++ 3 files changed, 38 insertions(+) diff --git a/libswscale/input.c b/libswscale/input.c index d5676062a2..41795c636e 100644 --- a/libswscale/input.c +++ b/libswscale/input.c @@ -1452,9 +1452,13 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c) c->chrToYV12 = p010BEToUV_c; break; case AV_PIX_FMT_P012LE: + case AV_PIX_FMT_P212LE: + case AV_PIX_FMT_P412LE: c->chrToYV12 = p012LEToUV_c; break; case AV_PIX_FMT_P012BE: + case AV_PIX_FMT_P212BE: + case AV_PIX_FMT_P412BE: c->chrToYV12 = p012BEToUV_c; break; case AV_PIX_FMT_P016LE: @@ -1944,9 +1948,13 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c) c->lumToYV12 = p010BEToY_c; break; case AV_PIX_FMT_P012LE: + case AV_PIX_FMT_P212LE: + case AV_PIX_FMT_P412LE: c->lumToYV12 = p012LEToY_c; break; case AV_PIX_FMT_P012BE: + case AV_PIX_FMT_P212BE: + case AV_PIX_FMT_P412BE: c->lumToYV12 = p012BEToY_c; break; case AV_PIX_FMT_GRAYF32LE: diff --git a/libswscale/utils.c b/libswscale/utils.c index 925c536bf1..a3a7a40750 100644 --- a/libswscale/utils.c +++ b/libswscale/utils.c @@ -248,8 +248,12 @@ static const FormatEntry format_entries[] = { [AV_PIX_FMT_X2BGR10LE] = { 1, 1 }, [AV_PIX_FMT_P210BE] = { 1, 1 }, [AV_PIX_FMT_P210LE] = { 1, 1 }, + [AV_PIX_FMT_P212BE] = { 1, 1 }, + [AV_PIX_FMT_P212LE] = { 1, 1 }, [AV_PIX_FMT_P410BE] = { 1, 1 }, [AV_PIX_FMT_P410LE] = { 1, 1 }, + [AV_PIX_FMT_P412BE] = { 1, 1 }, + [AV_PIX_FMT_P412LE] = { 1, 1 }, [AV_PIX_FMT_P216BE] = { 1, 1 }, [AV_PIX_FMT_P216LE] = { 1, 1 }, [AV_PIX_FMT_P416BE] = { 1, 1 }, diff --git a/tests/ref/fate/sws-pixdesc-query b/tests/ref/fate/sws-pixdesc-query index 14156a383c..fd7f2aefc0 100644 --- a/tests/ref/fate/sws-pixdesc-query +++ b/tests/ref/fate/sws-pixdesc-query @@ -67,8 +67,12 @@ isNBPS: p012le p210be p210le + p212be + p212le p410be p410le + p412be + p412le x2bgr10be x2bgr10le x2rgb10be @@ -160,8 +164,10 @@ isBE: p012be p016be p210be + p212be p216be p410be + p412be p416be rgb444be rgb48be @@ -226,10 +232,14 @@ isYUV: p016le p210be p210le + p212be + p212le p216be p216le p410be p410le + p412be + p412le p416be p416le uyvy422 @@ -338,10 +348,14 @@ isPlanarYUV: p016le p210be p210le + p212be + p212le p216be p216le p410be p410le + p412be + p412le p416be p416le yuv410p @@ -431,10 +445,14 @@ isSemiPlanarYUV: p016le p210be p210le + p212be + p212le p216be p216le p410be p410le + p412be + p412le p416be p416le @@ -853,10 +871,14 @@ Planar: p016le p210be p210le + p212be + p212le p216be p216le p410be p410le + p412be + p412le p416be p416le yuv410p @@ -1029,8 +1051,12 @@ DataInHighBits: p012le p210be p210le + p212be + p212le p410be p410le + p412be + p412le xv36be xv36le xyz12be -- 2.39.2