Message ID | NQTold9--3-9@lynne.ee |
---|---|
State | New |
Headers | show |
Series | None | expand |
On 3/14/2023 3:33 AM, Lynne wrote: > The attached patchset is all the common code changes that my Vulkan patchset needs. > > In total lines of code, this part has 425 additions and 131 deletions. > Most of that is additions to HEVC parsing. Excluding them, the patchset is > 200 lines of code added, which is manageable. > > Apart from the parser changes, the following other changes have been > made to the API: > > AVHWAccel.free_frame_priv exists due to Vulkan's way of using VkImageView > objects to wrap VkImage objects, which we need to free once they're no longer > in use. Every other API uses the direct objects in decoding, but with Vulkan, > they have to be represented by other objects. > We also use it to free the slice offsets buffer. > > AVHWAccel.flush exists due to Vulkan keeping decoder state, despite being > stateless in theory. The decoder has to be notified of flushes in order to reset > decoding slots and other data it needs, such as motion vectors and reference > lists for AV1. Otherwise, inferring whether a flush has happened can be codec > dependent, and hacky. > > hwaccel_params_buf exists due to Vulkan's way of compiling SPS/PPS data > into objects, making updating expensive. The change allows for hardware > to only upload new parameters if they have been changed. > It's insignificant for H264 and AV1, but HEVC's structures can reach 114 > megabytes of data that has to be uploaded, for a specially crafted input, > which is enough to DDOS an ingest. > The data is set and managed by the hwaccel, but does need to be synchronized > between different decoding threads, which this patch performs. > > Finally, the HWACCEL_CAP_THREAD_SAFE flag is added due to Vulkan being > actually threadsafe, and requiring no serialization. It does work and it > does actually make a difference, on average, it can increase performance > by 20% for an average B-frame using HEVC stream, depending on the > number of threads and the number of decode queues. > While hardware decoders are fast in general, certain vendors such as AMD > can choke up while playing 8k video, and this patch can significantly help > increase throughput. > > In context, the changes can be viewed here: > https://github.com/cyanreg/FFmpeg/tree/vulkan > > The rest of the whole patchset is either rewrites, filter code, or > the actual hardware accel code. > > The patchset will not be pushed standalone, but as part of the greater > Vulkan patchset. > > 31 patches attached. [...] > diff --git a/libavcodec/av1dec.c b/libavcodec/av1dec.c > index a80e37e33f..5a3c51e94a 100644 > --- a/libavcodec/av1dec.c > +++ b/libavcodec/av1dec.c > @@ -591,6 +591,8 @@ static void av1_frame_unref(AVCodecContext *avctx, AV1Frame *f) > f->spatial_id = f->temporal_id = 0; > memset(f->skip_mode_frame_idx, 0, > 2 * sizeof(uint8_t)); > + memset(f->ref_order_hint, 0, > + 7 * sizeof(uint8_t)); > memset(&f->film_grain, 0, sizeof(f->film_grain)); > f->coded_lossless = 0; > } > @@ -633,6 +635,9 @@ static int av1_frame_ref(AVCodecContext *avctx, AV1Frame *dst, const AV1Frame *s > memcpy(dst->skip_mode_frame_idx, > src->skip_mode_frame_idx, > 2 * sizeof(uint8_t)); > + memcpy(dst->ref_order_hint, > + src->ref_order_hint, > + 7 * sizeof(uint8_t)); > memcpy(&dst->film_grain, > &src->film_grain, > sizeof(dst->film_grain)); > @@ -1267,6 +1272,10 @@ static int av1_decode_frame(AVCodecContext *avctx, AVFrame *frame, > s->cur_frame.spatial_id = header->spatial_id; > s->cur_frame.temporal_id = header->temporal_id; > > + for (int i = 0; i < 7; i++) > + s->cur_frame.ref_order_hint[i] = > + s->raw_frame_header->ref_order_hint[s->raw_frame_header->ref_frame_idx[i]]; Why do you need this in cur_frame? It's not a derived value, and the AV1RawFrameHeader struct is accessible in all AVHWaccel callbacks. And i think you should be looking at s->ref[s->raw_frame_header->ref_frame_idx[i]].raw_frame_header->order_hint instead, too, which is the decoder state vs the raw values in the current frame header (Although they should match in theory). > + > if (avctx->hwaccel && s->cur_frame.f->buf[0]) { > ret = avctx->hwaccel->start_frame(avctx, unit->data,
Mar 14, 2023, 12:51 by jamrial@gmail.com: > On 3/14/2023 3:33 AM, Lynne wrote: > >> + for (int i = 0; i < 7; i++) >> + s->cur_frame.ref_order_hint[i] = >> + s->raw_frame_header->ref_order_hint[s->raw_frame_header->ref_frame_idx[i]]; >> > > Why do you need this in cur_frame? It's not a derived value, and the AV1RawFrameHeader struct is accessible in all AVHWaccel callbacks. > > And i think you should be looking at s->ref[s->raw_frame_header->ref_frame_idx[i]].raw_frame_header->order_hint instead, too, which is the decoder state vs the raw values in the current frame header (Although they should match in theory). > >> + >> if (avctx->hwaccel && s->cur_frame.f->buf[0]) { >> ret = avctx->hwaccel->start_frame(avctx, unit->data, >> Fixed, removed the commit altogether and just used the value directly. Thanks for the review
From 26136684812c3e94a35dea25cb5edfe32601a518 Mon Sep 17 00:00:00 2001 From: Lynne <dev@lynne.ee> Date: Sat, 25 Feb 2023 09:36:58 +0100 Subject: [PATCH 31/92] lsws: add in/out support for the new 12-bit 2-plane 422 and 444 pixfmts --- libswscale/input.c | 8 ++++++++ libswscale/utils.c | 4 ++++ tests/ref/fate/sws-pixdesc-query | 26 ++++++++++++++++++++++++++ 3 files changed, 38 insertions(+) diff --git a/libswscale/input.c b/libswscale/input.c index d5676062a2..41795c636e 100644 --- a/libswscale/input.c +++ b/libswscale/input.c @@ -1452,9 +1452,13 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c) c->chrToYV12 = p010BEToUV_c; break; case AV_PIX_FMT_P012LE: + case AV_PIX_FMT_P212LE: + case AV_PIX_FMT_P412LE: c->chrToYV12 = p012LEToUV_c; break; case AV_PIX_FMT_P012BE: + case AV_PIX_FMT_P212BE: + case AV_PIX_FMT_P412BE: c->chrToYV12 = p012BEToUV_c; break; case AV_PIX_FMT_P016LE: @@ -1944,9 +1948,13 @@ av_cold void ff_sws_init_input_funcs(SwsContext *c) c->lumToYV12 = p010BEToY_c; break; case AV_PIX_FMT_P012LE: + case AV_PIX_FMT_P212LE: + case AV_PIX_FMT_P412LE: c->lumToYV12 = p012LEToY_c; break; case AV_PIX_FMT_P012BE: + case AV_PIX_FMT_P212BE: + case AV_PIX_FMT_P412BE: c->lumToYV12 = p012BEToY_c; break; case AV_PIX_FMT_GRAYF32LE: diff --git a/libswscale/utils.c b/libswscale/utils.c index 925c536bf1..a3a7a40750 100644 --- a/libswscale/utils.c +++ b/libswscale/utils.c @@ -248,8 +248,12 @@ static const FormatEntry format_entries[] = { [AV_PIX_FMT_X2BGR10LE] = { 1, 1 }, [AV_PIX_FMT_P210BE] = { 1, 1 }, [AV_PIX_FMT_P210LE] = { 1, 1 }, + [AV_PIX_FMT_P212BE] = { 1, 1 }, + [AV_PIX_FMT_P212LE] = { 1, 1 }, [AV_PIX_FMT_P410BE] = { 1, 1 }, [AV_PIX_FMT_P410LE] = { 1, 1 }, + [AV_PIX_FMT_P412BE] = { 1, 1 }, + [AV_PIX_FMT_P412LE] = { 1, 1 }, [AV_PIX_FMT_P216BE] = { 1, 1 }, [AV_PIX_FMT_P216LE] = { 1, 1 }, [AV_PIX_FMT_P416BE] = { 1, 1 }, diff --git a/tests/ref/fate/sws-pixdesc-query b/tests/ref/fate/sws-pixdesc-query index 14156a383c..fd7f2aefc0 100644 --- a/tests/ref/fate/sws-pixdesc-query +++ b/tests/ref/fate/sws-pixdesc-query @@ -67,8 +67,12 @@ isNBPS: p012le p210be p210le + p212be + p212le p410be p410le + p412be + p412le x2bgr10be x2bgr10le x2rgb10be @@ -160,8 +164,10 @@ isBE: p012be p016be p210be + p212be p216be p410be + p412be p416be rgb444be rgb48be @@ -226,10 +232,14 @@ isYUV: p016le p210be p210le + p212be + p212le p216be p216le p410be p410le + p412be + p412le p416be p416le uyvy422 @@ -338,10 +348,14 @@ isPlanarYUV: p016le p210be p210le + p212be + p212le p216be p216le p410be p410le + p412be + p412le p416be p416le yuv410p @@ -431,10 +445,14 @@ isSemiPlanarYUV: p016le p210be p210le + p212be + p212le p216be p216le p410be p410le + p412be + p412le p416be p416le @@ -853,10 +871,14 @@ Planar: p016le p210be p210le + p212be + p212le p216be p216le p410be p410le + p412be + p412le p416be p416le yuv410p @@ -1029,8 +1051,12 @@ DataInHighBits: p012le p210be p210le + p212be + p212le p410be p410le + p412be + p412le xv36be xv36le xyz12be -- 2.39.2