diff mbox series

[FFmpeg-devel,v2] avcodec/av1_vaapi: add direct film grain mode

Message ID 20221120025914.39732-1-ruijing.dong@amd.com
State New
Headers show
Series [FFmpeg-devel,v2] avcodec/av1_vaapi: add direct film grain mode | expand

Checks

Context Check Description
andriy/make_x86 success Make finished
andriy/make_fate_x86 success Make fate finished

Commit Message

Dong, Ruijing Nov. 20, 2022, 2:59 a.m. UTC
Adding direct film grain mode for av1 decoder, which
outputs alongside film grain.

AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN is the new flag
introduced to enable this path.

issue:
By using AMD av1 decoder via VAAPI, when used with film
grain content, the output displays black screen with
incorrect frame order.

The issue being discussed in here:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/6903#note_1613807

example:
This flag can be used in ffmpeg command:

ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128
      -hwaccel_flags 8
      -i input_with_film_gram.obu
      output_with_film_grain.yuv

Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
---
update: add new option direct_film_grain in optons_table.h

 libavcodec/avcodec.h       | 17 +++++++++++++++++
 libavcodec/options_table.h |  1 +
 libavcodec/vaapi_av1.c     |  6 ++++--
 3 files changed, 22 insertions(+), 2 deletions(-)

--
2.25.1

Comments

Mark Thompson Nov. 20, 2022, 4:44 p.m. UTC | #1
On 20/11/2022 02:59, Ruijing Dong wrote:
> Adding direct film grain mode for av1 decoder, which
> outputs alongside film grain.
> 
> AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN is the new flag
> introduced to enable this path.
> 
> issue:
> By using AMD av1 decoder via VAAPI, when used with film
> grain content, the output displays black screen with
> incorrect frame order.
> 
> The issue being discussed in here:
> https://gitlab.freedesktop.org/mesa/mesa/-/issues/6903#note_1613807
> 
> example:
> This flag can be used in ffmpeg command:
> 
> ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128
>        -hwaccel_flags 8
>        -i input_with_film_gram.obu
>        output_with_film_grain.yuv
> 
> Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
> ---
> update: add new option direct_film_grain in optons_table.h
> 
>   libavcodec/avcodec.h       | 17 +++++++++++++++++
>   libavcodec/options_table.h |  1 +
>   libavcodec/vaapi_av1.c     |  6 ++++--
>   3 files changed, 22 insertions(+), 2 deletions(-)

My understanding of this is as follows, please correct me if any of this is wrong:

* For AV1 with film grain enabled, VAAPI decode is specified with two output surfaces: one for pre-grain (reference) output and one for post-grain (display) output.
* The current driver in Mesa always writes to the pre-grain surface and ignores the post-grain surface entirely.
* To fix this, you intend to modify the VAAPI code in libavcodec to allow the user to manually override the expected VAAPI behaviour and instead assume that the post-grain output has been written to the pre-grain surface.

Is that right?

If it is, could you perhaps explain why this manual option is preferable to the more obvious solution of Mesa being fixed to write the post-grain output to the post-grain surface?

Thanks,

- Mark


> diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
> index 3edd8e2636..9420e7820d 100644
> --- a/libavcodec/avcodec.h
> +++ b/libavcodec/avcodec.h
> @@ -2253,6 +2253,23 @@ typedef struct AVHWAccel {
>    */
>   #define AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH (1 << 2)
> 
> +/**
> + * The film grain synthesis could be seperate from decoding process.
> + * The downstream device would apply the film grain parameters seperately.
> + * The desired film grain parameters can be found in SEI section in H264
> + * or H265 bitstreams.
> + *
> + * In AV1, film grain is mandatorily specified, av1 decoders like AMD
> + * av1 decoder process film grain content internally, and the output
> + * includes applied film grain. For the purpose of supporting these av1
> + * decoders, this flag needs to be utilized.
> + *
> + * @warning If the stream has no film grain content, this flag will
> + *          be ignored in the supported av1 decoders. It is advised
> + *          that this flag should only be used in av1 decoders
> + *          that support it.
> + */
> +#define AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN (1 << 3)
>   /**
>    * @}
>    */
> diff --git a/libavcodec/options_table.h b/libavcodec/options_table.h
> index cd02f5096f..0302f89280 100644
> --- a/libavcodec/options_table.h
> +++ b/libavcodec/options_table.h
> @@ -399,6 +399,7 @@ static const AVOption avcodec_options[] = {
>   {"ignore_level", "ignore level even if the codec level used is unknown or higher than the maximum supported level reported by the hardware driver", 0, AV_OPT_TYPE_CONST, { .i64 = AV_HWACCEL_FLAG_IGNORE_LEVEL }, INT_MIN, INT_MAX, V | D, "hwaccel_flags" },
>   {"allow_high_depth", "allow to output YUV pixel formats with a different chroma sampling than 4:2:0 and/or other than 8 bits per component", 0, AV_OPT_TYPE_CONST, {.i64 = AV_HWACCEL_FLAG_ALLOW_HIGH_DEPTH }, INT_MIN, INT_MAX, V | D, "hwaccel_flags"},
>   {"allow_profile_mismatch", "attempt to decode anyway if HW accelerated decoder's supported profiles do not exactly match the stream", 0, AV_OPT_TYPE_CONST, {.i64 = AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH }, INT_MIN, INT_MAX, V | D, "hwaccel_flags"},
> +{"direct_film_grain", "allow decoder to directly apply film grain to the output", 0, AV_OPT_TYPE_CONST, {.i64 = AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN }, INT_MIN, INT_MAX, V | D, "hwaccel_flags"},
>   {"extra_hw_frames", "Number of extra hardware frames to allocate for the user", OFFSET(extra_hw_frames), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, INT_MAX, V|D },
>   {"discard_damaged_percentage", "Percentage of damaged samples to discard a frame", OFFSET(discard_damaged_percentage), AV_OPT_TYPE_INT, {.i64 = 95 }, 0, 100, V|D },
>   {NULL},
> diff --git a/libavcodec/vaapi_av1.c b/libavcodec/vaapi_av1.c
> index d0339b2705..6db910f2bf 100644
> --- a/libavcodec/vaapi_av1.c
> +++ b/libavcodec/vaapi_av1.c
> @@ -127,6 +127,7 @@ static int vaapi_av1_start_frame(AVCodecContext *avctx,
>       int8_t bit_depth_idx;
>       int err = 0;
>       int apply_grain = !(avctx->export_side_data & AV_CODEC_EXPORT_DATA_FILM_GRAIN) && film_grain->apply_grain;
> +    int direct_film_grain = avctx->hwaccel_flags & AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN;
>       uint8_t remap_lr_type[4] = {AV1_RESTORE_NONE, AV1_RESTORE_SWITCHABLE, AV1_RESTORE_WIENER, AV1_RESTORE_SGRPROJ};
>       uint8_t segmentation_feature_signed[AV1_SEG_LVL_MAX] = {1, 1, 1, 1, 1, 0, 0, 0};
>       uint8_t segmentation_feature_max[AV1_SEG_LVL_MAX] = {255, AV1_MAX_LOOP_FILTER,
> @@ -136,7 +137,7 @@ static int vaapi_av1_start_frame(AVCodecContext *avctx,
>       if (bit_depth_idx < 0)
>           goto fail;
> 
> -    if (apply_grain) {
> +    if (apply_grain && !direct_film_grain) {
>           if (ctx->tmp_frame->buf[0])
>               ff_thread_release_buffer(avctx, ctx->tmp_frame);
>           err = ff_thread_get_buffer(avctx, ctx->tmp_frame, AV_GET_BUFFER_FLAG_REF);
> @@ -375,6 +376,7 @@ static int vaapi_av1_end_frame(AVCodecContext *avctx)
>       VAAPIAV1DecContext *ctx = avctx->internal->hwaccel_priv_data;
> 
>       int apply_grain = !(avctx->export_side_data & AV_CODEC_EXPORT_DATA_FILM_GRAIN) && film_grain->apply_grain;
> +    int direct_film_grain = avctx->hwaccel_flags & AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN;
>       int ret;
>       ret = ff_vaapi_decode_issue(avctx, pic);
>       if (ret < 0)
> @@ -385,7 +387,7 @@ static int vaapi_av1_end_frame(AVCodecContext *avctx)
>               if (ctx->ref_tab[i].frame->buf[0])
>                   ff_thread_release_buffer(avctx, ctx->ref_tab[i].frame);
> 
> -            if (apply_grain) {
> +            if (apply_grain && !direct_film_grain) {
>                   ret = av_frame_ref(ctx->ref_tab[i].frame, ctx->tmp_frame);
>                   if (ret < 0)
>                       return ret;
> --
> 2.25.1
Dong, Ruijing Nov. 22, 2022, 7:18 p.m. UTC | #2
[AMD Official Use Only - General]

Hi Mark,

Sorry for being late to reply to you.

Your understanding is correct, and I have sent a new patch [v4] for addressing the current issue and to use
driver quirk mechanism to specify only AMD VAAPI driver has this behavior, then this could be more specific.

For AMD hardware, it allocates GPU memory internally for the DPB management, the output is always the final one with or without applied film-grain.

Thanks for your comments and insights!

Ruijing

-----Original Message-----
From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark Thompson
Sent: Sunday, November 20, 2022 11:44 AM
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH v2] avcodec/av1_vaapi: add direct film grain mode

On 20/11/2022 02:59, Ruijing Dong wrote:
> Adding direct film grain mode for av1 decoder, which outputs alongside
> film grain.
>
> AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN is the new flag introduced to enable
> this path.
>
> issue:
> By using AMD av1 decoder via VAAPI, when used with film grain content,
> the output displays black screen with incorrect frame order.
>
> The issue being discussed in here:
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitl
> ab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fissues%2F6903%23note_1613807&am
> p;data=05%7C01%7Cruijing.dong%40amd.com%7C6f75be6a4f8044fe037d08dacb16
> 8529%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638045595864762041%7
> CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1
> haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=Ms7PCNEjOs09JVQd2KB46St5
> w3V8Idbc2shZC80VefI%3D&amp;reserved=0
>
> example:
> This flag can be used in ffmpeg command:
>
> ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128
>        -hwaccel_flags 8
>        -i input_with_film_gram.obu
>        output_with_film_grain.yuv
>
> Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
> ---
> update: add new option direct_film_grain in optons_table.h
>
>   libavcodec/avcodec.h       | 17 +++++++++++++++++
>   libavcodec/options_table.h |  1 +
>   libavcodec/vaapi_av1.c     |  6 ++++--
>   3 files changed, 22 insertions(+), 2 deletions(-)

My understanding of this is as follows, please correct me if any of this is wrong:

* For AV1 with film grain enabled, VAAPI decode is specified with two output surfaces: one for pre-grain (reference) output and one for post-grain (display) output.
* The current driver in Mesa always writes to the pre-grain surface and ignores the post-grain surface entirely.
* To fix this, you intend to modify the VAAPI code in libavcodec to allow the user to manually override the expected VAAPI behaviour and instead assume that the post-grain output has been written to the pre-grain surface.

Is that right?


[rdong]:

If it is, could you perhaps explain why this manual option is preferable to the more obvious solution of Mesa being fixed to write the post-grain output to the post-grain surface?

Thanks,

- Mark


> diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h index
> 3edd8e2636..9420e7820d 100644
> --- a/libavcodec/avcodec.h
> +++ b/libavcodec/avcodec.h
> @@ -2253,6 +2253,23 @@ typedef struct AVHWAccel {
>    */
>   #define AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH (1 << 2)
>
> +/**
> + * The film grain synthesis could be seperate from decoding process.
> + * The downstream device would apply the film grain parameters seperately.
> + * The desired film grain parameters can be found in SEI section in
> +H264
> + * or H265 bitstreams.
> + *
> + * In AV1, film grain is mandatorily specified, av1 decoders like AMD
> + * av1 decoder process film grain content internally, and the output
> + * includes applied film grain. For the purpose of supporting these
> +av1
> + * decoders, this flag needs to be utilized.
> + *
> + * @warning If the stream has no film grain content, this flag will
> + *          be ignored in the supported av1 decoders. It is advised
> + *          that this flag should only be used in av1 decoders
> + *          that support it.
> + */
> +#define AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN (1 << 3)
>   /**
>    * @}
>    */
> diff --git a/libavcodec/options_table.h b/libavcodec/options_table.h
> index cd02f5096f..0302f89280 100644
> --- a/libavcodec/options_table.h
> +++ b/libavcodec/options_table.h
> @@ -399,6 +399,7 @@ static const AVOption avcodec_options[] = {
>   {"ignore_level", "ignore level even if the codec level used is unknown or higher than the maximum supported level reported by the hardware driver", 0, AV_OPT_TYPE_CONST, { .i64 = AV_HWACCEL_FLAG_IGNORE_LEVEL }, INT_MIN, INT_MAX, V | D, "hwaccel_flags" },
>   {"allow_high_depth", "allow to output YUV pixel formats with a different chroma sampling than 4:2:0 and/or other than 8 bits per component", 0, AV_OPT_TYPE_CONST, {.i64 = AV_HWACCEL_FLAG_ALLOW_HIGH_DEPTH }, INT_MIN, INT_MAX, V | D, "hwaccel_flags"},
>   {"allow_profile_mismatch", "attempt to decode anyway if HW
> accelerated decoder's supported profiles do not exactly match the
> stream", 0, AV_OPT_TYPE_CONST, {.i64 =
> AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH }, INT_MIN, INT_MAX, V | D,
> "hwaccel_flags"},
> +{"direct_film_grain", "allow decoder to directly apply film grain to
> +the output", 0, AV_OPT_TYPE_CONST, {.i64 =
> +AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN }, INT_MIN, INT_MAX, V | D,
> +"hwaccel_flags"},
>   {"extra_hw_frames", "Number of extra hardware frames to allocate for the user", OFFSET(extra_hw_frames), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, INT_MAX, V|D },
>   {"discard_damaged_percentage", "Percentage of damaged samples to discard a frame", OFFSET(discard_damaged_percentage), AV_OPT_TYPE_INT, {.i64 = 95 }, 0, 100, V|D },
>   {NULL},
> diff --git a/libavcodec/vaapi_av1.c b/libavcodec/vaapi_av1.c index
> d0339b2705..6db910f2bf 100644
> --- a/libavcodec/vaapi_av1.c
> +++ b/libavcodec/vaapi_av1.c
> @@ -127,6 +127,7 @@ static int vaapi_av1_start_frame(AVCodecContext *avctx,
>       int8_t bit_depth_idx;
>       int err = 0;
>       int apply_grain = !(avctx->export_side_data &
> AV_CODEC_EXPORT_DATA_FILM_GRAIN) && film_grain->apply_grain;
> +    int direct_film_grain = avctx->hwaccel_flags &
> + AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN;
>       uint8_t remap_lr_type[4] = {AV1_RESTORE_NONE, AV1_RESTORE_SWITCHABLE, AV1_RESTORE_WIENER, AV1_RESTORE_SGRPROJ};
>       uint8_t segmentation_feature_signed[AV1_SEG_LVL_MAX] = {1, 1, 1, 1, 1, 0, 0, 0};
>       uint8_t segmentation_feature_max[AV1_SEG_LVL_MAX] = {255,
> AV1_MAX_LOOP_FILTER, @@ -136,7 +137,7 @@ static int vaapi_av1_start_frame(AVCodecContext *avctx,
>       if (bit_depth_idx < 0)
>           goto fail;
>
> -    if (apply_grain) {
> +    if (apply_grain && !direct_film_grain) {
>           if (ctx->tmp_frame->buf[0])
>               ff_thread_release_buffer(avctx, ctx->tmp_frame);
>           err = ff_thread_get_buffer(avctx, ctx->tmp_frame,
> AV_GET_BUFFER_FLAG_REF); @@ -375,6 +376,7 @@ static int vaapi_av1_end_frame(AVCodecContext *avctx)
>       VAAPIAV1DecContext *ctx = avctx->internal->hwaccel_priv_data;
>
>       int apply_grain = !(avctx->export_side_data &
> AV_CODEC_EXPORT_DATA_FILM_GRAIN) && film_grain->apply_grain;
> +    int direct_film_grain = avctx->hwaccel_flags &
> + AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN;
>       int ret;
>       ret = ff_vaapi_decode_issue(avctx, pic);
>       if (ret < 0)
> @@ -385,7 +387,7 @@ static int vaapi_av1_end_frame(AVCodecContext *avctx)
>               if (ctx->ref_tab[i].frame->buf[0])
>                   ff_thread_release_buffer(avctx,
> ctx->ref_tab[i].frame);
>
> -            if (apply_grain) {
> +            if (apply_grain && !direct_film_grain) {
>                   ret = av_frame_ref(ctx->ref_tab[i].frame, ctx->tmp_frame);
>                   if (ret < 0)
>                       return ret;
> --
> 2.25.1
Mark Thompson Nov. 22, 2022, 8:26 p.m. UTC | #3
On 22/11/2022 19:18, Dong, Ruijing wrote:
> [AMD Official Use Only - General]
> 
> Hi Mark,
> 
> Sorry for being late to reply to you.
> 
> Your understanding is correct, and I have sent a new patch [v4] for addressing the current issue and to use
> driver quirk mechanism to specify only AMD VAAPI driver has this behavior, then this could be more specific.
> 
> For AMD hardware, it allocates GPU memory internally for the DPB management, the output is always the final one with or without applied film-grain.

I don't see why this requires you to write the output to the wrong surface.  Why not write it to the correct one instead?

- Mark

> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark Thompson
> Sent: Sunday, November 20, 2022 11:44 AM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v2] avcodec/av1_vaapi: add direct film grain mode
> 
> On 20/11/2022 02:59, Ruijing Dong wrote:
>> Adding direct film grain mode for av1 decoder, which outputs alongside
>> film grain.
>>
>> AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN is the new flag introduced to enable
>> this path.
>>
>> issue:
>> By using AMD av1 decoder via VAAPI, when used with film grain content,
>> the output displays black screen with incorrect frame order.
>>
>> The issue being discussed in here:
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitl
>> ab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fissues%2F6903%23note_1613807&am
>> p;data=05%7C01%7Cruijing.dong%40amd.com%7C6f75be6a4f8044fe037d08dacb16
>> 8529%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638045595864762041%7
>> CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1
>> haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=Ms7PCNEjOs09JVQd2KB46St5
>> w3V8Idbc2shZC80VefI%3D&amp;reserved=0
>>
>> example:
>> This flag can be used in ffmpeg command:
>>
>> ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128
>>         -hwaccel_flags 8
>>         -i input_with_film_gram.obu
>>         output_with_film_grain.yuv
>>
>> Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
>> ---
>> update: add new option direct_film_grain in optons_table.h
>>
>>    libavcodec/avcodec.h       | 17 +++++++++++++++++
>>    libavcodec/options_table.h |  1 +
>>    libavcodec/vaapi_av1.c     |  6 ++++--
>>    3 files changed, 22 insertions(+), 2 deletions(-)
> 
> My understanding of this is as follows, please correct me if any of this is wrong:
> 
> * For AV1 with film grain enabled, VAAPI decode is specified with two output surfaces: one for pre-grain (reference) output and one for post-grain (display) output.
> * The current driver in Mesa always writes to the pre-grain surface and ignores the post-grain surface entirely.
> * To fix this, you intend to modify the VAAPI code in libavcodec to allow the user to manually override the expected VAAPI behaviour and instead assume that the post-grain output has been written to the pre-grain surface.
> 
> Is that right?
> 
> 
> [rdong]:
> 
> If it is, could you perhaps explain why this manual option is preferable to the more obvious solution of Mesa being fixed to write the post-grain output to the post-grain surface?
> 
> Thanks,
> 
> - Mark
> 
> 
>> diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h index
>> 3edd8e2636..9420e7820d 100644
>> --- a/libavcodec/avcodec.h
>> +++ b/libavcodec/avcodec.h
>> @@ -2253,6 +2253,23 @@ typedef struct AVHWAccel {
>>     */
>>    #define AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH (1 << 2)
>>
>> +/**
>> + * The film grain synthesis could be seperate from decoding process.
>> + * The downstream device would apply the film grain parameters seperately.
>> + * The desired film grain parameters can be found in SEI section in
>> +H264
>> + * or H265 bitstreams.
>> + *
>> + * In AV1, film grain is mandatorily specified, av1 decoders like AMD
>> + * av1 decoder process film grain content internally, and the output
>> + * includes applied film grain. For the purpose of supporting these
>> +av1
>> + * decoders, this flag needs to be utilized.
>> + *
>> + * @warning If the stream has no film grain content, this flag will
>> + *          be ignored in the supported av1 decoders. It is advised
>> + *          that this flag should only be used in av1 decoders
>> + *          that support it.
>> + */
>> +#define AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN (1 << 3)
>>    /**
>>     * @}
>>     */
>> diff --git a/libavcodec/options_table.h b/libavcodec/options_table.h
>> index cd02f5096f..0302f89280 100644
>> --- a/libavcodec/options_table.h
>> +++ b/libavcodec/options_table.h
>> @@ -399,6 +399,7 @@ static const AVOption avcodec_options[] = {
>>    {"ignore_level", "ignore level even if the codec level used is unknown or higher than the maximum supported level reported by the hardware driver", 0, AV_OPT_TYPE_CONST, { .i64 = AV_HWACCEL_FLAG_IGNORE_LEVEL }, INT_MIN, INT_MAX, V | D, "hwaccel_flags" },
>>    {"allow_high_depth", "allow to output YUV pixel formats with a different chroma sampling than 4:2:0 and/or other than 8 bits per component", 0, AV_OPT_TYPE_CONST, {.i64 = AV_HWACCEL_FLAG_ALLOW_HIGH_DEPTH }, INT_MIN, INT_MAX, V | D, "hwaccel_flags"},
>>    {"allow_profile_mismatch", "attempt to decode anyway if HW
>> accelerated decoder's supported profiles do not exactly match the
>> stream", 0, AV_OPT_TYPE_CONST, {.i64 =
>> AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH }, INT_MIN, INT_MAX, V | D,
>> "hwaccel_flags"},
>> +{"direct_film_grain", "allow decoder to directly apply film grain to
>> +the output", 0, AV_OPT_TYPE_CONST, {.i64 =
>> +AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN }, INT_MIN, INT_MAX, V | D,
>> +"hwaccel_flags"},
>>    {"extra_hw_frames", "Number of extra hardware frames to allocate for the user", OFFSET(extra_hw_frames), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, INT_MAX, V|D },
>>    {"discard_damaged_percentage", "Percentage of damaged samples to discard a frame", OFFSET(discard_damaged_percentage), AV_OPT_TYPE_INT, {.i64 = 95 }, 0, 100, V|D },
>>    {NULL},
>> diff --git a/libavcodec/vaapi_av1.c b/libavcodec/vaapi_av1.c index
>> d0339b2705..6db910f2bf 100644
>> --- a/libavcodec/vaapi_av1.c
>> +++ b/libavcodec/vaapi_av1.c
>> @@ -127,6 +127,7 @@ static int vaapi_av1_start_frame(AVCodecContext *avctx,
>>        int8_t bit_depth_idx;
>>        int err = 0;
>>        int apply_grain = !(avctx->export_side_data &
>> AV_CODEC_EXPORT_DATA_FILM_GRAIN) && film_grain->apply_grain;
>> +    int direct_film_grain = avctx->hwaccel_flags &
>> + AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN;
>>        uint8_t remap_lr_type[4] = {AV1_RESTORE_NONE, AV1_RESTORE_SWITCHABLE, AV1_RESTORE_WIENER, AV1_RESTORE_SGRPROJ};
>>        uint8_t segmentation_feature_signed[AV1_SEG_LVL_MAX] = {1, 1, 1, 1, 1, 0, 0, 0};
>>        uint8_t segmentation_feature_max[AV1_SEG_LVL_MAX] = {255,
>> AV1_MAX_LOOP_FILTER, @@ -136,7 +137,7 @@ static int vaapi_av1_start_frame(AVCodecContext *avctx,
>>        if (bit_depth_idx < 0)
>>            goto fail;
>>
>> -    if (apply_grain) {
>> +    if (apply_grain && !direct_film_grain) {
>>            if (ctx->tmp_frame->buf[0])
>>                ff_thread_release_buffer(avctx, ctx->tmp_frame);
>>            err = ff_thread_get_buffer(avctx, ctx->tmp_frame,
>> AV_GET_BUFFER_FLAG_REF); @@ -375,6 +376,7 @@ static int vaapi_av1_end_frame(AVCodecContext *avctx)
>>        VAAPIAV1DecContext *ctx = avctx->internal->hwaccel_priv_data;
>>
>>        int apply_grain = !(avctx->export_side_data &
>> AV_CODEC_EXPORT_DATA_FILM_GRAIN) && film_grain->apply_grain;
>> +    int direct_film_grain = avctx->hwaccel_flags &
>> + AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN;
>>        int ret;
>>        ret = ff_vaapi_decode_issue(avctx, pic);
>>        if (ret < 0)
>> @@ -385,7 +387,7 @@ static int vaapi_av1_end_frame(AVCodecContext *avctx)
>>                if (ctx->ref_tab[i].frame->buf[0])
>>                    ff_thread_release_buffer(avctx,
>> ctx->ref_tab[i].frame);
>>
>> -            if (apply_grain) {
>> +            if (apply_grain && !direct_film_grain) {
>>                    ret = av_frame_ref(ctx->ref_tab[i].frame, ctx->tmp_frame);
>>                    if (ret < 0)
>>                        return ret;
>> --
>> 2.25.1
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fffmpeg.org%2Fmailman%2Flistinfo%2Fffmpeg-devel&amp;data=05%7C01%7Cruijing.dong%40amd.com%7C6f75be6a4f8044fe037d08dacb168529%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638045595864762041%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=p3ypzkjWCGCqBPCsHLm4rGHI1%2BwwxY0pyK1l8IQkaWs%3D&amp;reserved=0
> 
> To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Mark Thompson Nov. 22, 2022, 8:59 p.m. UTC | #4
On 22/11/2022 20:26, Mark Thompson wrote:
> On 22/11/2022 19:18, Dong, Ruijing wrote:
>> [AMD Official Use Only - General]
>>
>> Hi Mark,
>>
>> Sorry for being late to reply to you.
>>
>> Your understanding is correct, and I have sent a new patch [v4] for addressing the current issue and to use
>> driver quirk mechanism to specify only AMD VAAPI driver has this behavior, then this could be more specific.
>>
>> For AMD hardware, it allocates GPU memory internally for the DPB management, the output is always the final one with or without applied film-grain.
> 
> I don't see why this requires you to write the output to the wrong surface.  Why not write it to the correct one instead?

Indeed, this seems to be a trivial fix in Mesa: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19938>.

It would be helpful if someone with suitable hardware could test that.

Thanks,

- Mark

>> -----Original Message-----
>> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark Thompson
>> Sent: Sunday, November 20, 2022 11:44 AM
>> To: ffmpeg-devel@ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH v2] avcodec/av1_vaapi: add direct film grain mode
>>
>> On 20/11/2022 02:59, Ruijing Dong wrote:
>>> Adding direct film grain mode for av1 decoder, which outputs alongside
>>> film grain.
>>>
>>> AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN is the new flag introduced to enable
>>> this path.
>>>
>>> issue:
>>> By using AMD av1 decoder via VAAPI, when used with film grain content,
>>> the output displays black screen with incorrect frame order.
>>>
>>> The issue being discussed in here:
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitl
>>> ab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fissues%2F6903%23note_1613807&am
>>> p;data=05%7C01%7Cruijing.dong%40amd.com%7C6f75be6a4f8044fe037d08dacb16
>>> 8529%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638045595864762041%7
>>> CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1
>>> haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=Ms7PCNEjOs09JVQd2KB46St5
>>> w3V8Idbc2shZC80VefI%3D&amp;reserved=0
>>>
>>> example:
>>> This flag can be used in ffmpeg command:
>>>
>>> ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128
>>>         -hwaccel_flags 8
>>>         -i input_with_film_gram.obu
>>>         output_with_film_grain.yuv
>>>
>>> Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
>>> ---
>>> update: add new option direct_film_grain in optons_table.h
>>>
>>>    libavcodec/avcodec.h       | 17 +++++++++++++++++
>>>    libavcodec/options_table.h |  1 +
>>>    libavcodec/vaapi_av1.c     |  6 ++++--
>>>    3 files changed, 22 insertions(+), 2 deletions(-)
>>
>> My understanding of this is as follows, please correct me if any of this is wrong:
>>
>> * For AV1 with film grain enabled, VAAPI decode is specified with two output surfaces: one for pre-grain (reference) output and one for post-grain (display) output.
>> * The current driver in Mesa always writes to the pre-grain surface and ignores the post-grain surface entirely.
>> * To fix this, you intend to modify the VAAPI code in libavcodec to allow the user to manually override the expected VAAPI behaviour and instead assume that the post-grain output has been written to the pre-grain surface.
>>
>> Is that right?
>>
>>
>> [rdong]:
>>
>> If it is, could you perhaps explain why this manual option is preferable to the more obvious solution of Mesa being fixed to write the post-grain output to the post-grain surface?
>>
>> Thanks,
>>
>> - Mark
>>
>>
>>> diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h index
>>> 3edd8e2636..9420e7820d 100644
>>> --- a/libavcodec/avcodec.h
>>> +++ b/libavcodec/avcodec.h
>>> @@ -2253,6 +2253,23 @@ typedef struct AVHWAccel {
>>>     */
>>>    #define AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH (1 << 2)
>>>
>>> +/**
>>> + * The film grain synthesis could be seperate from decoding process.
>>> + * The downstream device would apply the film grain parameters seperately.
>>> + * The desired film grain parameters can be found in SEI section in
>>> +H264
>>> + * or H265 bitstreams.
>>> + *
>>> + * In AV1, film grain is mandatorily specified, av1 decoders like AMD
>>> + * av1 decoder process film grain content internally, and the output
>>> + * includes applied film grain. For the purpose of supporting these
>>> +av1
>>> + * decoders, this flag needs to be utilized.
>>> + *
>>> + * @warning If the stream has no film grain content, this flag will
>>> + *          be ignored in the supported av1 decoders. It is advised
>>> + *          that this flag should only be used in av1 decoders
>>> + *          that support it.
>>> + */
>>> +#define AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN (1 << 3)
>>>    /**
>>>     * @}
>>>     */
>>> diff --git a/libavcodec/options_table.h b/libavcodec/options_table.h
>>> index cd02f5096f..0302f89280 100644
>>> --- a/libavcodec/options_table.h
>>> +++ b/libavcodec/options_table.h
>>> @@ -399,6 +399,7 @@ static const AVOption avcodec_options[] = {
>>>    {"ignore_level", "ignore level even if the codec level used is unknown or higher than the maximum supported level reported by the hardware driver", 0, AV_OPT_TYPE_CONST, { .i64 = AV_HWACCEL_FLAG_IGNORE_LEVEL }, INT_MIN, INT_MAX, V | D, "hwaccel_flags" },
>>>    {"allow_high_depth", "allow to output YUV pixel formats with a different chroma sampling than 4:2:0 and/or other than 8 bits per component", 0, AV_OPT_TYPE_CONST, {.i64 = AV_HWACCEL_FLAG_ALLOW_HIGH_DEPTH }, INT_MIN, INT_MAX, V | D, "hwaccel_flags"},
>>>    {"allow_profile_mismatch", "attempt to decode anyway if HW
>>> accelerated decoder's supported profiles do not exactly match the
>>> stream", 0, AV_OPT_TYPE_CONST, {.i64 =
>>> AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH }, INT_MIN, INT_MAX, V | D,
>>> "hwaccel_flags"},
>>> +{"direct_film_grain", "allow decoder to directly apply film grain to
>>> +the output", 0, AV_OPT_TYPE_CONST, {.i64 =
>>> +AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN }, INT_MIN, INT_MAX, V | D,
>>> +"hwaccel_flags"},
>>>    {"extra_hw_frames", "Number of extra hardware frames to allocate for the user", OFFSET(extra_hw_frames), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, INT_MAX, V|D },
>>>    {"discard_damaged_percentage", "Percentage of damaged samples to discard a frame", OFFSET(discard_damaged_percentage), AV_OPT_TYPE_INT, {.i64 = 95 }, 0, 100, V|D },
>>>    {NULL},
>>> diff --git a/libavcodec/vaapi_av1.c b/libavcodec/vaapi_av1.c index
>>> d0339b2705..6db910f2bf 100644
>>> --- a/libavcodec/vaapi_av1.c
>>> +++ b/libavcodec/vaapi_av1.c
>>> @@ -127,6 +127,7 @@ static int vaapi_av1_start_frame(AVCodecContext *avctx,
>>>        int8_t bit_depth_idx;
>>>        int err = 0;
>>>        int apply_grain = !(avctx->export_side_data &
>>> AV_CODEC_EXPORT_DATA_FILM_GRAIN) && film_grain->apply_grain;
>>> +    int direct_film_grain = avctx->hwaccel_flags &
>>> + AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN;
>>>        uint8_t remap_lr_type[4] = {AV1_RESTORE_NONE, AV1_RESTORE_SWITCHABLE, AV1_RESTORE_WIENER, AV1_RESTORE_SGRPROJ};
>>>        uint8_t segmentation_feature_signed[AV1_SEG_LVL_MAX] = {1, 1, 1, 1, 1, 0, 0, 0};
>>>        uint8_t segmentation_feature_max[AV1_SEG_LVL_MAX] = {255,
>>> AV1_MAX_LOOP_FILTER, @@ -136,7 +137,7 @@ static int vaapi_av1_start_frame(AVCodecContext *avctx,
>>>        if (bit_depth_idx < 0)
>>>            goto fail;
>>>
>>> -    if (apply_grain) {
>>> +    if (apply_grain && !direct_film_grain) {
>>>            if (ctx->tmp_frame->buf[0])
>>>                ff_thread_release_buffer(avctx, ctx->tmp_frame);
>>>            err = ff_thread_get_buffer(avctx, ctx->tmp_frame,
>>> AV_GET_BUFFER_FLAG_REF); @@ -375,6 +376,7 @@ static int vaapi_av1_end_frame(AVCodecContext *avctx)
>>>        VAAPIAV1DecContext *ctx = avctx->internal->hwaccel_priv_data;
>>>
>>>        int apply_grain = !(avctx->export_side_data &
>>> AV_CODEC_EXPORT_DATA_FILM_GRAIN) && film_grain->apply_grain;
>>> +    int direct_film_grain = avctx->hwaccel_flags &
>>> + AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN;
>>>        int ret;
>>>        ret = ff_vaapi_decode_issue(avctx, pic);
>>>        if (ret < 0)
>>> @@ -385,7 +387,7 @@ static int vaapi_av1_end_frame(AVCodecContext *avctx)
>>>                if (ctx->ref_tab[i].frame->buf[0])
>>>                    ff_thread_release_buffer(avctx,
>>> ctx->ref_tab[i].frame);
>>>
>>> -            if (apply_grain) {
>>> +            if (apply_grain && !direct_film_grain) {
>>>                    ret = av_frame_ref(ctx->ref_tab[i].frame, ctx->tmp_frame);
>>>                    if (ret < 0)
>>>                        return ret;
>>> -- 
>>> 2.25.1
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fffmpeg.org%2Fmailman%2Flistinfo%2Fffmpeg-devel&amp;data=05%7C01%7Cruijing.dong%40amd.com%7C6f75be6a4f8044fe037d08dacb168529%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638045595864762041%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=p3ypzkjWCGCqBPCsHLm4rGHI1%2BwwxY0pyK1l8IQkaWs%3D&amp;reserved=0
>>
>> To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>>
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Mark Thompson Nov. 22, 2022, 11:34 p.m. UTC | #5
On 22/11/2022 20:59, Mark Thompson wrote:
> On 22/11/2022 20:26, Mark Thompson wrote:
>> On 22/11/2022 19:18, Dong, Ruijing wrote:
>>> [AMD Official Use Only - General]
>>>
>>> Hi Mark,
>>>
>>> Sorry for being late to reply to you.
>>>
>>> Your understanding is correct, and I have sent a new patch [v4] for addressing the current issue and to use
>>> driver quirk mechanism to specify only AMD VAAPI driver has this behavior, then this could be more specific.
>>>
>>> For AMD hardware, it allocates GPU memory internally for the DPB management, the output is always the final one with or without applied film-grain.
>>
>> I don't see why this requires you to write the output to the wrong surface.  Why not write it to the correct one instead?
> 
> Indeed, this seems to be a trivial fix in Mesa: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19938>.
> 
> It would be helpful if someone with suitable hardware could test that.

This was too naive, the Mesa driver doesn't make this easy.

It is only set up to write to a single surface, which is the one provided to vaBeginPicture().  However, VAAPI does not work that way - it wants you to write to both the pre-grain and the post-grain surfaces, where the pre-grain surface is the primary target and gets passed the vaBeginPicture() and the post-grain surface is supplied in the parameters.

So that's the first problem: the render target which is given as the pre-grain surface needs to be replaced by post-grain surface if we want to only write a single surface.

Is that enough?  Well, no.  The Mesa driver is also messing with the reference frames.

The VAAPI model is that the pre-grain surfaces are passed back into the driver on subsequent frames when they are used as reference frames.  However, the Mesa driver has hidden the pre-grain surface internally and only written the post-grain surface.

Therefore, when writing a post-grain output, it magically associates with the target surface information about the pre-grain surface which was written internally at the same time.  Then, when you later give it that surface as a reference frame it ignores the actual content of the frame and looks at the associated data to find what to use internally as the reference.

That's the second problem: if the post-grain surface were actually the render target then the magic internal reference gets associated with that, and when we pass the real reference frame (the pre-grain surface) in later then it won't recognise it because it never wrote to that surface.

How should it be fixed, then?

The best way would be to stop hiding the internal information about reference frames: if the real reference frames were visible in VAAPI then everything would just work and none of the magic internal references would be needed.

If we suppose that this can't be done (maybe it is hidden behind opaque firmware which the naughty users buying the products are not allowed to see), then Mesa needs two changes:

1.  Write the output to the post-grain surface rather than the pre-grain surface.  This is nontrivial because it isn't the surface passed to vaBeginPicture(), but given the API there isn't really any way around it.

2.  Attach the magic internal reference to the pre-grain surface, /even though it wasn't the one written to/.  This makes the reference frames work, since the pre-grain surfaces will be the ones passed back in later frames.

Alternatively: make new API in libva somehow.  Probably wants an attribute which indicates that vaBeginPicture() would want the post-grain surface and then ignore the surfaces in the picture parameters?  Unclear exactly how this should be specified, but whatever it is it needs to be very clear about how the references would work.

Hacking FFmpeg to use the API differently based on matching substrings in the vendor name does not seem like a good approach here, given that future Mesa decode implementations (which could be AMD or could be other hardware) may well be more sensible.  It would also doom other VAAPI users, since you can only really send this sort of hack to a few big projects like FFmpeg.

Thanks,

- Mark
Dong, Ruijing Nov. 23, 2022, 2:43 a.m. UTC | #6
[AMD Official Use Only - General]

Hi Mark,

Just got the ffmpeg email, please see my answer below in [rdong].

Thanks,
Ruijing

-----Original Message-----
From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark Thompson
Sent: Tuesday, November 22, 2022 6:34 PM
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH v2] avcodec/av1_vaapi: add direct film grain mode

On 22/11/2022 20:59, Mark Thompson wrote:
> On 22/11/2022 20:26, Mark Thompson wrote:
>> On 22/11/2022 19:18, Dong, Ruijing wrote:
>>> [AMD Official Use Only - General]
>>>
>>> Hi Mark,
>>>
>>> Sorry for being late to reply to you.
>>>
>>> Your understanding is correct, and I have sent a new patch [v4] for
>>> addressing the current issue and to use driver quirk mechanism to specify only AMD VAAPI driver has this behavior, then this could be more specific.
>>>
>>> For AMD hardware, it allocates GPU memory internally for the DPB management, the output is always the final one with or without applied film-grain.
>>
>> I don't see why this requires you to write the output to the wrong surface.  Why not write it to the correct one instead?
>
> Indeed, this seems to be a trivial fix in Mesa: <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fmerge_requests%2F19938&amp;data=05%7C01%7Cruijing.dong%40amd.com%7C0896eeffcf674092aa8408dacce228f3%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638047570363839738%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=3Be3k9TQdRjGJv2KYNsssFjl3ORaNXzdSBqei%2BgYSBg%3D&amp;reserved=0>.
>
> It would be helpful if someone with suitable hardware could test that.

This was too naive, the Mesa driver doesn't make this easy.

It is only set up to write to a single surface, which is the one provided to vaBeginPicture().  However, VAAPI does not work that way - it wants you to write to both the pre-grain and the post-grain surfaces, where the pre-grain surface is the primary target and gets passed the vaBeginPicture() and the post-grain surface is supplied in the parameters.

So that's the first problem: the render target which is given as the pre-grain surface needs to be replaced by post-grain surface if we want to only write a single surface.

[rdong] Well, the render surface targets to output the displayable surface, and film grain has many ways to be carried on,  one way is to let application doing next step, the other way is the
decoder who can directly output the final applied grain result.  The VAAPI interface should accommodate as much as possible hardware instead of limiting the only way
for the implementation. AMD decoder for the performance optimization (tiled formats and swizzle modes) and other considerations, it will need to manage the reference frames internally, and there is no point to implement a new logic to output both pre-grain and post-grain surfaces. If grain_apply is set, then the decoder will only output the applied grain result.

Is that enough?  Well, no.  The Mesa driver is also messing with the reference frames.

The VAAPI model is that the pre-grain surfaces are passed back into the driver on subsequent frames when they are used as reference frames.  However, the Mesa driver has hidden the pre-grain surface internally and only written the post-grain surface.

Therefore, when writing a post-grain output, it magically associates with the target surface information about the pre-grain surface which was written internally at the same time.  Then, when you later give it that surface as a reference frame it ignores the actual content of the frame and looks at the associated data to find what to use internally as the reference.

That's the second problem: if the post-grain surface were actually the render target then the magic internal reference gets associated with that, and when we pass the real reference frame (the pre-grain surface) in later then it won't recognise it because it never wrote to that surface.

[rdong] In my understanding the target surface should be the final output, and the reason AMD decoder cannot output the reference frames I have already described above, in fact, the reference frame buffer pointers are used as reference to indicate how the reference picture buffer will managed internally, please understand there are some concept differences there.

How should it be fixed, then?

The best way would be to stop hiding the internal information about reference frames: if the real reference frames were visible in VAAPI then everything would just work and none of the magic internal references would be needed.

[rdong] Well, it has many difficulties to map the well designed AMD decoder to the VAAPI interface, however we could not expect everything is perfect.

If we suppose that this can't be done (maybe it is hidden behind opaque firmware which the naughty users buying the products are not allowed to see), then Mesa needs two changes:

1.  Write the output to the post-grain surface rather than the pre-grain surface.  This is nontrivial because it isn't the surface passed to vaBeginPicture(), but given the API there isn't really any way around it.

2.  Attach the magic internal reference to the pre-grain surface, /even though it wasn't the one written to/.  This makes the reference frames work, since the pre-grain surfaces will be the ones passed back in later frames.

[rdong] What is the point of passing back and forth the so called pre-grain surfaces as reference frames?

Alternatively: make new API in libva somehow.  Probably wants an attribute which indicates that vaBeginPicture() would want the post-grain surface and then ignore the surfaces in the picture parameters?  Unclear exactly how this should be specified, but whatever it is it needs to be very clear about how the references would work.

Hacking FFmpeg to use the API differently based on matching substrings in the vendor name does not seem like a good approach here, given that future Mesa decode implementations (which could be AMD or could be other hardware) may well be more sensible.  It would also doom other VAAPI users, since you can only really send this sort of hack to a few big projects like FFmpeg.

[rdong] I agree with you on this idea, however as I checked the current substring is in fact only used by AMD driver, if there were some better ideas, we would like to consider them as well.


Thanks,

- Mark
Dong, Ruijing Nov. 24, 2022, 3:27 p.m. UTC | #7
[AMD Official Use Only - General]

I might have misunderstood some of the questions, and I would like to explain more about the issue from my perspective, please correct me if anything wrong.

This patch is NOT a hack, like @Mark Thompson <sw@jkqxz.net> mentioned.

Video codec, especially decoders will need to meet the requirements of video codecs first, if the reference picture management (DPB) was implemented wrongly,
then it could not meet the fundamental decoder criteria. From this point of view, different hardware will need to follow the same standard for the implementation
so that the decoders can generate the conformance outputs.

The DPB is always an internal part of the decoder, the detail implementation could be differed with different benefits, if DPB is managed by the application, it can be
more flexible and easily maintained, the other way is the DPB is managed by the driver and hardware itself, it could have more space for the optimization, for
example the reference frame access, where the format of reference frames is NOT used for display output, and the display output cannot be used for the
reference frames neither because reference frames could use a different format, which is more efficient for reference access however not good for display.
From my point of view VAAPI supports both of the above two ideas, and it is not necessary to force one to follow another, because that is limited by the initial design
Idea.  In this case, there is no pre-grain output in the former decoder, it has only one display output.

From the other side, most of the AMD AV1 decoding issues are resolved from the community, the film grain problem becomes more noticeable. And generally speaking
it is usually a flexible part of the post processing phase after video decoding, and here it is strictly defined in AV1 spec, and it is part of the decoding standard.
It is not practical to make changes in the DPB design idea for resolving this issue from AMD decoder side. And naturally output the applied firm grain is just another
film grain process mode, I called it "direct film grain mode".

I have asked the community to inspire me to have a better idea, and eventually I found out there is no good way other than to have the external user choice or detecting
AMD driver.  I understand doing the string match to choose AMD driver is not a perfect idea, but we really need to have a method to resolve this issue. Please let me know
If a better way come to your mind, which can resolve this issue and in the meanwhile not affecting other AV1 implementation path.

Appreciate for the help and comments!

Ruijing

-----Original Message-----
From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Dong, Ruijing
Sent: Tuesday, November 22, 2022 9:43 PM
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH v2] avcodec/av1_vaapi: add direct film grain mode

[AMD Official Use Only - General]

[AMD Official Use Only - General]

Hi Mark,

Just got the ffmpeg email, please see my answer below in [rdong].

Thanks,
Ruijing

-----Original Message-----
From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Mark Thompson
Sent: Tuesday, November 22, 2022 6:34 PM
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH v2] avcodec/av1_vaapi: add direct film grain mode

On 22/11/2022 20:59, Mark Thompson wrote:
> On 22/11/2022 20:26, Mark Thompson wrote:
>> On 22/11/2022 19:18, Dong, Ruijing wrote:
>>> [AMD Official Use Only - General]
>>>
>>> Hi Mark,
>>>
>>> Sorry for being late to reply to you.
>>>
>>> Your understanding is correct, and I have sent a new patch [v4] for
>>> addressing the current issue and to use driver quirk mechanism to specify only AMD VAAPI driver has this behavior, then this could be more specific.
>>>
>>> For AMD hardware, it allocates GPU memory internally for the DPB management, the output is always the final one with or without applied film-grain.
>>
>> I don't see why this requires you to write the output to the wrong surface.  Why not write it to the correct one instead?
>
> Indeed, this seems to be a trivial fix in Mesa: <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fmerge_requests%2F19938&amp;data=05%7C01%7Cruijing.dong%40amd.com%7Cff9fa5ba1fdc47293fcb08daccfc817a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638047682135727840%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=cPnxOAo32wz6YY%2Fcm8lMauIaecI%2BXmy0KgtiIB8e4E4%3D&amp;reserved=0>.
>
> It would be helpful if someone with suitable hardware could test that.

This was too naive, the Mesa driver doesn't make this easy.

It is only set up to write to a single surface, which is the one provided to vaBeginPicture().  However, VAAPI does not work that way - it wants you to write to both the pre-grain and the post-grain surfaces, where the pre-grain surface is the primary target and gets passed the vaBeginPicture() and the post-grain surface is supplied in the parameters.

So that's the first problem: the render target which is given as the pre-grain surface needs to be replaced by post-grain surface if we want to only write a single surface.

[rdong] Well, the render surface targets to output the displayable surface, and film grain has many ways to be carried on,  one way is to let application doing next step, the other way is the decoder who can directly output the final applied grain result.  The VAAPI interface should accommodate as much as possible hardware instead of limiting the only way for the implementation. AMD decoder for the performance optimization (tiled formats and swizzle modes) and other considerations, it will need to manage the reference frames internally, and there is no point to implement a new logic to output both pre-grain and post-grain surfaces. If grain_apply is set, then the decoder will only output the applied grain result.

Is that enough?  Well, no.  The Mesa driver is also messing with the reference frames.

The VAAPI model is that the pre-grain surfaces are passed back into the driver on subsequent frames when they are used as reference frames.  However, the Mesa driver has hidden the pre-grain surface internally and only written the post-grain surface.

Therefore, when writing a post-grain output, it magically associates with the target surface information about the pre-grain surface which was written internally at the same time.  Then, when you later give it that surface as a reference frame it ignores the actual content of the frame and looks at the associated data to find what to use internally as the reference.

That's the second problem: if the post-grain surface were actually the render target then the magic internal reference gets associated with that, and when we pass the real reference frame (the pre-grain surface) in later then it won't recognise it because it never wrote to that surface.

[rdong] In my understanding the target surface should be the final output, and the reason AMD decoder cannot output the reference frames I have already described above, in fact, the reference frame buffer pointers are used as reference to indicate how the reference picture buffer will managed internally, please understand there are some concept differences there.

How should it be fixed, then?

The best way would be to stop hiding the internal information about reference frames: if the real reference frames were visible in VAAPI then everything would just work and none of the magic internal references would be needed.

[rdong] Well, it has many difficulties to map the well designed AMD decoder to the VAAPI interface, however we could not expect everything is perfect.

If we suppose that this can't be done (maybe it is hidden behind opaque firmware which the naughty users buying the products are not allowed to see), then Mesa needs two changes:

1.  Write the output to the post-grain surface rather than the pre-grain surface.  This is nontrivial because it isn't the surface passed to vaBeginPicture(), but given the API there isn't really any way around it.

2.  Attach the magic internal reference to the pre-grain surface, /even though it wasn't the one written to/.  This makes the reference frames work, since the pre-grain surfaces will be the ones passed back in later frames.

[rdong] What is the point of passing back and forth the so called pre-grain surfaces as reference frames?

Alternatively: make new API in libva somehow.  Probably wants an attribute which indicates that vaBeginPicture() would want the post-grain surface and then ignore the surfaces in the picture parameters?  Unclear exactly how this should be specified, but whatever it is it needs to be very clear about how the references would work.

Hacking FFmpeg to use the API differently based on matching substrings in the vendor name does not seem like a good approach here, given that future Mesa decode implementations (which could be AMD or could be other hardware) may well be more sensible.  It would also doom other VAAPI users, since you can only really send this sort of hack to a few big projects like FFmpeg.

[rdong] I agree with you on this idea, however as I checked the current substring is in fact only used by AMD driver, if there were some better ideas, we would like to consider them as well.


Thanks,

- Mark
Mark Thompson Nov. 24, 2022, 9:09 p.m. UTC | #8
On 24/11/2022 15:27, Dong, Ruijing wrote:
> [AMD Official Use Only - General]
> 
> I might have misunderstood some of the questions, and I would like to explain more about the issue from my perspective, please correct me if anything wrong.

Are you perhaps missing that this is intended to be a common API between multiple implementations and users?

The important feature is that once the API has specified the behaviour then both the driver implementers (such as Mesa, but also others) and the users (such as FFmpeg, but also others) don't need to know what is going on on the other side, because the other side always behaves in the same way.

In this case, the API specifies that there are two output surfaces - one is written with the reference (pre-grain) frame, one is written with the display (post-grain) frame.

The Mesa driver is incorrect because it writes the display frame to the reference surface, and doesn't bother to write the display surface at all.  This is then very visible when any user tries to display the display surface, because it hasn't been written.

> This patch is NOT a hack, like @Mark Thompson <sw@jkqxz.net> mentioned.

Your proposed patch is trying to crystalise an alternative not-quite-VAAPI which uses the same library and appears compatible, but actually isn't because it differs in critical ways.  I don't think it is unreasonable to characterise this as a hack.

> Video codec, especially decoders will need to meet the requirements of video codecs first, if the reference picture management (DPB) was implemented wrongly,
> then it could not meet the fundamental decoder criteria. From this point of view, different hardware will need to follow the same standard for the implementation
> so that the decoders can generate the conformance outputs.
> 
> The DPB is always an internal part of the decoder, the detail implementation could be differed with different benefits, if DPB is managed by the application, it can be
> more flexible and easily maintained, the other way is the DPB is managed by the driver and hardware itself, it could have more space for the optimization, for
> example the reference frame access, where the format of reference frames is NOT used for display output, and the display output cannot be used for the
> reference frames neither because reference frames could use a different format, which is more efficient for reference access however not good for display.
>  From my point of view VAAPI supports both of the above two ideas, and it is not necessary to force one to follow another, because that is limited by the initial design
> Idea.  In this case, there is no pre-grain output in the former decoder, it has only one display output.

Fair enough.  Luckily, the API clearly states where that display output should be written: <https://github.com/intel/libva/blob/master/va/va_dec_av1.h#L300-L304>.

You can get away with not writing the reference frame because it won't be displayed in this case, but you must write the display frame because that is the one the API says will be displayed to the user.

The unwritten reference frame will still get used in references because that is how the API is defined, but since Mesa deals with this by attaching hidden metadata to the frames there is no problem.

>  From the other side, most of the AMD AV1 decoding issues are resolved from the community, the film grain problem becomes more noticeable. And generally speaking
> it is usually a flexible part of the post processing phase after video decoding, and here it is strictly defined in AV1 spec, and it is part of the decoding standard.
> It is not practical to make changes in the DPB design idea for resolving this issue from AMD decoder side. And naturally output the applied firm grain is just another
> film grain process mode, I called it "direct film grain mode".
> 
> I have asked the community to inspire me to have a better idea, and eventually I found out there is no good way other than to have the external user choice or detecting
> AMD driver.  I understand doing the string match to choose AMD driver is not a perfect idea, but we really need to have a method to resolve this issue. Please let me know
> If a better way come to your mind, which can resolve this issue and in the meanwhile not affecting other AV1 implementation path.

The proper fix is straighforward: correct the error in the Mesa driver so that it writes the display output to the display surface, as the API says it should be.

Thanks,

- Mark
Dong, Ruijing Nov. 24, 2022, 9:22 p.m. UTC | #9
[AMD Official Use Only - General]

Thanks for the input.

I will try this way.

Ruijing

-----Original Message-----
From: Mark Thompson <sw@jkqxz.net>
Sent: Thursday, November 24, 2022 4:10 PM
To: Dong, Ruijing <Ruijing.Dong@amd.com>; FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH v2] avcodec/av1_vaapi: add direct film grain mode

On 24/11/2022 15:27, Dong, Ruijing wrote:
> [AMD Official Use Only - General]
>
> I might have misunderstood some of the questions, and I would like to explain more about the issue from my perspective, please correct me if anything wrong.

Are you perhaps missing that this is intended to be a common API between multiple implementations and users?

The important feature is that once the API has specified the behaviour then both the driver implementers (such as Mesa, but also others) and the users (such as FFmpeg, but also others) don't need to know what is going on on the other side, because the other side always behaves in the same way.

In this case, the API specifies that there are two output surfaces - one is written with the reference (pre-grain) frame, one is written with the display (post-grain) frame.

The Mesa driver is incorrect because it writes the display frame to the reference surface, and doesn't bother to write the display surface at all.  This is then very visible when any user tries to display the display surface, because it hasn't been written.

> This patch is NOT a hack, like @Mark Thompson <sw@jkqxz.net> mentioned.

Your proposed patch is trying to crystalise an alternative not-quite-VAAPI which uses the same library and appears compatible, but actually isn't because it differs in critical ways.  I don't think it is unreasonable to characterise this as a hack.

> Video codec, especially decoders will need to meet the requirements of
> video codecs first, if the reference picture management (DPB) was
> implemented wrongly, then it could not meet the fundamental decoder criteria. From this point of view, different hardware will need to follow the same standard for the implementation so that the decoders can generate the conformance outputs.
>
> The DPB is always an internal part of the decoder, the detail
> implementation could be differed with different benefits, if DPB is
> managed by the application, it can be more flexible and easily
> maintained, the other way is the DPB is managed by the driver and hardware itself, it could have more space for the optimization, for example the reference frame access, where the format of reference frames is NOT used for display output, and the display output cannot be used for the reference frames neither because reference frames could use a different format, which is more efficient for reference access however not good for display.
>  From my point of view VAAPI supports both of the above two ideas, and
> it is not necessary to force one to follow another, because that is limited by the initial design Idea.  In this case, there is no pre-grain output in the former decoder, it has only one display output.

Fair enough.  Luckily, the API clearly states where that display output should be written: <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fintel%2Flibva%2Fblob%2Fmaster%2Fva%2Fva_dec_av1.h%23L300-L304&amp;data=05%7C01%7CRuijing.Dong%40amd.com%7C632f4b36198b472f479e08dace604aaa%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638049210211836459%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=vDQ87%2BLUym3W7UsO6JJI6o9uGLAxCenCA7K2eRQHtgI%3D&amp;reserved=0>.

You can get away with not writing the reference frame because it won't be displayed in this case, but you must write the display frame because that is the one the API says will be displayed to the user.

The unwritten reference frame will still get used in references because that is how the API is defined, but since Mesa deals with this by attaching hidden metadata to the frames there is no problem.

>  From the other side, most of the AMD AV1 decoding issues are resolved
> from the community, the film grain problem becomes more noticeable. And generally speaking it is usually a flexible part of the post processing phase after video decoding, and here it is strictly defined in AV1 spec, and it is part of the decoding standard.
> It is not practical to make changes in the DPB design idea for
> resolving this issue from AMD decoder side. And naturally output the applied firm grain is just another film grain process mode, I called it "direct film grain mode".
>
> I have asked the community to inspire me to have a better idea, and
> eventually I found out there is no good way other than to have the
> external user choice or detecting AMD driver.  I understand doing the string match to choose AMD driver is not a perfect idea, but we really need to have a method to resolve this issue. Please let me know If a better way come to your mind, which can resolve this issue and in the meanwhile not affecting other AV1 implementation path.

The proper fix is straighforward: correct the error in the Mesa driver so that it writes the display output to the display surface, as the API says it should be.

Thanks,

- Mark
diff mbox series

Patch

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index 3edd8e2636..9420e7820d 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -2253,6 +2253,23 @@  typedef struct AVHWAccel {
  */
 #define AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH (1 << 2)

+/**
+ * The film grain synthesis could be seperate from decoding process.
+ * The downstream device would apply the film grain parameters seperately.
+ * The desired film grain parameters can be found in SEI section in H264
+ * or H265 bitstreams.
+ *
+ * In AV1, film grain is mandatorily specified, av1 decoders like AMD
+ * av1 decoder process film grain content internally, and the output
+ * includes applied film grain. For the purpose of supporting these av1
+ * decoders, this flag needs to be utilized.
+ *
+ * @warning If the stream has no film grain content, this flag will
+ *          be ignored in the supported av1 decoders. It is advised
+ *          that this flag should only be used in av1 decoders
+ *          that support it.
+ */
+#define AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN (1 << 3)
 /**
  * @}
  */
diff --git a/libavcodec/options_table.h b/libavcodec/options_table.h
index cd02f5096f..0302f89280 100644
--- a/libavcodec/options_table.h
+++ b/libavcodec/options_table.h
@@ -399,6 +399,7 @@  static const AVOption avcodec_options[] = {
 {"ignore_level", "ignore level even if the codec level used is unknown or higher than the maximum supported level reported by the hardware driver", 0, AV_OPT_TYPE_CONST, { .i64 = AV_HWACCEL_FLAG_IGNORE_LEVEL }, INT_MIN, INT_MAX, V | D, "hwaccel_flags" },
 {"allow_high_depth", "allow to output YUV pixel formats with a different chroma sampling than 4:2:0 and/or other than 8 bits per component", 0, AV_OPT_TYPE_CONST, {.i64 = AV_HWACCEL_FLAG_ALLOW_HIGH_DEPTH }, INT_MIN, INT_MAX, V | D, "hwaccel_flags"},
 {"allow_profile_mismatch", "attempt to decode anyway if HW accelerated decoder's supported profiles do not exactly match the stream", 0, AV_OPT_TYPE_CONST, {.i64 = AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH }, INT_MIN, INT_MAX, V | D, "hwaccel_flags"},
+{"direct_film_grain", "allow decoder to directly apply film grain to the output", 0, AV_OPT_TYPE_CONST, {.i64 = AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN }, INT_MIN, INT_MAX, V | D, "hwaccel_flags"},
 {"extra_hw_frames", "Number of extra hardware frames to allocate for the user", OFFSET(extra_hw_frames), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, INT_MAX, V|D },
 {"discard_damaged_percentage", "Percentage of damaged samples to discard a frame", OFFSET(discard_damaged_percentage), AV_OPT_TYPE_INT, {.i64 = 95 }, 0, 100, V|D },
 {NULL},
diff --git a/libavcodec/vaapi_av1.c b/libavcodec/vaapi_av1.c
index d0339b2705..6db910f2bf 100644
--- a/libavcodec/vaapi_av1.c
+++ b/libavcodec/vaapi_av1.c
@@ -127,6 +127,7 @@  static int vaapi_av1_start_frame(AVCodecContext *avctx,
     int8_t bit_depth_idx;
     int err = 0;
     int apply_grain = !(avctx->export_side_data & AV_CODEC_EXPORT_DATA_FILM_GRAIN) && film_grain->apply_grain;
+    int direct_film_grain = avctx->hwaccel_flags & AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN;
     uint8_t remap_lr_type[4] = {AV1_RESTORE_NONE, AV1_RESTORE_SWITCHABLE, AV1_RESTORE_WIENER, AV1_RESTORE_SGRPROJ};
     uint8_t segmentation_feature_signed[AV1_SEG_LVL_MAX] = {1, 1, 1, 1, 1, 0, 0, 0};
     uint8_t segmentation_feature_max[AV1_SEG_LVL_MAX] = {255, AV1_MAX_LOOP_FILTER,
@@ -136,7 +137,7 @@  static int vaapi_av1_start_frame(AVCodecContext *avctx,
     if (bit_depth_idx < 0)
         goto fail;

-    if (apply_grain) {
+    if (apply_grain && !direct_film_grain) {
         if (ctx->tmp_frame->buf[0])
             ff_thread_release_buffer(avctx, ctx->tmp_frame);
         err = ff_thread_get_buffer(avctx, ctx->tmp_frame, AV_GET_BUFFER_FLAG_REF);
@@ -375,6 +376,7 @@  static int vaapi_av1_end_frame(AVCodecContext *avctx)
     VAAPIAV1DecContext *ctx = avctx->internal->hwaccel_priv_data;

     int apply_grain = !(avctx->export_side_data & AV_CODEC_EXPORT_DATA_FILM_GRAIN) && film_grain->apply_grain;
+    int direct_film_grain = avctx->hwaccel_flags & AV_HWACCEL_FLAG_DIRECT_FILM_GRAIN;
     int ret;
     ret = ff_vaapi_decode_issue(avctx, pic);
     if (ret < 0)
@@ -385,7 +387,7 @@  static int vaapi_av1_end_frame(AVCodecContext *avctx)
             if (ctx->ref_tab[i].frame->buf[0])
                 ff_thread_release_buffer(avctx, ctx->ref_tab[i].frame);

-            if (apply_grain) {
+            if (apply_grain && !direct_film_grain) {
                 ret = av_frame_ref(ctx->ref_tab[i].frame, ctx->tmp_frame);
                 if (ret < 0)
                     return ret;