[FFmpeg-devel,0/6] Implement SEI parsing for QSV decoders

Message ID pull.31.ffstaging.FFmpeg.1653552529.ffmpegagent@gmail.com

Message

Aman Karmani May 26, 2022, 8:08 a.m. UTC
Missing SEI information has always been a major drawback when using the QSV
decoders. I used to think that there's no chance to get at the data without
explicit implementation from the MSDK side (or doing something weird like
parsing in parallel). It turned out that there's a little-known API method
that provides access to all SEI (h264/hevc) or user data (mpeg2video).

This makes it possible to get things like closed captions, frame packing,
display orientation, HDR data (mastering display, content light level, etc.)
without having to rely on that data being provided by the MSDK as extended
buffers.
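
For readers unfamiliar with the layout of SEI data, here is a minimal sketch of how an SEI message header is read in H.264/HEVC: the payload type and the payload size are each coded as a run of 0xFF bytes followed by one final byte. This is not code from the patchset. The `SeiHeader` struct and `sei_read_header()` are hypothetical names, and the buffer is assumed to have had emulation-prevention bytes removed already.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of FFmpeg or the MSDK. */
typedef struct SeiHeader {
    unsigned type;   /* payloadType, e.g. 4 = user_data_registered_itu_t_t35 */
    unsigned size;   /* payloadSize in bytes */
    size_t   offset; /* offset of the first payload byte within buf */
} SeiHeader;

/*
 * Read one SEI message header starting at buf[pos].
 * Assumes emulation-prevention bytes were already stripped.
 * Returns 0 on success, -1 if the buffer is truncated.
 */
static int sei_read_header(const uint8_t *buf, size_t len, size_t pos,
                           SeiHeader *out)
{
    unsigned type = 0, size = 0;

    /* payloadType: every 0xFF byte adds 255, the next byte ends the value */
    while (pos < len && buf[pos] == 0xFF) {
        type += 255;
        pos++;
    }
    if (pos >= len)
        return -1;
    type += buf[pos++];

    /* payloadSize uses the same coding */
    while (pos < len && buf[pos] == 0xFF) {
        size += 255;
        pos++;
    }
    if (pos >= len)
        return -1;
    size += buf[pos++];

    /* the payload itself must fit into the remaining bytes */
    if (size > len - pos)
        return -1;

    out->type   = type;
    out->size   = size;
    out->offset = pos;
    return 0;
}
```

To walk all messages in one SEI unit, a caller would invoke the function again with `pos = out->offset + out->size` until the remaining data is exhausted.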

The commit "Implement SEI parsing for QSV decoders" includes some hard-coded
workarounds for MSDK bugs which I reported:
https://github.com/Intel-Media-SDK/MediaSDK/issues/2597#issuecomment-1072795311

But that doesn't help. Those bugs exist and I'm sharing my workarounds,
which are empirically determined by testing a range of files. If someone is
interested, I can provide private access to a repository where we have been
testing this. Alternatively, I could also leave those workarounds out, and
just skip those SEI types.

In a previous version of this patchset, there was a concern that payload
data might need to be re-ordered. Meanwhile I have researched this carefully
and the conclusion is that this is not required.

My detailed analysis can be found here:
https://gist.github.com/softworkz/36c49586a8610813a32270ee3947a932

softworkz (6):
  avutil/frame: Add av_frame_copy_side_data() and
    av_frame_remove_all_side_data()
  avcodec/vpp_qsv: Copy side data from input to output frame
  avcodec/mpeg12dec: make mpeg_decode_user_data() accessible
  avcodec/hevcdec: make set_side_data() accessible
  avcodec/h264dec: make h264_export_frame_props() accessible
  avcodec/qsvdec: Implement SEI parsing for QSV decoders

 doc/APIchanges               |   4 +
 libavcodec/h264_slice.c      |  98 +++++++--------
 libavcodec/h264dec.h         |   2 +
 libavcodec/hevcdec.c         | 117 +++++++++---------
 libavcodec/hevcdec.h         |   2 +
 libavcodec/mpeg12.h          |  28 +++++
 libavcodec/mpeg12dec.c       |  40 +-----
 libavcodec/qsvdec.c          | 233 +++++++++++++++++++++++++++++++++++
 libavfilter/qsvvpp.c         |   6 +
 libavfilter/vf_overlay_qsv.c |  19 ++-
 libavutil/frame.c            |  67 ++++++----
 libavutil/frame.h            |  32 +++++
 libavutil/version.h          |   2 +-
 13 files changed, 477 insertions(+), 173 deletions(-)


base-commit: b033913d1c5998a29dfd13e9906dd707ff6eff12
Published-As: https://github.com/ffstaging/FFmpeg/releases/tag/pr-ffstaging-31%2Fsoftworkz%2Fsubmit_qsv_sei-v1
Fetch-It-Via: git fetch https://github.com/ffstaging/FFmpeg pr-ffstaging-31/softworkz/submit_qsv_sei-v1
Pull-Request: https://github.com/ffstaging/FFmpeg/pull/31

Comments

Kieran Kunhya June 1, 2022, 7:15 p.m. UTC | #1
On Thu, 26 May 2022 at 09:09, ffmpegagent <ffmpegagent@gmail.com> wrote:

> But that doesn't help. Those bugs exist and I'm sharing my workarounds,
> which are empirically determined by testing a range of files. If someone is
> interested, I can provide private access to a repository where we have been
> testing this. Alternatively, I could also leave those workarounds out, and
> just skip those SEI types.
>

I don't care much for QSV but I would say b-frame reordering heuristics
like the one you are using may not necessarily catch all the possible
structures in the wild from third-party encoders.
VLC had (has?) heuristics for this which would cause captions to not be
frame accurate.

Kieran
Soft Works June 1, 2022, 7:46 p.m. UTC | #2
From: Kieran Kunhya <kierank@obe.tv>
Sent: Wednesday, June 1, 2022 9:16 PM
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: softworkz <softworkz@hotmail.com>
Subject: Re: [FFmpeg-devel] [PATCH 0/6] Implement SEI parsing for QSV decoders

On Thu, 26 May 2022 at 09:09, ffmpegagent <ffmpegagent@gmail.com> wrote:
> > But that doesn't help. Those bugs exist and I'm sharing my workarounds,
> > which are empirically determined by testing a range of files. If someone is
> > interested, I can provide private access to a repository where we have been
> > testing this. Alternatively, I could also leave those workarounds out, and
> > just skip those SEI types.
>
> I don't care much for QSV but I would say b-frame reordering heuristics
> like the one you are using may not necessarily catch all the possible
> structures in the wild from third-party encoders.

I am not using any b-frame reordering heuristics. I just take the payloads in
the order in which MSDK (QSV) provides them, and it has turned out that MSDK
already reorders the data according to the display order.
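
One way to check this claim on a given file is to log each payload together with the timestamp delivered alongside it and verify that the timestamps never go backwards. The following is only a sketch under that assumption: `PayloadRecord` and `first_out_of_order()` are hypothetical names, and how the records are collected from the decoder is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one SEI payload and the timestamp reported with it. */
typedef struct PayloadRecord {
    int64_t  pts;   /* presentation timestamp delivered with the payload */
    unsigned type;  /* SEI payload type */
} PayloadRecord;

/*
 * Return the index of the first record whose pts is lower than that of
 * its predecessor, or -1 if the sequence is in display (pts) order.
 */
static long first_out_of_order(const PayloadRecord *rec, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (rec[i].pts < rec[i - 1].pts)
            return (long)i;
    return -1;
}
```

A stream delivered in decode order with B-frames (for example pts 0, 3, 1, 2) would be flagged at index 2, while a display-ordered sequence returns -1.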

I did some detailed analysis of files with out-of-order B-frames:

https://gist.github.com/softworkz/36c49586a8610813a32270ee3947a932

Did you take a look?


> VLC had (has?) heuristics for this which would cause captions to not be frame accurate.

Captions aren’t exactly “frame accurate” anyway as each frame has just a very small piece
of information and only when a certain sequence is complete, it leads to some new letters
or line being ready for display.

Out-of-order delivery would definitely break this, but I haven’t seen any such cases.
The code I’m submitting has been in testing for quite a while with a bunch of users, and
many files and TV streams with MPEG-2 video, H264 and HEVC were tested.

You could still be right that such a case exists. A while ago I dug through
https://streams.videolan.org/ and downloaded all samples that seemed to have CC,
but maybe I missed one for the case you’re talking about.

Do you have an idea where/how I could find such a stream?

Thanks,
softworkz
Kieran Kunhya June 1, 2022, 8:25 p.m. UTC | #3
> Captions aren’t exactly “frame accurate” anyway as each frame has just a
> very small piece of information and only when a certain sequence is
> complete, it leads to some new letters or line being ready for display.

In many use-cases, you want them to be frame-accurate. Final rendition to
the viewer is only one use-case of FFmpeg (to be fair, the likely use-case
of QSV).
And you don't want errors accumulating across encode cycles.


> Do you have an idea where/how I could find such stream?
>

You would need to record them off various television services as they use a
wide range of encoder manufacturers.
Ideally, you could also inject them into various encoders and force them to
build complex reordering patterns and check they are frame accurate.
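
A frame-accuracy check of this kind could be sketched as a per-frame comparison of the caption bytes from a reference decode against those from the decoder under test. This is an illustration only: `FrameCaptions` and `count_caption_mismatches()` are hypothetical names, and both arrays are assumed to hold the same number of frames in display order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: caption bytes attached to one output frame. */
typedef struct FrameCaptions {
    int64_t        pts;   /* presentation timestamp of the frame */
    const uint8_t *data;  /* caption bytes, may be NULL when size is 0 */
    size_t         size;  /* number of caption bytes */
} FrameCaptions;

/*
 * Count the frames whose timestamp or caption bytes differ between the
 * reference decode and the decode under test. 0 means frame accurate.
 */
static size_t count_caption_mismatches(const FrameCaptions *ref,
                                       const FrameCaptions *test, size_t n)
{
    size_t mismatches = 0;

    for (size_t i = 0; i < n; i++) {
        int same = ref[i].pts  == test[i].pts  &&
                   ref[i].size == test[i].size &&
                   (ref[i].size == 0 ||
                    memcmp(ref[i].data, test[i].data, ref[i].size) == 0);
        if (!same)
            mismatches++;
    }
    return mismatches;
}
```

Captions that are merely shifted by one frame show up as mismatches on every affected frame, which is the kind of drift that would accumulate across encode cycles.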

Regards,
Kieran

Soft Works June 1, 2022, 9:24 p.m. UTC | #4
From: Kieran Kunhya <kierank@obe.tv>
Sent: Wednesday, June 1, 2022 10:26 PM
To: Soft Works <softworkz@hotmail.com>
Cc: Kieran Kunhya <kierank@obe.tv>; FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH 0/6] Implement SEI parsing for QSV decoders

> > Captions aren’t exactly “frame accurate” anyway as each frame has just a very small piece
> > of information and only when a certain sequence is complete, it leads to some new letters
> > or line being ready for display.
>
> And you don't want errors accumulating across encode cycles.

No such errors have been seen at any time. CC extraction behavior doesn’t appear to
differ from the other FFmpeg decoders (MPEG-2, H264 and HEVC in software, and the same
three with NVDEC, VAAPI and D3D11VA).


> In many use-cases, you want them to be frame-accurate. Final rendition to the viewer is only one use-case of FFmpeg (to be fair, the likely use-case of QSV)


What this patchset provides for CC is:


  *   QSV decoders create and attach CC side data to AVFrames
  *   QSV filters preserve CC side data from input to output
  *   An earlier patch already added the ability to attach CC data when using QSV encoders

This makes it possible, for example, to do QSV hardware decoding, filtering and
encoding with the CC data preserved in the output video.

In combination with my Subtitle Filtering patchset, you can do almost anything you
like with closed captions.

You can work with them like any other subtitle data and process them with the new
filters, e.g. manipulate the text content, change styles and appearance, like font
sizes, colors, outlines, background etc.

Then you can burn them into the video, or encode them in an arbitrary text subtitle
format. There are many possibilities…
Just one example:

[inline image attachment image001.png not preserved]

> > Do you have an idea where/how I could find such stream?
>
> You would need to record them off various television services as they use a wide range of encoder manufacturers.
> Ideally, you could also inject them into various encoders and force them to build complex reordering patterns and check they are frame accurate.

We have users from all over the US with different tuners and on different networks,
and there has never been an issue.

What I meant was the “VLC heuristics” you mentioned: do you have a pointer to a
commit, issue or bug report, or simply the name they use for this?

Thanks again,
sw