Message ID | 20240319191642.95217-1-ffmpeg@haasn.xyz |
---|---|
State | New |
Headers | show |
Series | [FFmpeg-devel,1/4] fftools/ffmpeg_enc: strip DOVI config record for AV1 | expand |
Context | Check | Description |
---|---|---|
yinshiyou/make_loongarch64 | success | Make finished |
yinshiyou/make_fate_loongarch64 | success | Make fate finished |
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
Quoting Niklas Haas (2024-03-19 20:16:39) > From: Niklas Haas <git@haasn.dev> > > AV1 streams don't use configuration records, so delete them when > encoding to AV1. Ideally this would be, as the comment suggests, handled > at the frame-level (and stripped by the av1 encoder), but given the > status quo of copying the packet-level data here directly, we should > definitely make an effort to strip it. > --- > fftools/ffmpeg_enc.c | 25 ++++++++++++++----------- > 1 file changed, 14 insertions(+), 11 deletions(-) I'm very much not a fan of having codec-specific code in ffmpeg CLI. It implies that every single caller must now be aware of this (undocumented?) interaction of this specific side data with this specific codec ID.
On Thu, 21 Mar 2024 11:16:57 +0100 Anton Khirnov <anton@khirnov.net> wrote: > Quoting Niklas Haas (2024-03-19 20:16:39) > > From: Niklas Haas <git@haasn.dev> > > > > AV1 streams don't use configuration records, so delete them when > > encoding to AV1. Ideally this would be, as the comment suggests, handled > > at the frame-level (and stripped by the av1 encoder), but given the > > status quo of copying the packet-level data here directly, we should > > definitely make an effort to strip it. > > --- > > fftools/ffmpeg_enc.c | 25 ++++++++++++++----------- > > 1 file changed, 14 insertions(+), 11 deletions(-) > > I'm very much not a fan of having codec-specific code in ffmpeg CLI. It > implies that every single caller must now be aware of this > (undocumented?) interaction of this specific side data with this > specific codec ID. Note: This is an existing bug, not introduced by this series. This series just makes it obvious. The status quo is that, beacuse of this logic in ffmpeg_enc.c, we incorrectly forward dolby vision configuration records when transcoding to AV1. Or, indeed, when transcoding to *any* format - since current FFmpeg also does not propagate dolby vision RPUs, we generate broken files pretty much always when transcoding dolby vision. So we definitely need to strip the metadata from the stream muxer *somewhere*. Where else comes to mind? This also gets into another topic I wanted to touch on, which is that the presence of dynamic dolby vision metadata currently hinders the ability of libavfilter to treat the video primaries/gamma as a negotiable colorspace property (the way it is done currently for YUV matrix/range). This is because when interpreted as such, DV metadata fundamentally changes the colorspace of the incoming video stream. Ideally we would like some way to negotiate DV metadata on the query_formats() level. Ideally, we'd want something like AVCOL_SPC_DOLBYVISION, but we can't easily introduce that without breaking ISO/IEC 23091 compatibility.. ------- P.s. I accidentally sent a duplicate of this email from a different (wrong) mail address, please ignore.
Quoting Niklas Haas (2024-03-21 13:11:32) > On Thu, 21 Mar 2024 11:16:57 +0100 Anton Khirnov <anton@khirnov.net> wrote: > > Quoting Niklas Haas (2024-03-19 20:16:39) > > > From: Niklas Haas <git@haasn.dev> > > > > > > AV1 streams don't use configuration records, so delete them when > > > encoding to AV1. Ideally this would be, as the comment suggests, handled > > > at the frame-level (and stripped by the av1 encoder), but given the > > > status quo of copying the packet-level data here directly, we should > > > definitely make an effort to strip it. > > > --- > > > fftools/ffmpeg_enc.c | 25 ++++++++++++++----------- > > > 1 file changed, 14 insertions(+), 11 deletions(-) > > > > I'm very much not a fan of having codec-specific code in ffmpeg CLI. It > > implies that every single caller must now be aware of this > > (undocumented?) interaction of this specific side data with this > > specific codec ID. > > Note: This is an existing bug, not introduced by this series. This > series just makes it obvious. The status quo is that, beacuse of this > logic in ffmpeg_enc.c, we incorrectly forward dolby vision configuration > records when transcoding to AV1. I know pretty much nothing about dolby vision, so could you please explain why precisely is this incorrect? And at what point in the transcoding chain does the side data become invalid? > Or, indeed, when transcoding to *any* format - since current FFmpeg also > does not propagate dolby vision RPUs, we generate broken files pretty > much always when transcoding dolby vision. So we definitely need to > strip the metadata from the stream muxer *somewhere*. Where else comes > to mind? > > This also gets into another topic I wanted to touch on, which is that > the presence of dynamic dolby vision metadata currently hinders the > ability of libavfilter to treat the video primaries/gamma as > a negotiable colorspace property (the way it is done currently for YUV > matrix/range). This is because when interpreted as such, DV metadata > fundamentally changes the colorspace of the incoming video stream. > Ideally we would like some way to negotiate DV metadata on the > query_formats() level. > > Ideally, we'd want something like AVCOL_SPC_DOLBYVISION, but we can't > easily introduce that without breaking ISO/IEC 23091 compatibility.. In principle it could be yet another negotiated field, could it not? You just added a bunch of those recently, what's another one?
On Fri, 22 Mar 2024 10:41:13 +0100 Anton Khirnov <anton@khirnov.net> wrote: > Quoting Niklas Haas (2024-03-21 13:11:32) > > On Thu, 21 Mar 2024 11:16:57 +0100 Anton Khirnov <anton@khirnov.net> wrote: > > > Quoting Niklas Haas (2024-03-19 20:16:39) > > > > From: Niklas Haas <git@haasn.dev> > > > > > > > > AV1 streams don't use configuration records, so delete them when > > > > encoding to AV1. Ideally this would be, as the comment suggests, handled > > > > at the frame-level (and stripped by the av1 encoder), but given the > > > > status quo of copying the packet-level data here directly, we should > > > > definitely make an effort to strip it. > > > > --- > > > > fftools/ffmpeg_enc.c | 25 ++++++++++++++----------- > > > > 1 file changed, 14 insertions(+), 11 deletions(-) > > > > > > I'm very much not a fan of having codec-specific code in ffmpeg CLI. It > > > implies that every single caller must now be aware of this > > > (undocumented?) interaction of this specific side data with this > > > specific codec ID. > > > > Note: This is an existing bug, not introduced by this series. This > > series just makes it obvious. The status quo is that, beacuse of this > > logic in ffmpeg_enc.c, we incorrectly forward dolby vision configuration > > records when transcoding to AV1. > > I know pretty much nothing about dolby vision, so could you please > explain why precisely is this incorrect? And at what point in the > transcoding chain does the side data become invalid? Dolby Vision basically consists of two separate pieces of metadata: 1. The (per-stream) configuration struct, AV_PKT_DATA_DOVI_CONF 2. The per-frame structs (RPUs), AV_FRAME_DATA_DOVI_METADATA (ditto AV_FRAME_DATA_DOVI_RPU_BUFFER, which is the same) A valid HEVC dolby vision file should contain both - the configuration struct tells the decoder that hey, this file is dolby vision (and what profile to expect, whether there's an enhancement layer, etc.). The RPUs contain the actual DV-specific details of how each frame is encoded. A valid AV1 dolby vision file, on the other hand, only uses the per-frame RPUs, it does not have a configuration struct at all. The current logic in ffmpeg_enc.c copies over all stream-level metadata, including the DOVI_CONF struct, to the output file. This generates a stream which is *marked* as being Dolby Vision, but in which none of the frames actually contain DV RPUs. This *probably* violates some spec somewhere, and at the very least is not desirable behavior. (And for AV1, the configuration struct's presence is definitely a no-go) Basically, we want to handle all of these scenarios: 1. When transcoding DV HEVC (profile 8) to DV AV1 (profile 10), we need to strip the configuration struct somewhere 2. When transcoding DV HEVC (profile 8) to HEVC, we need to strip the configuration struct IFF we're also stripping the per-frame RPUs (e.g. as a result of filtering). 3. When transcoding DV AV1 (profile 10) to HEVC, we need to *synthesize* a configuration struct containing the correct values. I think the best way forward for now is: 1. Always strip the dovi configuration record when transcoding 2. Have the encoder generate (and attach to avctx.coded_side_data) the correct configuration record. I will write a patch for #2. > > Or, indeed, when transcoding to *any* format - since current FFmpeg also > > does not propagate dolby vision RPUs, we generate broken files pretty > > much always when transcoding dolby vision. So we definitely need to > > strip the metadata from the stream muxer *somewhere*. Where else comes > > to mind? > > > > This also gets into another topic I wanted to touch on, which is that > > the presence of dynamic dolby vision metadata currently hinders the > > ability of libavfilter to treat the video primaries/gamma as > > a negotiable colorspace property (the way it is done currently for YUV > > matrix/range). This is because when interpreted as such, DV metadata > > fundamentally changes the colorspace of the incoming video stream. > > Ideally we would like some way to negotiate DV metadata on the > > query_formats() level. > > > > Ideally, we'd want something like AVCOL_SPC_DOLBYVISION, but we can't > > easily introduce that without breaking ISO/IEC 23091 compatibility.. > > In principle it could be yet another negotiated field, could it not? You > just added a bunch of those recently, what's another one? Adding more fields to this negotiation process is a very obnoxious and tedious process, with lots of boilerplate for each new field added. Maybe we can come up with some better mechanism first?
On Fri, 22 Mar 2024 14:08:07 +0100 Niklas Haas <ffmpeg@haasn.xyz> wrote: > On Fri, 22 Mar 2024 10:41:13 +0100 Anton Khirnov <anton@khirnov.net> wrote: > > Quoting Niklas Haas (2024-03-21 13:11:32) > > > On Thu, 21 Mar 2024 11:16:57 +0100 Anton Khirnov <anton@khirnov.net> wrote: > > > > Quoting Niklas Haas (2024-03-19 20:16:39) > > > > > From: Niklas Haas <git@haasn.dev> > > > > > > > > > > AV1 streams don't use configuration records, so delete them when > > > > > encoding to AV1. Ideally this would be, as the comment suggests, handled > > > > > at the frame-level (and stripped by the av1 encoder), but given the > > > > > status quo of copying the packet-level data here directly, we should > > > > > definitely make an effort to strip it. > > > > > --- > > > > > fftools/ffmpeg_enc.c | 25 ++++++++++++++----------- > > > > > 1 file changed, 14 insertions(+), 11 deletions(-) > > > > > > > > I'm very much not a fan of having codec-specific code in ffmpeg CLI. It > > > > implies that every single caller must now be aware of this > > > > (undocumented?) interaction of this specific side data with this > > > > specific codec ID. > > > > > > Note: This is an existing bug, not introduced by this series. This > > > series just makes it obvious. The status quo is that, beacuse of this > > > logic in ffmpeg_enc.c, we incorrectly forward dolby vision configuration > > > records when transcoding to AV1. > > > > I know pretty much nothing about dolby vision, so could you please > > explain why precisely is this incorrect? And at what point in the > > transcoding chain does the side data become invalid? > > Dolby Vision basically consists of two separate pieces of metadata: > > 1. The (per-stream) configuration struct, AV_PKT_DATA_DOVI_CONF > 2. The per-frame structs (RPUs), AV_FRAME_DATA_DOVI_METADATA > (ditto AV_FRAME_DATA_DOVI_RPU_BUFFER, which is the same) > > A valid HEVC dolby vision file should contain both - the configuration > struct tells the decoder that hey, this file is dolby vision (and what > profile to expect, whether there's an enhancement layer, etc.). The RPUs > contain the actual DV-specific details of how each frame is encoded. > > A valid AV1 dolby vision file, on the other hand, only uses the > per-frame RPUs, it does not have a configuration struct at all. Correction: AV1 still uses configuration records, I was not looking at spec-compliant sample files. So at least, this difference evaporates. > The current logic in ffmpeg_enc.c copies over all stream-level metadata, > including the DOVI_CONF struct, to the output file. This generates > a stream which is *marked* as being Dolby Vision, but in which none of > the frames actually contain DV RPUs. This *probably* violates some spec > somewhere, and at the very least is not desirable behavior. (And for > AV1, the configuration struct's presence is definitely a no-go) > > Basically, we want to handle all of these scenarios: > > 1. When transcoding DV HEVC (profile 8) to DV AV1 (profile 10), we need > to strip the configuration struct somewhere > 2. When transcoding DV HEVC (profile 8) to HEVC, we need to strip the > configuration struct IFF we're also stripping the per-frame RPUs > (e.g. as a result of filtering). > 3. When transcoding DV AV1 (profile 10) to HEVC, we need to *synthesize* > a configuration struct containing the correct values. > > I think the best way forward for now is: > > 1. Always strip the dovi configuration record when transcoding > 2. Have the encoder generate (and attach to avctx.coded_side_data) the > correct configuration record. As discussed on IRC, removing this logic entirely will cover the first point. > I will write a patch for #2. > > > > Or, indeed, when transcoding to *any* format - since current FFmpeg also > > > does not propagate dolby vision RPUs, we generate broken files pretty > > > much always when transcoding dolby vision. So we definitely need to > > > strip the metadata from the stream muxer *somewhere*. Where else comes > > > to mind? > > > > > > This also gets into another topic I wanted to touch on, which is that > > > the presence of dynamic dolby vision metadata currently hinders the > > > ability of libavfilter to treat the video primaries/gamma as > > > a negotiable colorspace property (the way it is done currently for YUV > > > matrix/range). This is because when interpreted as such, DV metadata > > > fundamentally changes the colorspace of the incoming video stream. > > > Ideally we would like some way to negotiate DV metadata on the > > > query_formats() level. > > > > > > Ideally, we'd want something like AVCOL_SPC_DOLBYVISION, but we can't > > > easily introduce that without breaking ISO/IEC 23091 compatibility.. > > > > In principle it could be yet another negotiated field, could it not? You > > just added a bunch of those recently, what's another one? > > Adding more fields to this negotiation process is a very obnoxious and > tedious process, with lots of boilerplate for each new field added. > Maybe we can come up with some better mechanism first? Jan Ekström pointed out to me that the newest iteration of H.273 contains markers for dolby vision, so we can support this in the obvious way (at least for profile 5). Getting the colorspace selection exactly right for profiles 8.1/8.4 is not so important. That leaves only profile 8.2 as the annoying special case, but it's a rarely used profile so maybe we can just ignore it. The downside of not having negotiation here really boils down to just -vf libplacebo doing tone-mapping to SDR where it may perhaps not be expected.
On Tue, 19 Mar 2024 20:16:39 +0100 Niklas Haas <ffmpeg@haasn.xyz> wrote: > From: Niklas Haas <git@haasn.dev> > > AV1 streams don't use configuration records, so delete them when > encoding to AV1. Ideally this would be, as the comment suggests, handled > at the frame-level (and stripped by the av1 encoder), but given the > status quo of copying the packet-level data here directly, we should > definitely make an effort to strip it. I decided to abandon this series in favor of synthesizing the RPU fully from the parsed AVDOVIMetadata side data. I have a series for this prepared and mostly ready, will submit it sometime in the next week. That series also includes a new option to avoid sending RPU data when dolby vision is not enabled.
diff --git a/fftools/ffmpeg_enc.c b/fftools/ffmpeg_enc.c index c9a12af1393..0c21acfafc6 100644 --- a/fftools/ffmpeg_enc.c +++ b/fftools/ffmpeg_enc.c @@ -354,17 +354,20 @@ int enc_open(void *opaque, const AVFrame *frame) */ if (ist) { for (int i = 0; i < ist->st->codecpar->nb_coded_side_data; i++) { - AVPacketSideData *sd_src = &ist->st->codecpar->coded_side_data[i]; - if (sd_src->type != AV_PKT_DATA_CPB_PROPERTIES) { - AVPacketSideData *sd_dst = av_packet_side_data_new(&ost->par_in->coded_side_data, - &ost->par_in->nb_coded_side_data, - sd_src->type, sd_src->size, 0); - if (!sd_dst) - return AVERROR(ENOMEM); - memcpy(sd_dst->data, sd_src->data, sd_src->size); - if (ist->autorotate && sd_src->type == AV_PKT_DATA_DISPLAYMATRIX) - av_display_rotation_set((int32_t *)sd_dst->data, 0); - } + AVPacketSideData *sd_src, *sd_dst; + sd_src = &ist->st->codecpar->coded_side_data[i]; + if (sd_src->type == AV_PKT_DATA_CPB_PROPERTIES) + continue; + if (sd_src->type == AV_PKT_DATA_DOVI_CONF && enc->id == AV_CODEC_ID_AV1) + continue; /* AV1 doesn't use DOVI configuration records */ + sd_dst = av_packet_side_data_new(&ost->par_in->coded_side_data, + &ost->par_in->nb_coded_side_data, + sd_src->type, sd_src->size, 0); + if (!sd_dst) + return AVERROR(ENOMEM); + memcpy(sd_dst->data, sd_src->data, sd_src->size); + if (ist->autorotate && sd_src->type == AV_PKT_DATA_DISPLAYMATRIX) + av_display_rotation_set((int32_t *)sd_dst->data, 0); } }
From: Niklas Haas <git@haasn.dev> AV1 streams don't use configuration records, so delete them when encoding to AV1. Ideally this would be, as the comment suggests, handled at the frame-level (and stripped by the av1 encoder), but given the status quo of copying the packet-level data here directly, we should definitely make an effort to strip it. --- fftools/ffmpeg_enc.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-)