Message ID | 20210714143401.890-1-jamrial@gmail.com |
---|---|
State | New |
Headers | show |
Series | [FFmpeg-devel,1/3] avcodec: add a parser flag to enable keyframe tagging heuristics | expand |
Context | Check | Description |
---|---|---|
andriy/x86_make | success | Make finished |
andriy/x86_make_fate | success | Make fate finished |
andriy/PPC64_make | success | Make finished |
andriy/PPC64_make_fate | success | Make fate finished |
On Wed, Jul 14, 2021 at 11:33:59AM -0300, James Almer wrote: > It will be used to allow parsers to be more liberal when tagging packets as > keyframes. > > Signed-off-by: James Almer <jamrial@gmail.com> > --- > libavcodec/avcodec.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h > index 8b97895aeb..8e3d888266 100644 > --- a/libavcodec/avcodec.h > +++ b/libavcodec/avcodec.h > @@ -2809,6 +2809,7 @@ typedef struct AVCodecParserContext { > #define PARSER_FLAG_ONCE 0x0002 > /// Set if the parser has a valid file offset > #define PARSER_FLAG_FETCHED_OFFSET 0x0004 > +#define PARSER_FLAG_USE_KEYFRAME_HEURISTICS 0x0008 > #define PARSER_FLAG_USE_CODEC_TS 0x1000 This doesnt "feel" like the best solution to me dont you think it would be better to export all information ? the concept of a keyframe is a point at which decoding can begin that really are at least 3 points the point at which packets begin to be input into the decoder the point at which the decoder is able to return some decoded data which closely resembles the encoder input and the point at which the decoder output matches 1:1 the output of a decoder starting from frame 0 Thanks [...]
On 7/15/2021 5:23 PM, Michael Niedermayer wrote: > On Wed, Jul 14, 2021 at 11:33:59AM -0300, James Almer wrote: >> It will be used to allow parsers to be more liberal when tagging packets as >> keyframes. >> >> Signed-off-by: James Almer <jamrial@gmail.com> >> --- >> libavcodec/avcodec.h | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h >> index 8b97895aeb..8e3d888266 100644 >> --- a/libavcodec/avcodec.h >> +++ b/libavcodec/avcodec.h >> @@ -2809,6 +2809,7 @@ typedef struct AVCodecParserContext { >> #define PARSER_FLAG_ONCE 0x0002 >> /// Set if the parser has a valid file offset >> #define PARSER_FLAG_FETCHED_OFFSET 0x0004 >> +#define PARSER_FLAG_USE_KEYFRAME_HEURISTICS 0x0008 >> #define PARSER_FLAG_USE_CODEC_TS 0x1000 > > This doesnt "feel" like the best solution to me > > dont you think it would be better to export all information ? The AVParser API is going to be removed at some point for something better that works on packets rather than raw buffers, so ideally we should not expand it too much, and leave more complex implementations for later. Adding a flag is the simplest way to fix this for now, until a proper rework is done. > > the concept of a keyframe is a point at which decoding can begin > that really are at least 3 points > > the point at which packets begin to be input into the decoder > > the point at which the decoder is able to return some decoded > data which closely resembles the encoder input > > and the point at which the decoder output matches 1:1 the output > of a decoder starting from frame 0 All parsers save for h264 are currently only tagging packets containing a coded bitstream that, when decoded, it fully resets the decoding state and depends on no previously parsed data or state, which is what (most) muxers expect. This approach here is making the h264 do the same by default (in line with the decoder), to ensure some muxers don't wrongly mark certain packets as sync samples, while letting others remain liberal about it. Ideally yes, we'd propagate more detailed information in some way, which then the mp4 muxer can use to build sample groups for h264, hevc and av1. But the existing AVParser API is not adequate for that. > > > > Thanks > > [...] > > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". >
On Thu, Jul 15, 2021 at 05:55:51PM -0300, James Almer wrote: > On 7/15/2021 5:23 PM, Michael Niedermayer wrote: > > On Wed, Jul 14, 2021 at 11:33:59AM -0300, James Almer wrote: > > > It will be used to allow parsers to be more liberal when tagging packets as > > > keyframes. > > > > > > Signed-off-by: James Almer <jamrial@gmail.com> > > > --- > > > libavcodec/avcodec.h | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h > > > index 8b97895aeb..8e3d888266 100644 > > > --- a/libavcodec/avcodec.h > > > +++ b/libavcodec/avcodec.h > > > @@ -2809,6 +2809,7 @@ typedef struct AVCodecParserContext { > > > #define PARSER_FLAG_ONCE 0x0002 > > > /// Set if the parser has a valid file offset > > > #define PARSER_FLAG_FETCHED_OFFSET 0x0004 > > > +#define PARSER_FLAG_USE_KEYFRAME_HEURISTICS 0x0008 > > > #define PARSER_FLAG_USE_CODEC_TS 0x1000 > > > > This doesnt "feel" like the best solution to me > > > > dont you think it would be better to export all information ? > > The AVParser API is going to be removed at some point for something better > that works on packets rather than raw buffers, so ideally we should not > expand it too much, and leave more complex implementations for later. Iam not sure i follow your logic here do you suggest to block setting lets call it "advanced keyframe" related work across the codebase ? (that includes muxers and so on ? Iam not saying thats a good or bad idea. Just trying to understand Because to me it seems if the awnser is no to the question above then this is not just about "Adding a flag is the simplest way to fix this" but its adding codec specific code in muxers probably to extract the exta information, and iam not sure that at this point its not easier to have the parser export this information And when the parser API is then replaced by "something better" the interface changes less. If this heuristic flag is added then i assume it needs to be removed and reimplemented with the API change. I very likely am missing something But i certainly do not suggest or ask for more work to be done. Rather the opposit, i was wondering if its not less work overall to export the information properly ... thx [...]
On 7/16/2021 1:55 PM, Michael Niedermayer wrote: > On Thu, Jul 15, 2021 at 05:55:51PM -0300, James Almer wrote: >> On 7/15/2021 5:23 PM, Michael Niedermayer wrote: >>> On Wed, Jul 14, 2021 at 11:33:59AM -0300, James Almer wrote: >>>> It will be used to allow parsers to be more liberal when tagging packets as >>>> keyframes. >>>> >>>> Signed-off-by: James Almer <jamrial@gmail.com> >>>> --- >>>> libavcodec/avcodec.h | 1 + >>>> 1 file changed, 1 insertion(+) >>>> >>>> diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h >>>> index 8b97895aeb..8e3d888266 100644 >>>> --- a/libavcodec/avcodec.h >>>> +++ b/libavcodec/avcodec.h >>>> @@ -2809,6 +2809,7 @@ typedef struct AVCodecParserContext { >>>> #define PARSER_FLAG_ONCE 0x0002 >>>> /// Set if the parser has a valid file offset >>>> #define PARSER_FLAG_FETCHED_OFFSET 0x0004 >>>> +#define PARSER_FLAG_USE_KEYFRAME_HEURISTICS 0x0008 >>>> #define PARSER_FLAG_USE_CODEC_TS 0x1000 >>> >>> This doesnt "feel" like the best solution to me >>> >>> dont you think it would be better to export all information ? >> >> The AVParser API is going to be removed at some point for something better >> that works on packets rather than raw buffers, so ideally we should not >> expand it too much, and leave more complex implementations for later. > > Iam not sure i follow your logic here > > do you suggest to block setting lets call it "advanced keyframe" related work > across the codebase ? (that includes muxers and so on ? > Iam not saying thats a good or bad idea. Just trying to understand I'm saying to not spend time extending the AVCodecParserContext API to export even more information when it's going to be replaced. A single flag to stop one parser from tagging more than it needs to is enough to work around the problem at hand. > > Because to me it seems if the awnser is no to the question above > then this is not just about "Adding a flag is the simplest way to fix this" > but its adding codec specific code in muxers probably to extract the exta > information, and iam not sure that at this point its not easier to have > the parser export this information Inserting the h264 parser internally in mpegtsenc is needed because it's the generic demuxer code that may or may not use one to assemble packets or fill missing information. The muxer side has no interaction with it, it only receives packets from some source. So you add a new field to AVCodecParserContext to signal these "advanced keyframes". How does that information reach the muxer? You're now having to define or expand more stuff for this purpose. Inserting it in the muxer also has the added benefit of not depending on the demuxing side from having inserted one to begin with. For example, open an mp4 file for which the demuxer was able to create an AVIndexEntry array and therefore will not tell the generic code that it needs a parser. If you remux that file to mpegts, only Sync Samples (Of which there may be none) will have been tagged as keyframes by the demuxer (Which is expected) and not all these other packets with recovery points like mpegts apparently wants. If the muxer internally inserts the parser, all of them can be marked, and a flag is enough for that. > > And when the parser API is then replaced by "something better" the interface > changes less. If this heuristic flag is added then i assume it needs to be > removed and reimplemented with the API change. This flag would be removed alongside the rest of AVCodecParserContext API, but the replacement would not need something like it since we would then look into ensuring it exports more detailed information for a given packet in a proper way, and not just setting a generic and poorly defined context-wide key_frame field. This is just a stopgap solution, not something that needs to be translated into a new API. I mean, just look how all these flags are not even namespaced, or how the main function doesn't even return error codes. A replacement hardly requires to be a 1:1 translation. > > I very likely am missing something > But i certainly do not suggest or ask for more work to be done. Rather the > opposit, i was wondering if its not less work overall to export the > information properly ... The replacement for AVCodecParserContext would, if i recall correctly when it was suggested, be having the work done by bitstream filters. Some muxers and demuxers are already using bsfs in some cases, so it would be built on top of that, and maybe automatically inserted by both muxers and demuxers, not just the latter. So considering that muxer specific code is needed right now, I don't think adding more fields to AVCodecParserContext will mean less work now or even later. > > thx > > [...] > > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". >
On Fri, Jul 16, 2021 at 03:43:39PM -0300, James Almer wrote: > On 7/16/2021 1:55 PM, Michael Niedermayer wrote: > > On Thu, Jul 15, 2021 at 05:55:51PM -0300, James Almer wrote: > > > On 7/15/2021 5:23 PM, Michael Niedermayer wrote: > > > > On Wed, Jul 14, 2021 at 11:33:59AM -0300, James Almer wrote: > > > > > It will be used to allow parsers to be more liberal when tagging packets as > > > > > keyframes. > > > > > > > > > > Signed-off-by: James Almer <jamrial@gmail.com> > > > > > --- > > > > > libavcodec/avcodec.h | 1 + > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h > > > > > index 8b97895aeb..8e3d888266 100644 > > > > > --- a/libavcodec/avcodec.h > > > > > +++ b/libavcodec/avcodec.h > > > > > @@ -2809,6 +2809,7 @@ typedef struct AVCodecParserContext { > > > > > #define PARSER_FLAG_ONCE 0x0002 > > > > > /// Set if the parser has a valid file offset > > > > > #define PARSER_FLAG_FETCHED_OFFSET 0x0004 > > > > > +#define PARSER_FLAG_USE_KEYFRAME_HEURISTICS 0x0008 > > > > > #define PARSER_FLAG_USE_CODEC_TS 0x1000 > > > > > > > > This doesnt "feel" like the best solution to me > > > > > > > > dont you think it would be better to export all information ? > > > > > > The AVParser API is going to be removed at some point for something better > > > that works on packets rather than raw buffers, so ideally we should not > > > expand it too much, and leave more complex implementations for later. > > > > Iam not sure i follow your logic here > > > > do you suggest to block setting lets call it "advanced keyframe" related work > > across the codebase ? (that includes muxers and so on ? > > Iam not saying thats a good or bad idea. Just trying to understand > > I'm saying to not spend time extending the AVCodecParserContext API to > export even more information when it's going to be replaced. A single flag > to stop one parser from tagging more than it needs to is enough to work > around the problem at hand. > > > > > Because to me it seems if the awnser is no to the question above > > then this is not just about "Adding a flag is the simplest way to fix this" > > but its adding codec specific code in muxers probably to extract the exta > > information, and iam not sure that at this point its not easier to have > > the parser export this information > > Inserting the h264 parser internally in mpegtsenc is needed because it's the > generic demuxer code that may or may not use one to assemble packets or fill > missing information. The muxer side has no interaction with it, it only > receives packets from some source. > So you add a new field to AVCodecParserContext to signal these "advanced > keyframes". How does that information reach the muxer? You're now having to > define or expand more stuff for this purpose. > > Inserting it in the muxer also has the added benefit of not depending on the > demuxing side from having inserted one to begin with. For example, open an > mp4 file for which the demuxer was able to create an AVIndexEntry array and > therefore will not tell the generic code that it needs a parser. If you > remux that file to mpegts, only Sync Samples (Of which there may be none) > will have been tagged as keyframes by the demuxer (Which is expected) and > not all these other packets with recovery points like mpegts apparently > wants. > If the muxer internally inserts the parser, all of them can be marked, and a > flag is enough for that. > > > > > And when the parser API is then replaced by "something better" the interface > > changes less. If this heuristic flag is added then i assume it needs to be > > removed and reimplemented with the API change. > > This flag would be removed alongside the rest of AVCodecParserContext API, > but the replacement would not need something like it since we would then > look into ensuring it exports more detailed information for a given packet > in a proper way, > and not just setting a generic and poorly defined > context-wide key_frame field. > > This is just a stopgap solution, not something that needs to be translated > into a new API. I mean, just look how all these flags are not even > namespaced, or how the main function doesn't even return error codes. A > replacement hardly requires to be a 1:1 translation. > > > > > I very likely am missing something > > But i certainly do not suggest or ask for more work to be done. Rather the > > opposit, i was wondering if its not less work overall to export the > > information properly ... > > The replacement for AVCodecParserContext would, if i recall correctly when > it was suggested, be having the work done by bitstream filters. Some muxers > and demuxers are already using bsfs in some cases, so it would be built on > top of that, and maybe automatically inserted by both muxers and demuxers, > not just the latter. > > So considering that muxer specific code is needed right now, I don't think > adding more fields to AVCodecParserContext will mean less work now or even > later. I see your point of view. The way i saw it was 1. We at some point likely want to export "advanced keyframe" information from demuxers 2. We want this information to be presented to the user in a consistent way and that is needed no matter if Parsers or bsfs extract it when its not nativly set from demuxers 3. If we had such a data structure, setting this from the current Parser is very simple and much cleaner than this flag it also would make it available to the user applications and not just muxer internal. What i suggested is probably less work overall. But it would reorder steps and put alot of other work before this. That may be undesirable so your "hack" might be just more convenient now and iam not against it. thx [...]
James Almer: > On 7/15/2021 5:23 PM, Michael Niedermayer wrote: >> >> the concept of a keyframe is a point at which decoding can begin >> that really are at least 3 points >> >> the point at which packets begin to be input into the decoder >> >> the point at which the decoder is able to return some decoded >> data which closely resembles the encoder input >> >> and the point at which the decoder output matches 1:1 the output >> of a decoder starting from frame 0 > > All parsers save for h264 are currently only tagging packets containing > a coded bitstream that, when decoded, it fully resets the decoding state > and depends on no previously parsed data or state, which is what (most) > muxers expect. This approach here is making the h264 do the same by > default (in line with the decoder), to ensure some muxers don't wrongly > mark certain packets as sync samples, while letting others remain > liberal about it. > That is not true: The HEVC parser marks packets that may have leading RASL pictures as keyframe; such frames are not sync samples according to my version of ISO/IEC 14496-15. (Furthermore, for parsers that don't set key_frame the recommended fallback is by checking pict_type for AV_PICTURE_TYPE_I (parse_packet() in libavformat/utils.c does this); if one follows this, then MPEG-2 I-frames will be marked as keyframes, even when they are not sync samples in ISOBMFF if there is an open GOP.) It seems to be mostly followed that random access points are keyframes even if they are not IDR frames/even if there is an open GOP. In fact, the AV1 parser (which does not set it for delayed random access points (AV1's equivalent of open GOP)) seems to be an exception. And your claim (also contained in the commit message) that this brings the parser in line with the decoder is wrong, too: output_frame() in h264dec.c sets key_frame depending on sei_recovery_frame_cnt. - Andreas
On 7/17/2021 10:23 PM, Andreas Rheinhardt wrote: > James Almer: >> On 7/15/2021 5:23 PM, Michael Niedermayer wrote: >>> >>> the concept of a keyframe is a point at which decoding can begin >>> that really are at least 3 points >>> >>> the point at which packets begin to be input into the decoder >>> >>> the point at which the decoder is able to return some decoded >>> data which closely resembles the encoder input >>> >>> and the point at which the decoder output matches 1:1 the output >>> of a decoder starting from frame 0 >> >> All parsers save for h264 are currently only tagging packets containing >> a coded bitstream that, when decoded, it fully resets the decoding state >> and depends on no previously parsed data or state, which is what (most) >> muxers expect. This approach here is making the h264 do the same by >> default (in line with the decoder), to ensure some muxers don't wrongly >> mark certain packets as sync samples, while letting others remain >> liberal about it. >> > That is not true: The HEVC parser marks packets that may have leading > RASL pictures as keyframe; such frames are not sync samples according to > my version of ISO/IEC 14496-15. (Furthermore, for parsers that don't set > key_frame the recommended fallback is by checking pict_type for > AV_PICTURE_TYPE_I (parse_packet() in libavformat/utils.c does this); if > one follows this, then MPEG-2 I-frames will be marked as keyframes, even > when they are not sync samples in ISOBMFF if there is an open GOP.) > > It seems to be mostly followed that random access points are keyframes > even if they are not IDR frames/even if there is an open GOP. In fact, > the AV1 parser (which does not set it for delayed random access points > (AV1's equivalent of open GOP)) seems to be an exception. > > And your claim (also contained in the commit message) that this brings > the parser in line with the decoder is wrong, too: output_frame() in > h264dec.c sets key_frame depending on sei_recovery_frame_cnt. True, missed that. But for example in the case of intra_refresh.h264 it would not trigger, whereas the parser does tag it. > > - Andreas > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". >
James Almer: > On 7/17/2021 10:23 PM, Andreas Rheinhardt wrote: >> James Almer: >>> On 7/15/2021 5:23 PM, Michael Niedermayer wrote: >>>> >>>> the concept of a keyframe is a point at which decoding can begin >>>> that really are at least 3 points >>>> >>>> the point at which packets begin to be input into the decoder >>>> >>>> the point at which the decoder is able to return some decoded >>>> data which closely resembles the encoder input >>>> >>>> and the point at which the decoder output matches 1:1 the output >>>> of a decoder starting from frame 0 >>> >>> All parsers save for h264 are currently only tagging packets containing >>> a coded bitstream that, when decoded, it fully resets the decoding state >>> and depends on no previously parsed data or state, which is what (most) >>> muxers expect. This approach here is making the h264 do the same by >>> default (in line with the decoder), to ensure some muxers don't wrongly >>> mark certain packets as sync samples, while letting others remain >>> liberal about it. >>> >> That is not true: The HEVC parser marks packets that may have leading >> RASL pictures as keyframe; such frames are not sync samples according to >> my version of ISO/IEC 14496-15. (Furthermore, for parsers that don't set >> key_frame the recommended fallback is by checking pict_type for >> AV_PICTURE_TYPE_I (parse_packet() in libavformat/utils.c does this); if >> one follows this, then MPEG-2 I-frames will be marked as keyframes, even >> when they are not sync samples in ISOBMFF if there is an open GOP.) >> >> It seems to be mostly followed that random access points are keyframes >> even if they are not IDR frames/even if there is an open GOP. In fact, >> the AV1 parser (which does not set it for delayed random access points >> (AV1's equivalent of open GOP)) seems to be an exception. >> >> And your claim (also contained in the commit message) that this brings >> the parser in line with the decoder is wrong, too: output_frame() in >> h264dec.c sets key_frame depending on sei_recovery_frame_cnt. > > True, missed that. But for example in the case of intra_refresh.h264 it > would not trigger, whereas the parser does tag it. > Yeah, we don't handle intra refresh well at all; and I am pretty sure that quite a lot of people would not consider the packets containing the recovery point SEI message a keyframe if recovery_frame_cnt is > 0. As has been said in the earlier thread, there is no single correct definition for keyframe. - Andreas
diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h index 8b97895aeb..8e3d888266 100644 --- a/libavcodec/avcodec.h +++ b/libavcodec/avcodec.h @@ -2809,6 +2809,7 @@ typedef struct AVCodecParserContext { #define PARSER_FLAG_ONCE 0x0002 /// Set if the parser has a valid file offset #define PARSER_FLAG_FETCHED_OFFSET 0x0004 +#define PARSER_FLAG_USE_KEYFRAME_HEURISTICS 0x0008 #define PARSER_FLAG_USE_CODEC_TS 0x1000 int64_t offset; ///< byte offset from starting packet start
It will be used to allow parsers to be more liberal when tagging packets as keyframes. Signed-off-by: James Almer <jamrial@gmail.com> --- libavcodec/avcodec.h | 1 + 1 file changed, 1 insertion(+)