diff mbox

[FFmpeg-devel,mpegaudio_parser] Skip trailing junk data when flushing parser.

Message ID CAPUDrwcB2-2dcCoav3DNUse7BBSjNoysULqTihdyx7-_ZPN3Og@mail.gmail.com
State New
Headers show

Commit Message

Dale Curtis Feb. 23, 2018, 7:15 p.m. UTC
Whoops, that version spams on every flush; changed to only print when
buf_size > 0.

- dale

On Fri, Feb 23, 2018 at 10:37 AM, Dale Curtis <dalecurtis@chromium.org>
wrote:

> After some deeper testing it looks like this mechanism can actually fully
> replace the existing ID3 and APE tag skips; so I've simplified the patch to
> do so.
>
> - dale
>
>
> On Thu, Feb 22, 2018 at 4:46 PM, Dale Curtis <dalecurtis@chromium.org>
> wrote:
>
>> The parser should only return valid mpeg audio packets; it generally
>> does so, but in the case of flush, it returns whatever happens to
>> be in the buffer instead of ensuring its first a valid mpeg packet.
>>
>> The fix is to check whether a valid frame size has been parsed and
>> if not discard the packet when flushing.
>>
>> This should fix all sorts of mp3 files with trailing garbage.
>>
>> Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
>>
>>
>

Comments

Michael Niedermayer Feb. 24, 2018, 3:01 a.m. UTC | #1
On Fri, Feb 23, 2018 at 11:15:09AM -0800, Dale Curtis wrote:
> Whoops, that version spams on every flush; changed to only print when
> buf_size > 0.
> 
> - dale
> 
> On Fri, Feb 23, 2018 at 10:37 AM, Dale Curtis <dalecurtis@chromium.org>
> wrote:
> 
> > After some deeper testing it looks like this mechanism can actually fully
> > replace the existing ID3 and APE tag skips; so I've simplified the patch to
> > do so.
> >
> > - dale
> >
> >
> > On Thu, Feb 22, 2018 at 4:46 PM, Dale Curtis <dalecurtis@chromium.org>
> > wrote:
> >
> >> The parser should only return valid mpeg audio packets; it generally
> >> does so, but in the case of flush, it returns whatever happens to
> >> be in the buffer instead of ensuring its first a valid mpeg packet.
> >>
> >> The fix is to check whether a valid frame size has been parsed and
> >> if not discard the packet when flushing.
> >>
> >> This should fix all sorts of mp3 files with trailing garbage.
> >>
> >> Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
> >>
> >>
> >

>  mpegaudio_parser.c |   11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
> 91db6134e5c81519dd88289c8956851015183515  fix_trailing_garbage_mp3_v3.patch
> From b81ec4a9a3907e21cc4c4abcf5502778be94076a Mon Sep 17 00:00:00 2001
> From: Dale Curtis <dalecurtis@chromium.org>
> Date: Thu, 22 Feb 2018 16:43:37 -0800
> Subject: [PATCH] [mpegaudio_parser] Skip trailing junk data when flushing
>  parser.
> 
> The parser should only return valid mpeg audio packets; it generally
> does so, but in the case of flush, it returns whatever happens to
> be in the buffer instead of ensuring its first a valid mpeg packet.

this goes the wrong direction.
Parsers should not discard data by default. The code we have for tags is a hack.
There are many better ways to handle this.
Something similar to a AV_PKT_FLAG_CORRUPT set be the parser would be an
example. This could then optionally be discarded


[...]
Dale Curtis Feb. 26, 2018, 6:30 p.m. UTC | #2
On Fri, Feb 23, 2018 at 7:01 PM, Michael Niedermayer <michael@niedermayer.cc
> wrote:
>
> this goes the wrong direction.
> Parsers should not discard data by default. The code we have for tags is a
> hack.
> There are many better ways to handle this.
> Something similar to a AV_PKT_FLAG_CORRUPT set be the parser would be an
> example. This could then optionally be discarded
>

It's not just the trailing tags though, skipping bad data is fundamental to
how this parser works. Even if we change it so that we mark the last packet
corrupt, midstream the parser is still designed to skip over everything
that doesn't look like an mp3 frame. I.e., the data between mp3 frames is
just dropped wholesale.

Are you proposing we rework the mp3 parser entirely to not drop any data
anymore or just not drop the trailing data?

- dale
wm4 Feb. 26, 2018, 7:58 p.m. UTC | #3
On Mon, 26 Feb 2018 10:30:26 -0800
Dale Curtis <dalecurtis@chromium.org> wrote:

> On Fri, Feb 23, 2018 at 7:01 PM, Michael Niedermayer <michael@niedermayer.cc
> > wrote:
> >
> > this goes the wrong direction.
> > Parsers should not discard data by default. The code we have for tags is a
> > hack.
> > There are many better ways to handle this.
> > Something similar to a AV_PKT_FLAG_CORRUPT set be the parser would be an
> > example. This could then optionally be discarded
> >  
> 
> It's not just the trailing tags though, skipping bad data is fundamental to
> how this parser works. Even if we change it so that we mark the last packet
> corrupt, midstream the parser is still designed to skip over everything
> that doesn't look like an mp3 frame. I.e., the data between mp3 frames is
> just dropped wholesale.
> 
> Are you proposing we rework the mp3 parser entirely to not drop any data
> anymore or just not drop the trailing data?

It sounds like the idea is that 1. a parser should never drop data,
even if it's useless data, and 2. the non-mp3 data would go into its
own packet (so that concatenating the packet contents would result in
the original byte stream). This packet would be marked as CORRUPT, so
the user can easily skip it.

I think that's reasonable, but has other problems, such as API
compatibility. Also, sometimes you might want to decode partial packets
(think of silly cases like partially downloaded PNG files). These would
be are as CORRUPT too, but don't contain complete junk data. We would
have to set AVFMT_FLAG_DISCARD_CORRUPT by default to keep the old mp3
behavior (for actual tags), but this would change behavior in other
cases.

It'd also feel strange if we'd use the CORRUPT flag for trailing junk,
but not leading junk (like id3v2 tags).

So I'd argue just dropping junk is OK, unless we introduce something
like AV_PKT_FLAG_JUNK (and corresponding AVFMT_FLAG_DISCARD_JUKN). The
discard flag could be enabled by default, but then I'd not be sure
about the situation where this should not be set. Depends what exactly
michaelni is thinking of here, I guess.

(Hopefully my post is helpful instead of piling on the bikeshed.)
Michael Niedermayer Feb. 26, 2018, 11:07 p.m. UTC | #4
On Mon, Feb 26, 2018 at 10:30:26AM -0800, Dale Curtis wrote:
> On Fri, Feb 23, 2018 at 7:01 PM, Michael Niedermayer <michael@niedermayer.cc
> > wrote:
> >
> > this goes the wrong direction.
> > Parsers should not discard data by default. The code we have for tags is a
> > hack.
> > There are many better ways to handle this.
> > Something similar to a AV_PKT_FLAG_CORRUPT set be the parser would be an
> > example. This could then optionally be discarded
> >
> 
> It's not just the trailing tags though, skipping bad data is fundamental to
> how this parser works. Even if we change it so that we mark the last packet
> corrupt, midstream the parser is still designed to skip over everything
> that doesn't look like an mp3 frame. I.e., the data between mp3 frames is
> just dropped wholesale.
> 
> Are you proposing we rework the mp3 parser entirely to not drop any data
> anymore or just not drop the trailing data?

The way parsers are intended to work (that is years ago and i dont remember 
that there was a proposal to change this) 
is to never drop data. Thats how the stuff was intended to work.

On top of that it was sometimes convenient to just drop data in a Parser, or
a Parser was sloppy implemented and unintentionally looses data.

To awnser the above question, i dont think the mp3 parser can just be changed
alone. As other code like for example muxers depend on some tags being 
discarded.
So any change has to be done with some care.

not specific to just mp3 but all parsers,
droping data the parser does not understand is bad though.
it can be an extension that some decoders could use
it can be damaged data that a decoder can partly recover and improve the output
it can be some data that is invalid in that location (like midstream metadata 
changes in some formats) but that still contains valuable data and that would
be lost even if the components downstream in a player are able to interpret it.

Thus for me droping data in a parser is something that i prefer to avoid.
Unless "not droping" would cause noticably more problems ...

Thanks

[...]
Dale Curtis March 1, 2018, 12:17 a.m. UTC | #5
On Mon, Feb 26, 2018 at 3:07 PM, Michael Niedermayer <michael@niedermayer.cc
> wrote:
>
> The way parsers are intended to work (that is years ago and i dont remember
> that there was a proposal to change this)
> is to never drop data. Thats how the stuff was intended to work.
>
> On top of that it was sometimes convenient to just drop data in a Parser,
> or
> a Parser was sloppy implemented and unintentionally looses data.
>
> To awnser the above question, i dont think the mp3 parser can just be
> changed
> alone. As other code like for example muxers depend on some tags being
> discarded.
> So any change has to be done with some care.
>
> not specific to just mp3 but all parsers,
> droping data the parser does not understand is bad though.
> it can be an extension that some decoders could use
> it can be damaged data that a decoder can partly recover and improve the
> output
> it can be some data that is invalid in that location (like midstream
> metadata
> changes in some formats) but that still contains valuable data and that
> would
> be lost even if the components downstream in a player are able to
> interpret it.
>
> Thus for me droping data in a parser is something that i prefer to avoid.
> Unless "not droping" would cause noticably more problems ...
>
>
Thanks for the information. I agree that seems reasonable, but is a
significant API change to the current system that, as you note, extends
beyond just the mp3 parser. It's certainly outside of the scope of things I
can sign up for at this time. While I feel my patch is a strict improvement
(more consistent) that fits with the current design, we can maintain this
in Chromium if you'd rather not accept it.

- dale
diff mbox

Patch

From b81ec4a9a3907e21cc4c4abcf5502778be94076a Mon Sep 17 00:00:00 2001
From: Dale Curtis <dalecurtis@chromium.org>
Date: Thu, 22 Feb 2018 16:43:37 -0800
Subject: [PATCH] [mpegaudio_parser] Skip trailing junk data when flushing
 parser.

The parser should only return valid mpeg audio packets; it generally
does so, but in the case of flush, it returns whatever happens to
be in the buffer instead of ensuring its first a valid mpeg packet.

The fix is to check whether a valid frame size has been parsed and
if not discard the packet when flushing.

This should fix all sorts of mp3 files with trailing garbage.

Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
---
 libavcodec/mpegaudio_parser.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/libavcodec/mpegaudio_parser.c b/libavcodec/mpegaudio_parser.c
index 244281b56f..87365a2c75 100644
--- a/libavcodec/mpegaudio_parser.c
+++ b/libavcodec/mpegaudio_parser.c
@@ -23,8 +23,6 @@ 
 #include "parser.h"
 #include "mpegaudiodecheader.h"
 #include "libavutil/common.h"
-#include "libavformat/apetag.h" // for APE tag.
-#include "libavformat/id3v1.h" // for ID3v1_TAG_SIZE
 
 typedef struct MpegAudioParseContext {
     ParseContext pc;
@@ -115,13 +113,8 @@  static int mpegaudio_parse(AVCodecParserContext *s1,
         return buf_size;
     }
 
-    if (flush && buf_size >= ID3v1_TAG_SIZE && memcmp(buf, "TAG", 3) == 0) {
-        *poutbuf = NULL;
-        *poutbuf_size = 0;
-        return next;
-    }
-
-    if (flush && buf_size >= APE_TAG_FOOTER_BYTES && memcmp(buf, APE_TAG_PREAMBLE, 8) == 0) {
+    if (flush && buf_size && !s->frame_size) {
+        av_log(avctx, AV_LOG_WARNING, "Discarding invalid trailing data from mpeg audio stream.\n");
         *poutbuf = NULL;
         *poutbuf_size = 0;
         return next;
-- 
2.16.1.291.g4437f3f132-goog