diff mbox series

[FFmpeg-devel,11/15] avformat/flacdec:Return correct error-codes on read-failure

Message ID b2163d719d371a7acf6fa8df4572e6e83bd72cb6.camel@haerdin.se
State New
Headers show
Series Spotify patchset | expand

Checks

Context Check Description
andriy/configure_x86 warning Failed to apply patch
yinshiyou/configure_loongarch64 warning Failed to apply patch

Commit Message

Tomas Härdin Oct. 29, 2024, 2:50 p.m. UTC
Reasonable enough. Might want a sample

Spotify comments
----------------
Unexpected EOF is treated as invalid data

/Tomas

Comments

Michael Niedermayer Oct. 29, 2024, 5:42 p.m. UTC | #1
On Tue, Oct 29, 2024 at 03:50:05PM +0100, Tomas Härdin wrote:
> Reasonable enough. Might want a sample
> 
> Spotify comments
> ----------------
> Unexpected EOF is treated as invalid data
> 
> /Tomas

>  flacdec.c |   20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 8cf62c7b8b7e576cc0e2a0a1e49c4f42be5fc254  0011-avformat-flacdec-Return-correct-error-codes-on-read-.patch
> From 4d803c5f5c13e194a0e52d287f021b73ec7172bd Mon Sep 17 00:00:00 2001
> From: Ulrik <ulrikm@spotify.com>
> Date: Thu, 26 Jan 2023 17:51:02 +0100
> Subject: [PATCH 11/15] avformat/flacdec:Return correct error-codes on
>  read-failure
> 
> Forward errors from `avio_read` directly. When `avio_read` sees EOF before
> expected bytes can be read, consistently return `AVERROR_INVALIDDATA`
> 
> We used to return `AVERROR(AVERROR_INVALIDDATA)` when failing to read
> metadata block headers. `AVERROR_INVALIDDATA` is already negative, so
> wrapping in `AVERROR` leads to double-negation.
> 
> We used to return `AVERROR(EIO)` when failing to read extended metadata.
> However, many times, the IO-layer is not at fault, the input data is simply
> corrupted (truncated), so we return `AVERROR_INVALIDDATA` here as well.
> ---
>  libavformat/flacdec.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/libavformat/flacdec.c b/libavformat/flacdec.c
> index 9f65c25864..be305fec82 100644
> --- a/libavformat/flacdec.c
> +++ b/libavformat/flacdec.c
> @@ -81,8 +81,15 @@ static int flac_read_header(AVFormatContext *s)
>  
>      /* process metadata blocks */
>      while (!avio_feof(s->pb) && !metadata_last) {
> -        if (avio_read(s->pb, header, 4) != 4)
> -            return AVERROR_INVALIDDATA;
> +        ret = avio_read(s->pb, header, 4);

> +        if (ret != 4) {
> +            if (ret < 0) {
> +                goto fail;
> +            } else {
> +                return AVERROR_INVALIDDATA;
> +            }
> +        }

this is wierd
because, one path goto fails the other returns directly.

thx

[...]
Tomas Härdin Oct. 29, 2024, 9:06 p.m. UTC | #2
tis 2024-10-29 klockan 18:42 +0100 skrev Michael Niedermayer:
> On Tue, Oct 29, 2024 at 03:50:05PM +0100, Tomas Härdin wrote:
> > Reasonable enough. Might want a sample
> > 
> > Spotify comments
> > ----------------
> > Unexpected EOF is treated as invalid data
> > 
> > /Tomas
> 
> >  flacdec.c |   20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> > 8cf62c7b8b7e576cc0e2a0a1e49c4f42be5fc254  0011-avformat-flacdec-
> > Return-correct-error-codes-on-read-.patch
> > From 4d803c5f5c13e194a0e52d287f021b73ec7172bd Mon Sep 17 00:00:00
> > 2001
> > From: Ulrik <ulrikm@spotify.com>
> > Date: Thu, 26 Jan 2023 17:51:02 +0100
> > Subject: [PATCH 11/15] avformat/flacdec:Return correct error-codes
> > on
> >  read-failure
> > 
> > Forward errors from `avio_read` directly. When `avio_read` sees EOF
> > before
> > expected bytes can be read, consistently return
> > `AVERROR_INVALIDDATA`
> > 
> > We used to return `AVERROR(AVERROR_INVALIDDATA)` when failing to
> > read
> > metadata block headers. `AVERROR_INVALIDDATA` is already negative,
> > so
> > wrapping in `AVERROR` leads to double-negation.
> > 
> > We used to return `AVERROR(EIO)` when failing to read extended
> > metadata.
> > However, many times, the IO-layer is not at fault, the input data
> > is simply
> > corrupted (truncated), so we return `AVERROR_INVALIDDATA` here as
> > well.
> > ---
> >  libavformat/flacdec.c | 20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> > 
> > diff --git a/libavformat/flacdec.c b/libavformat/flacdec.c
> > index 9f65c25864..be305fec82 100644
> > --- a/libavformat/flacdec.c
> > +++ b/libavformat/flacdec.c
> > @@ -81,8 +81,15 @@ static int flac_read_header(AVFormatContext *s)
> >  
> >      /* process metadata blocks */
> >      while (!avio_feof(s->pb) && !metadata_last) {
> > -        if (avio_read(s->pb, header, 4) != 4)
> > -            return AVERROR_INVALIDDATA;
> > +        ret = avio_read(s->pb, header, 4);
> 
> > +        if (ret != 4) {
> > +            if (ret < 0) {
> > +                goto fail;
> > +            } else {
> > +                return AVERROR_INVALIDDATA;
> > +            }
> > +        }
> 
> this is wierd
> because, one path goto fails the other returns directly.

Could be a rebase mistake. I'll take a second look at it tomorrow

/Tomas
Tomas Härdin Oct. 30, 2024, 10:44 a.m. UTC | #3
tis 2024-10-29 klockan 18:42 +0100 skrev Michael Niedermayer:
> On Tue, Oct 29, 2024 at 03:50:05PM +0100, Tomas Härdin wrote:
> > Reasonable enough. Might want a sample
> > 
> > Spotify comments
> > ----------------
> > Unexpected EOF is treated as invalid data
> > 
> > /Tomas
> 
> >  flacdec.c |   20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> > 8cf62c7b8b7e576cc0e2a0a1e49c4f42be5fc254  0011-avformat-flacdec-
> > Return-correct-error-codes-on-read-.patch
> > From 4d803c5f5c13e194a0e52d287f021b73ec7172bd Mon Sep 17 00:00:00
> > 2001
> > From: Ulrik <ulrikm@spotify.com>
> > Date: Thu, 26 Jan 2023 17:51:02 +0100
> > Subject: [PATCH 11/15] avformat/flacdec:Return correct error-codes
> > on
> >  read-failure
> > 
> > Forward errors from `avio_read` directly. When `avio_read` sees EOF
> > before
> > expected bytes can be read, consistently return
> > `AVERROR_INVALIDDATA`
> > 
> > We used to return `AVERROR(AVERROR_INVALIDDATA)` when failing to
> > read
> > metadata block headers. `AVERROR_INVALIDDATA` is already negative,
> > so
> > wrapping in `AVERROR` leads to double-negation.
> > 
> > We used to return `AVERROR(EIO)` when failing to read extended
> > metadata.
> > However, many times, the IO-layer is not at fault, the input data
> > is simply
> > corrupted (truncated), so we return `AVERROR_INVALIDDATA` here as
> > well.
> > ---
> >  libavformat/flacdec.c | 20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> > 
> > diff --git a/libavformat/flacdec.c b/libavformat/flacdec.c
> > index 9f65c25864..be305fec82 100644
> > --- a/libavformat/flacdec.c
> > +++ b/libavformat/flacdec.c
> > @@ -81,8 +81,15 @@ static int flac_read_header(AVFormatContext *s)
> >  
> >      /* process metadata blocks */
> >      while (!avio_feof(s->pb) && !metadata_last) {
> > -        if (avio_read(s->pb, header, 4) != 4)
> > -            return AVERROR_INVALIDDATA;
> > +        ret = avio_read(s->pb, header, 4);
> 
> > +        if (ret != 4) {
> > +            if (ret < 0) {
> > +                goto fail;
> > +            } else {
> > +                return AVERROR_INVALIDDATA;
> > +            }
> > +        }
> 
> this is wierd
> because, one path goto fails the other returns directly.

Oh wait now I see what you mean. buffer isn't allocated in this hunk so
it could just as well just return. The second hunk *should* goto fail
however. Updated patch attached

I also made some test files to demonstrate the differences in behavior.
What's being addressed here is early termination of the file. One thing
to bikeshed over is whether to return AVERROR_EOF or
AVERROR_INVALIDDATA in that case. The attached files demonstrate a
change in return value depending on how short a flac file is. It might
make more sense to always return AVERROR_EOF since being truncated is
fundamentally the issue with the file in these cases

The original intent seems mostly to just be passing error codes along

/Tomas
Michael Niedermayer Nov. 3, 2024, 12:46 a.m. UTC | #4
On Wed, Oct 30, 2024 at 11:44:37AM +0100, Tomas Härdin wrote:
> tis 2024-10-29 klockan 18:42 +0100 skrev Michael Niedermayer:
> > On Tue, Oct 29, 2024 at 03:50:05PM +0100, Tomas Härdin wrote:
> > > Reasonable enough. Might want a sample
> > > 
> > > Spotify comments
> > > ----------------
> > > Unexpected EOF is treated as invalid data
> > > 
> > > /Tomas
> > 
> > >  flacdec.c |   20 ++++++++++++++++----
> > >  1 file changed, 16 insertions(+), 4 deletions(-)
> > > 8cf62c7b8b7e576cc0e2a0a1e49c4f42be5fc254  0011-avformat-flacdec-
> > > Return-correct-error-codes-on-read-.patch
> > > From 4d803c5f5c13e194a0e52d287f021b73ec7172bd Mon Sep 17 00:00:00
> > > 2001
> > > From: Ulrik <ulrikm@spotify.com>
> > > Date: Thu, 26 Jan 2023 17:51:02 +0100
> > > Subject: [PATCH 11/15] avformat/flacdec:Return correct error-codes
> > > on
> > >  read-failure
> > > 
> > > Forward errors from `avio_read` directly. When `avio_read` sees EOF
> > > before
> > > expected bytes can be read, consistently return
> > > `AVERROR_INVALIDDATA`
> > > 
> > > We used to return `AVERROR(AVERROR_INVALIDDATA)` when failing to
> > > read
> > > metadata block headers. `AVERROR_INVALIDDATA` is already negative,
> > > so
> > > wrapping in `AVERROR` leads to double-negation.
> > > 
> > > We used to return `AVERROR(EIO)` when failing to read extended
> > > metadata.
> > > However, many times, the IO-layer is not at fault, the input data
> > > is simply
> > > corrupted (truncated), so we return `AVERROR_INVALIDDATA` here as
> > > well.
> > > ---
> > >  libavformat/flacdec.c | 20 ++++++++++++++++----
> > >  1 file changed, 16 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/libavformat/flacdec.c b/libavformat/flacdec.c
> > > index 9f65c25864..be305fec82 100644
> > > --- a/libavformat/flacdec.c
> > > +++ b/libavformat/flacdec.c
> > > @@ -81,8 +81,15 @@ static int flac_read_header(AVFormatContext *s)
> > >  
> > >      /* process metadata blocks */
> > >      while (!avio_feof(s->pb) && !metadata_last) {
> > > -        if (avio_read(s->pb, header, 4) != 4)
> > > -            return AVERROR_INVALIDDATA;
> > > +        ret = avio_read(s->pb, header, 4);
> > 
> > > +        if (ret != 4) {
> > > +            if (ret < 0) {
> > > +                goto fail;
> > > +            } else {
> > > +                return AVERROR_INVALIDDATA;
> > > +            }
> > > +        }
> > 
> > this is wierd
> > because, one path goto fails the other returns directly.
> 
> Oh wait now I see what you mean. buffer isn't allocated in this hunk so
> it could just as well just return. The second hunk *should* goto fail
> however. Updated patch attached
> 
> I also made some test files to demonstrate the differences in behavior.
> What's being addressed here is early termination of the file. One thing
> to bikeshed over is whether to return AVERROR_EOF or
> AVERROR_INVALIDDATA in that case. The attached files demonstrate a
> change in return value depending on how short a flac file is. It might
> make more sense to always return AVERROR_EOF since being truncated is
> fundamentally the issue with the file in these cases
> 
> The original intent seems mostly to just be passing error codes along
> 
> /Tomas

>  flacdec.c |   14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 8d27d2b37842444aad8f5df06ba02b7511ac8425  0011-avformat-flacdec-Return-correct-error-codes-on-read-.patch
> From 5f19fddd70417284f36421e67b92af673b7c6965 Mon Sep 17 00:00:00 2001
> From: Ulrik <ulrikm@spotify.com>
> Date: Thu, 26 Jan 2023 17:51:02 +0100
> Subject: [PATCH 11/17] avformat/flacdec:Return correct error-codes on
>  read-failure
> 
> Forward errors from `avio_read` directly. When `avio_read` sees EOF before
> expected bytes can be read, consistently return `AVERROR_INVALIDDATA`
> 
> We used to return `AVERROR(AVERROR_INVALIDDATA)` when failing to read
> metadata block headers. `AVERROR_INVALIDDATA` is already negative, so
> wrapping in `AVERROR` leads to double-negation.
> 
> We used to return `AVERROR(EIO)` when failing to read extended metadata.
> However, many times, the IO-layer is not at fault, the input data is simply
> corrupted (truncated), so we return `AVERROR_INVALIDDATA` here as well.
> ---
>  libavformat/flacdec.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/libavformat/flacdec.c b/libavformat/flacdec.c
> index 9f65c25864..da7fbc4dea 100644
> --- a/libavformat/flacdec.c
> +++ b/libavformat/flacdec.c
> @@ -81,8 +81,13 @@ static int flac_read_header(AVFormatContext *s)
>  
>      /* process metadata blocks */
>      while (!avio_feof(s->pb) && !metadata_last) {
> -        if (avio_read(s->pb, header, 4) != 4)
> +        ret = avio_read(s->pb, header, 4);
> +        if (ret < 0) {
> +            return ret;
> +        } else if (ret != 4) {
>              return AVERROR_INVALIDDATA;
> +        }
> +
>          flac_parse_block_header(header, &metadata_last, &metadata_type,
>                                     &metadata_size);
>          switch (metadata_type) {
> @@ -96,8 +101,11 @@ static int flac_read_header(AVFormatContext *s)
>              if (!buffer) {
>                  return AVERROR(ENOMEM);
>              }
> -            if (avio_read(s->pb, buffer, metadata_size) != metadata_size) {
> -                RETURN_ERROR(AVERROR(EIO));
> +            ret = avio_read(s->pb, buffer, metadata_size);
> +            if (ret < 0) {
> +                RETURN_ERROR(ret);
> +            } else if (ret != metadata_size) {
> +                RETURN_ERROR(AVERROR_INVALIDDATA);
>              }
>              break;
>          /* skip metadata block for unsupported types */

should be ok

thx

[...]
Tomas Härdin Nov. 8, 2024, 11:05 a.m. UTC | #5
ons 2024-10-30 klockan 11:44 +0100 skrev Tomas Härdin:
> I also made some test files to demonstrate the differences in behavior.
> What's being addressed here is early termination of the file. One thing
> to bikeshed over is whether to return AVERROR_EOF or
> AVERROR_INVALIDDATA in that case. The attached files demonstrate a
> change in return value depending on how short a flac file is. It might
> make more sense to always return AVERROR_EOF since being truncated is
> fundamentally the issue with the file in these cases

Actually I feel like bikeshedding/philosophizing over this a bit since
it is probably relevant to other parts of the code. I feel there is an
important qualitative difference between a file being too short and a
file containing bad data, and we'd like to within the best of our
abilities tell the user what the problem is. But this isn't always so
easy.

Consider a length field followed by not enough data. This could happen
for two reasons: either the file was terminated early or the length
field was written incorrectly or somehow corrupted. Should we assume
that the file was written correctly and cut short, or the opposite?

In this specific case I think the answer is easy: always return EOF.
Why? Because the streaminfo chunk is 34 bytes by definition, and unless
the muxer is incredibly broken, an incomplete read is almost certainly
due to the file being cut early

We could in principle move the checks for metadata_size further up, but
it's probably not worthwhile here. In a proper parsing framework this
kind of stuff could be expressed in a grammar. For streaminfo it should
match \x00\x00\x00\x22 exactly

TL;DR: I'm going to change to returning EOF here, unless someone
objects

/Tomas
diff mbox series

Patch

From 4d803c5f5c13e194a0e52d287f021b73ec7172bd Mon Sep 17 00:00:00 2001
From: Ulrik <ulrikm@spotify.com>
Date: Thu, 26 Jan 2023 17:51:02 +0100
Subject: [PATCH 11/15] avformat/flacdec:Return correct error-codes on
 read-failure

Forward errors from `avio_read` directly. When `avio_read` sees EOF before
expected bytes can be read, consistently return `AVERROR_INVALIDDATA`

We used to return `AVERROR(AVERROR_INVALIDDATA)` when failing to read
metadata block headers. `AVERROR_INVALIDDATA` is already negative, so
wrapping in `AVERROR` leads to double-negation.

We used to return `AVERROR(EIO)` when failing to read extended metadata.
However, many times, the IO-layer is not at fault, the input data is simply
corrupted (truncated), so we return `AVERROR_INVALIDDATA` here as well.
---
 libavformat/flacdec.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/libavformat/flacdec.c b/libavformat/flacdec.c
index 9f65c25864..be305fec82 100644
--- a/libavformat/flacdec.c
+++ b/libavformat/flacdec.c
@@ -81,8 +81,15 @@  static int flac_read_header(AVFormatContext *s)
 
     /* process metadata blocks */
     while (!avio_feof(s->pb) && !metadata_last) {
-        if (avio_read(s->pb, header, 4) != 4)
-            return AVERROR_INVALIDDATA;
+        ret = avio_read(s->pb, header, 4);
+        if (ret != 4) {
+            if (ret < 0) {
+                goto fail;
+            } else {
+                return AVERROR_INVALIDDATA;
+            }
+        }
+
         flac_parse_block_header(header, &metadata_last, &metadata_type,
                                    &metadata_size);
         switch (metadata_type) {
@@ -96,8 +103,13 @@  static int flac_read_header(AVFormatContext *s)
             if (!buffer) {
                 return AVERROR(ENOMEM);
             }
-            if (avio_read(s->pb, buffer, metadata_size) != metadata_size) {
-                RETURN_ERROR(AVERROR(EIO));
+            ret = avio_read(s->pb, buffer, metadata_size);
+            if (ret != metadata_size) {
+                if (ret < 0) {
+                    goto fail;
+                } else {
+                    RETURN_ERROR(AVERROR_INVALIDDATA);
+                }
             }
             break;
         /* skip metadata block for unsupported types */
-- 
2.39.2