diff mbox series

[FFmpeg-devel,2/3] oggdec: verify page checksum

Message ID M6-drYu--3-2@lynne.ee
State Accepted
Headers show
Series [FFmpeg-devel,1/3] oggdec: eliminate copies and extra buffers | expand

Checks

Context Check Description
andriy/default pending
andriy/make success Make finished
andriy/make_fate success Make fate finished

Commit Message

Lynne April 28, 2020, 11:58 a.m. UTC
This makes decoding far more robust, since OggS, the ogg magic, 
can be commonly found randomly in streams, which previously made
the demuxer think there's a new stream or a change in such.

Patch attached.

---
 libavformat/oggdec.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

Comments

Mattias Wadman April 28, 2020, 12:59 p.m. UTC | #1
On Tue, Apr 28, 2020 at 1:59 PM Lynne <dev@lynne.ee> wrote:
>
> This makes decoding far more robust, since OggS, the ogg magic,
> can be commonly found randomly in streams, which previously made
> the demuxer think there's a new stream or a change in such.
>
> Patch attached.

Should check version after verifying checksum? this breaks with my
test files were the false syncword has a non-zero byte afterwards.

>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Mattias Wadman April 28, 2020, 1:05 p.m. UTC | #2
On Tue, Apr 28, 2020 at 1:59 PM Lynne <dev@lynne.ee> wrote:
>
> This makes decoding far more robust, since OggS, the ogg magic,
> can be commonly found randomly in streams, which previously made
> the demuxer think there's a new stream or a change in such.
>
> Patch attached.

Maybe nitpick, could seek back even longer than size on crc mismatch?

>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Lynne April 28, 2020, 2:11 p.m. UTC | #3
Apr 28, 2020, 14:05 by mattias.wadman@gmail.com:

> On Tue, Apr 28, 2020 at 1:59 PM Lynne <dev@lynne.ee> wrote:
>
>>
>> This makes decoding far more robust, since OggS, the ogg magic,
>> can be commonly found randomly in streams, which previously made
>> the demuxer think there's a new stream or a change in such.
>>
>> Patch attached.
>>
>
> Maybe nitpick, could seek back even longer than size on crc mismatch?
>

Thanks a lot for reviewing the patches.
I was thinking about seeking back further to perhaps the version byte, but unfortunately, that's
not possible. If ffio_ensure_seekback is called multiple times, the last call overwrites the previous
calls and we lose the ability to seek before it.
Since the only place at which we know the exact size of the page is when its signaled, that's the
only point we can ensure we can seek back to.
Before that, we can seek back from the header's end to the version byte. Unfortunately, that
would mean verifying the header before the checksum, which as you've pointed out, is bad
for robustness.
Still, this is definitely an improvement, since previously the demuxer didn't seek back at all.

Anyway, I've implemented checking the version byte after reading the CRC as you suggested
and have attached the new patch.
Mattias Wadman April 28, 2020, 2:31 p.m. UTC | #4
On Tue, Apr 28, 2020 at 4:12 PM Lynne <dev@lynne.ee> wrote:
>
> Apr 28, 2020, 14:05 by mattias.wadman@gmail.com:
>
> > On Tue, Apr 28, 2020 at 1:59 PM Lynne <dev@lynne.ee> wrote:
> >
> >>
> >> This makes decoding far more robust, since OggS, the ogg magic,
> >> can be commonly found randomly in streams, which previously made
> >> the demuxer think there's a new stream or a change in such.
> >>
> >> Patch attached.
> >>
> >
> > Maybe nitpick, could seek back even longer than size on crc mismatch?
> >
>
> Thanks a lot for reviewing the patches.
> I was thinking about seeking back further to perhaps the version byte, but unfortunately, that's
> not possible. If ffio_ensure_seekback is called multiple times, the last call overwrites the previous
> calls and we lose the ability to seek before it.
> Since the only place at which we know the exact size of the page is when its signaled, that's the
> only point we can ensure we can seek back to.
> Before that, we can seek back from the header's end to the version byte. Unfortunately, that
> would mean verifying the header before the checksum, which as you've pointed out, is bad
> for robustness.
> Still, this is definitely an improvement, since previously the demuxer didn't seek back at all.

I see, thanks for the explaining. Also better than in my patch where
it would skip the whole page.

> Anyway, I've implemented checking the version byte after reading the CRC as you suggested
> and have attached the new patch.

Nice, test files works fine now

Would it make sense to conditionally ignore crc mismatch based on
s->error_recognition & AV_EF_CRCCHECK ?

>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Lynne April 28, 2020, 2:45 p.m. UTC | #5
Apr 28, 2020, 15:31 by mattias.wadman@gmail.com:

> Nice, test files works fine now
>
> Would it make sense to conditionally ignore crc mismatch based on
> s->error_recognition & AV_EF_CRCCHECK ?
>

I don't think so. This container specifically relies on CRC matching to identify packets
during a seek. While other containers have more advanced sync mechanisms beyond a simple
syncword and checksum, that's all we have here.
What's worse, we need to be able to handle concatenated ogg files (chained opus, vorbis, etc.),
which are widely used on internet radios. Those have the extradata needed to configure the decoder
on the first packet. If we skip the CRC and misidentify a packet as a header, we'll misconfigure the
decoder and break decoding until the next actual header arrives, which could be many minutes.
The whole chained ogg mechanism is already fragile enough as it unfortunately is.
Mattias Wadman April 28, 2020, 2:59 p.m. UTC | #6
On Tue, Apr 28, 2020 at 4:45 PM Lynne <dev@lynne.ee> wrote:
>
> Apr 28, 2020, 15:31 by mattias.wadman@gmail.com:
>
> > Nice, test files works fine now
> >
> > Would it make sense to conditionally ignore crc mismatch based on
> > s->error_recognition & AV_EF_CRCCHECK ?
> >
>
> I don't think so. This container specifically relies on CRC matching to identify packets
> during a seek. While other containers have more advanced sync mechanisms beyond a simple
> syncword and checksum, that's all we have here.
> What's worse, we need to be able to handle concatenated ogg files (chained opus, vorbis, etc.),
> which are widely used on internet radios. Those have the extradata needed to configure the decoder
> on the first packet. If we skip the CRC and misidentify a packet as a header, we'll misconfigure the
> decoder and break decoding until the next actual header arrives, which could be many minutes.
> The whole chained ogg mechanism is already fragile enough as it unfortunately is.

Sorry yes that make sense. I meant more that AV_EF_CRCCHECK seems to
be set by default so adding a
conditionally check would be if someone for some reason really want to
skip it using -err_detect 0 or so.

> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
Lynne April 28, 2020, 3:07 p.m. UTC | #7
Apr 28, 2020, 15:59 by mattias.wadman@gmail.com:

> On Tue, Apr 28, 2020 at 4:45 PM Lynne <dev@lynne.ee> wrote:
>
>>
>> Apr 28, 2020, 15:31 by mattias.wadman@gmail.com:
>>
>> > Nice, test files works fine now
>> >
>> > Would it make sense to conditionally ignore crc mismatch based on
>> > s->error_recognition & AV_EF_CRCCHECK ?
>> >
>>
>> I don't think so. This container specifically relies on CRC matching to identify packets
>> during a seek. While other containers have more advanced sync mechanisms beyond a simple
>> syncword and checksum, that's all we have here.
>> What's worse, we need to be able to handle concatenated ogg files (chained opus, vorbis, etc.),
>> which are widely used on internet radios. Those have the extradata needed to configure the decoder
>> on the first packet. If we skip the CRC and misidentify a packet as a header, we'll misconfigure the
>> decoder and break decoding until the next actual header arrives, which could be many minutes.
>> The whole chained ogg mechanism is already fragile enough as it unfortunately is.
>>
>
> Sorry yes that make sense. I meant more that AV_EF_CRCCHECK seems to
> be set by default so adding a
> conditionally check would be if someone for some reason really want to
> skip it using -err_detect 0 or so.
>

Well, I consider CRC checking a part of correctly parsing ogg, so I think its best to leave it on
all the time.
Lynne April 28, 2020, 5:22 p.m. UTC | #8
Apr 28, 2020, 16:07 by dev@lynne.ee:

> Apr 28, 2020, 15:59 by mattias.wadman@gmail.com:
>
>>
>>
> Well, I consider CRC checking a part of correctly parsing ogg, so I think its best to leave it on
> all the time.
>

Did a new version of the patches.
The first two are identical, save for some minor style tweaks.
The last one now ensures the max page size, 65k, is available for seeking and speeds up
decoding on new/replacement streams since the offset was incorrect.
I think its worth it.

I'm also close to figuring out how to make chained Opus work, will send a patch once I do.
Michael Niedermayer April 29, 2020, 3 p.m. UTC | #9
On Tue, Apr 28, 2020 at 07:22:08PM +0200, Lynne wrote:
> Apr 28, 2020, 16:07 by dev@lynne.ee:
> 
> > Apr 28, 2020, 15:59 by mattias.wadman@gmail.com:
> >
> >>
> >>
> > Well, I consider CRC checking a part of correctly parsing ogg, so I think its best to leave it on
> > all the time.
> >
> 
> Did a new version of the patches.
> The first two are identical, save for some minor style tweaks.
> The last one now ensures the max page size, 65k, is available for seeking and speeds up
> decoding on new/replacement streams since the offset was incorrect.
> I think its worth it.
> 
> I'm also close to figuring out how to make chained Opus work, will send a patch once I do.
> 

This patchset seems to detect a crc error in:
Your patchset is probably correct in this as there are artifacts before, 
just reporting this as i noticed it.

./ffmpeg -i ~/tickets/2121/vlc-bug.oga c.wav

sample should be here: https://trac.ffmpeg.org/raw-attachment/ticket/2121/vlc-bug.oga

thanks

[...]
Lynne April 30, 2020, 9:52 a.m. UTC | #10
Apr 29, 2020, 16:00 by michael@niedermayer.cc:

> On Tue, Apr 28, 2020 at 07:22:08PM +0200, Lynne wrote:
>
>> Apr 28, 2020, 16:07 by dev@lynne.ee:
>>
>> > Apr 28, 2020, 15:59 by mattias.wadman@gmail.com:
>> >
>> >>
>> >>
>> > Well, I consider CRC checking a part of correctly parsing ogg, so I think its best to leave it on
>> > all the time.
>> >
>>
>> Did a new version of the patches.
>> The first two are identical, save for some minor style tweaks.
>> The last one now ensures the max page size, 65k, is available for seeking and speeds up
>> decoding on new/replacement streams since the offset was incorrect.
>> I think its worth it.
>>
>> I'm also close to figuring out how to make chained Opus work, will send a patch once I do.
>>
>
> This patchset seems to detect a crc error in:
> Your patchset is probably correct in this as there are artifacts before, 
> just reporting this as i noticed it.
>
> ./ffmpeg -i ~/tickets/2121/vlc-bug.oga c.wav
>
> sample should be here: https://trac.ffmpeg.org/raw-attachment/ticket/2121/vlc-bug.oga
>
> thanks
>

Thanks, checked the sample, it seems corrupt so patch is working as intended.
I plan to push the patchset tonight, unless someone has comments or reviews.
Lynne April 30, 2020, 10:11 p.m. UTC | #11
Apr 30, 2020, 10:52 by dev@lynne.ee:

> Apr 29, 2020, 16:00 by michael@niedermayer.cc:
>
>> On Tue, Apr 28, 2020 at 07:22:08PM +0200, Lynne wrote:
>>
>>> Apr 28, 2020, 16:07 by dev@lynne.ee:
>>>
>>> > Apr 28, 2020, 15:59 by mattias.wadman@gmail.com:
>>> >
>>> >>
>>> >>
>>> > Well, I consider CRC checking a part of correctly parsing ogg, so I think its best to leave it on
>>> > all the time.
>>> >
>>>
>>> Did a new version of the patches.
>>> The first two are identical, save for some minor style tweaks.
>>> The last one now ensures the max page size, 65k, is available for seeking and speeds up
>>> decoding on new/replacement streams since the offset was incorrect.
>>> I think its worth it.
>>>
>>> I'm also close to figuring out how to make chained Opus work, will send a patch once I do.
>>>
>>
>> This patchset seems to detect a crc error in:
>> Your patchset is probably correct in this as there are artifacts before, 
>> just reporting this as i noticed it.
>>
>> ./ffmpeg -i ~/tickets/2121/vlc-bug.oga c.wav
>>
>> sample should be here: https://trac.ffmpeg.org/raw-attachment/ticket/2121/vlc-bug.oga
>>
>> thanks
>>
>
> Thanks, checked the sample, it seems corrupt so patch is working as intended.
> I plan to push the patchset tonight, unless someone has comments or reviews.
>

Pushed.
diff mbox series

Patch

diff --git a/libavformat/oggdec.c b/libavformat/oggdec.c
index f67cf42e82..05cea2528b 100644
--- a/libavformat/oggdec.c
+++ b/libavformat/oggdec.c
@@ -31,6 +31,7 @@ 
 #include <stdio.h>
 #include "libavutil/avassert.h"
 #include "libavutil/intreadwrite.h"
+#include "avio_internal.h"
 #include "oggdec.h"
 #include "avformat.h"
 #include "internal.h"
@@ -321,6 +322,7 @@  static int ogg_read_page(AVFormatContext *s, int *sid)
     int flags, nsegs;
     uint64_t gp;
     uint32_t serial;
+    uint32_t crc, crc_tmp;
     int size = 0, idx;
     int64_t page_pos;
     uint8_t sync[4];
@@ -359,6 +361,9 @@  static int ogg_read_page(AVFormatContext *s, int *sid)
         return AVERROR_INVALIDDATA;
     }
 
+    /* 0x4fa9b05f = av_crc(AV_CRC_32_IEEE, 0x0, "OggS", 4) */
+    ffio_init_checksum(bc, ff_crc04C11DB7_update, 0x4fa9b05f);
+
     if (avio_r8(bc) != 0) {      /* version */
         av_log (s, AV_LOG_ERROR, "ogg page, unsupported version\n");
         return AVERROR_INVALIDDATA;
@@ -367,7 +372,12 @@  static int ogg_read_page(AVFormatContext *s, int *sid)
     flags  = avio_r8(bc);
     gp     = avio_rl64(bc);
     serial = avio_rl32(bc);
-    avio_skip(bc, 8); /* seq, crc */
+    avio_skip(bc, 4); /* seq */
+
+    crc_tmp = ffio_get_checksum(bc);
+    crc     = avio_rb32(bc);
+    crc_tmp = ff_crc04C11DB7_update(crc_tmp, (uint8_t[4]){0}, 4);
+    ffio_init_checksum(bc, ff_crc04C11DB7_update, crc_tmp);
 
     nsegs = avio_r8(bc);
     page_pos = avio_tell(bc) - 27;
@@ -407,6 +417,15 @@  static int ogg_read_page(AVFormatContext *s, int *sid)
         return ret < 0 ? ret : AVERROR_EOF;
     }
 
+    if (crc ^ ffio_get_checksum(bc)) {
+        av_log(s, AV_LOG_ERROR, "CRC mismatch!\n");
+        if (idx < 0)
+            av_free(readout_buf);
+        avio_seek(bc, -size, SEEK_CUR);
+        return 0;
+    }
+
+    /* CRC is correct so we can be 99% sure there's an actual change here */
     if (idx < 0) {
         if (data_packets_seen(ogg))
             idx = ogg_replace_stream(s, serial, size);
-- 
2.26.2