
[FFmpeg-devel,v4,3/7] avcodec/webp_parser: parse each frame into one packet

Message ID 20230725085846.93593-4-thilo.borgmann@mail.de
State New
Series webp: add support for animated WebP decoding

Checks

Context                          Check    Description
yinshiyou/make_loongarch64       success  Make finished
yinshiyou/make_fate_loongarch64  success  Make fate finished
andriy/make_x86                  success  Make finished
andriy/make_fate_x86             success  Make fate finished

Commit Message

Thilo Borgmann July 25, 2023, 8:58 a.m. UTC
---
 libavcodec/webp_parser.c | 130 +++++++++++++++++++++++++++------------
 1 file changed, 89 insertions(+), 41 deletions(-)

Comments

Tomas Härdin July 25, 2023, 12:24 p.m. UTC | #1
> +    // Extremely simplified key frame detection:
> +    // - the first frame (containing headers) is marked as a key frame
> +    // - other frames are marked as non-key frames

Is there a more proper way of doing this? Looking briefly at the spec
one wonders why they didn't just use regular VP* inter frames..

/Tomas
Thilo Borgmann July 25, 2023, 2:18 p.m. UTC | #2
On 25.07.23 at 14:24, Tomas Härdin wrote:
>> +    // Extremely simplified key frame detection:
>> +    // - the first frame (containing headers) is marked as a key frame
>> +    // - other frames are marked as non-key frames
> 
> Is there a more proper way of doing this? 

All frames (except the ANMF chunks) are intra-coded, and all of them have a WEBP tag,
whereas all ANMF chunks are in the same WEBP chunk as their reference frame.
So it should really be as simple as it is now: mark all WEBP frames as key frames, as the code does.
What more dedicated approach do you have in mind?

The logic as-is works with all samples I have, animated or not.
It also seems to align well with their example file layouts.
Do you have a weirder one?


> Looking briefly at the spec
> one wonders why they didn't just use regular VP* inter frames..

I assume the whole canvas idea could be more beneficial than vp8 inter - but otoh I don't know about vp8 compositing capabilities, if any..

-Thilo
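
For readers following the chunk discussion above, here is a minimal standalone sketch of how the top-level chunks of one WebP file can be walked and the frame-starting chunks (ANMF, "VP8 ", VP8L) counted. It is not the parser from the patch: it works on a complete in-memory buffer rather than a byte stream, and the helper names (rl32, is_frame_tag, count_webp_frames) are hypothetical. It assumes only the RIFF layout: a 4-byte tag, a 32-bit little-endian size, and payloads padded to an even length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit little-endian value (RIFF sizes). */
static uint32_t rl32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: does this FourCC start a new frame? */
static int is_frame_tag(const uint8_t *tag)
{
    return !memcmp(tag, "ANMF", 4) ||
           !memcmp(tag, "VP8 ", 4) ||
           !memcmp(tag, "VP8L", 4);
}

/*
 * Count the top-level chunks that start a frame in one WebP file.
 * Returns -1 if the buffer does not begin with a RIFF/WEBP header.
 * Payloads are skipped, so bitstream chunks nested inside an ANMF
 * payload are not counted a second time.
 */
static int count_webp_frames(const uint8_t *buf, size_t size)
{
    uint64_t pos, end;
    int frames = 0;

    assert(buf != NULL);
    if (size < 12 || memcmp(buf, "RIFF", 4) || memcmp(buf + 8, "WEBP", 4))
        return -1;

    /* The RIFF size field counts everything after the 8-byte RIFF header. */
    end = (uint64_t)rl32(buf + 4) + 8;
    if (end > size)
        end = size; /* tolerate a truncated file */

    pos = 12; /* first chunk follows "RIFF" + size + "WEBP" */
    while (pos + 8 <= end) {
        uint64_t payload = rl32(buf + pos + 4);

        if (is_frame_tag(buf + pos))
            frames++;

        /* Skip header and payload; odd payloads carry one padding byte. */
        pos += 8 + payload + (payload & 1);
    }
    return frames;
}
```

Under the rule described in the patch comment, only the first packet produced from such a file would be flagged as a key frame; the sketch only shows where the packet boundaries would fall.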
James Zern July 25, 2023, 7:46 p.m. UTC | #3
On Tue, Jul 25, 2023 at 7:18 AM Thilo Borgmann <thilo.borgmann@mail.de> wrote:
>
> On 25.07.23 at 14:24, Tomas Härdin wrote:
> >> +    // Extremely simplified key frame detection:
> >> +    // - the first frame (containing headers) is marked as a key frame
> >> +    // - other frames are marked as non-key frames
> >
> > Is there a more proper way of doing this?
>
> All frames (except the ANMF chunks) are INTRA, and all of them have a WEBP tag.
> Whereas all ANMF chunks are in the same WEBP chunk as their reference frame.
> So it should really be as simple as it is to mark all WEBP frames as key frames as the code does.
> What more dedicated do you have in mind?
>
> The logic as-is works with all samples I have, animated and not.
> Seems to also align well with their example file layouts.
> You have a more weird one?
>
>
> > Looking briefly at the spec
> > one wonders why they didn't just use regular VP* inter frames..
>
> I assume the whole canvas idea could be more beneficial than vp8 inter - but otoh I don't know about vp8 compositing capabilities, if any..
>

This was more in alignment with gif and allowed for a simpler
reference structure. WebP also supports lossless and mixing
lossy/lossless so this method makes the behavior consistent between
VP8 and VP8L.
Tomas Härdin July 26, 2023, 9:35 p.m. UTC | #4
On Tue 2023-07-25 at 16:18 +0200, Thilo Borgmann wrote:
> On 25.07.23 at 14:24, Tomas Härdin wrote:
> > > +    // Extremely simplified key frame detection:
> > > +    // - the first frame (containing headers) is marked as a key frame
> > > +    // - other frames are marked as non-key frames
> > 
> > Is there a more proper way of doing this? 
> 
> All frames (except the ANMF chunks) are INTRA, and all of them have a
> WEBP tag.
> Whereas all ANMF chunks are in the same WEBP chunk as their reference
> frame.
> So it should really be as simple as it is to mark all WEBP frames as
> key frames as the code does.
> What more dedicated do you have in mind?

Nah mostly just curious. It just feels so weird when VP8 intra already
exists. Maybe I'm missing something. Browsers already support VP8 after
all.

> The logic as-is works with all samples I have, animated and not.
> Seems to also align well with their example file layouts.
> You have a more weird one?

Nope

/Tomas
James Zern July 27, 2023, 5:21 p.m. UTC | #5
On Wed, Jul 26, 2023 at 2:36 PM Tomas Härdin <git@haerdin.se> wrote:
>
> On Tue 2023-07-25 at 16:18 +0200, Thilo Borgmann wrote:
> > On 25.07.23 at 14:24, Tomas Härdin wrote:
> > > > +    // Extremely simplified key frame detection:
> > > > +    // - the first frame (containing headers) is marked as a key frame
> > > > +    // - other frames are marked as non-key frames
> > >
> > > Is there a more proper way of doing this?
> >
> > All frames (except the ANMF chunks) are INTRA, and all of them have a
> > WEBP tag.
> > Whereas all ANMF chunks are in the same WEBP chunk as their reference
> > frame.
> > So it should really be as simple as it is to mark all WEBP frames as
> > key frames as the code does.
> > What more dedicated do you have in mind?
>
> Nah mostly just curious. It just feels so weird when VP8 intra already
> exists. Maybe I'm missing something. Browsers already support VP8 after
> all.
>

We wanted something lighter weight (memory, cpu) for an image format
rather than going full blown video. Lossless also factored into this.

> > The logic as-is works with all samples I have, animated and not.
> > Seems to also align well with their example file layouts.
> > You have a more weird one?
>
> Nope

Patch

diff --git a/libavcodec/webp_parser.c b/libavcodec/webp_parser.c
index bd5f94dac5..da853bb1f5 100644
--- a/libavcodec/webp_parser.c
+++ b/libavcodec/webp_parser.c
@@ -25,13 +25,17 @@ 
 
 #include "libavutil/bswap.h"
 #include "libavutil/common.h"
+#include "libavutil/intreadwrite.h"
 
 #include "parser.h"
 
 typedef struct WebPParseContext {
     ParseContext pc;
+    int frame;
+    int first_frame;
     uint32_t fsize;
-    uint32_t remaining_size;
+    uint32_t remaining_file_size;
+    uint32_t remaining_tag_size;
 } WebPParseContext;
 
 static int webp_parse(AVCodecParserContext *s, AVCodecContext *avctx,
@@ -41,62 +45,106 @@  static int webp_parse(AVCodecParserContext *s, AVCodecContext *avctx,
     WebPParseContext *ctx = s->priv_data;
     uint64_t state = ctx->pc.state64;
     int next = END_NOT_FOUND;
-    int i = 0;
+    int i, len;
 
-    *poutbuf      = NULL;
-    *poutbuf_size = 0;
-
-restart:
-    if (ctx->pc.frame_start_found <= 8) {
-        for (; i < buf_size; i++) {
+    for (i = 0; i < buf_size;) {
+        if (ctx->remaining_tag_size) {
+            /* consuming tag */
+            len = FFMIN(ctx->remaining_tag_size, buf_size - i);
+            i += len;
+            ctx->remaining_tag_size -= len;
+            ctx->remaining_file_size -= len;
+        } else {
+            /* scan for the next tag or file */
             state = (state << 8) | buf[i];
-            if (ctx->pc.frame_start_found == 0) {
-                if ((state >> 32) == MKBETAG('R', 'I', 'F', 'F')) {
-                    ctx->fsize = av_bswap32(state);
-                    if (ctx->fsize > 15 && ctx->fsize <= UINT32_MAX - 10) {
-                        ctx->pc.frame_start_found = 1;
-                        ctx->fsize += 8;
+            i++;
+
+            if (!ctx->remaining_file_size) {
+                /* scan for the next file */
+                if (ctx->pc.frame_start_found == 4) {
+                    ctx->pc.frame_start_found = 0;
+                    if ((uint32_t) state == MKBETAG('W', 'E', 'B', 'P')) {
+                        if (ctx->frame || i != 12) {
+                            ctx->frame = 0;
+                            next = i - 12;
+                            state = 0;
+                            ctx->pc.frame_start_found = 0;
+                            break;
+                        }
+                        ctx->remaining_file_size = ctx->fsize - 4;
+                        ctx->first_frame = 1;
+                        continue;
                     }
                 }
-            } else if (ctx->pc.frame_start_found == 8) {
-                if ((state >> 32) != MKBETAG('W', 'E', 'B', 'P')) {
+                if (ctx->pc.frame_start_found == 0) {
+                    if ((state >> 32) == MKBETAG('R', 'I', 'F', 'F')) {
+                        ctx->fsize = av_bswap32(state);
+                        if (ctx->fsize > 15 && ctx->fsize <= UINT32_MAX - 10) {
+                            ctx->fsize += (ctx->fsize & 1);
+                            ctx->pc.frame_start_found = 1;
+                        }
+                    }
+                } else
+                    ctx->pc.frame_start_found++;
+            } else {
+                /* read the next tag */
+                ctx->remaining_file_size--;
+                if (ctx->remaining_file_size == 0) {
                     ctx->pc.frame_start_found = 0;
                     continue;
                 }
                 ctx->pc.frame_start_found++;
-                ctx->remaining_size = ctx->fsize + i - 15;
-                if (ctx->pc.index + i > 15) {
-                    next = i - 15;
-                    state = 0;
+                if (ctx->pc.frame_start_found < 8)
+                    continue;
+
+                switch (state >> 32) {
+                case MKBETAG('A', 'N', 'M', 'F'):
+                case MKBETAG('V', 'P', '8', ' '):
+                case MKBETAG('V', 'P', '8', 'L'):
+                    if (ctx->frame) {
+                        ctx->frame = 0;
+                        next = i - 8;
+                        state = 0;
+                        ctx->pc.frame_start_found = 0;
+                        goto flush;
+                    }
+                    ctx->frame = 1;
+                    break;
+                default:
                     break;
-                } else {
-                    ctx->pc.state64 = 0;
-                    goto restart;
                 }
-            } else if (ctx->pc.frame_start_found)
-                ctx->pc.frame_start_found++;
-        }
-        ctx->pc.state64 = state;
-    } else {
-        if (ctx->remaining_size) {
-            i = FFMIN(ctx->remaining_size, buf_size);
-            ctx->remaining_size -= i;
-            if (ctx->remaining_size)
-                goto flush;
 
-            ctx->pc.frame_start_found = 0;
-            goto restart;
+                ctx->remaining_tag_size = av_bswap32(state);
+                ctx->remaining_tag_size += ctx->remaining_tag_size & 1;
+                if (ctx->remaining_tag_size > ctx->remaining_file_size) {
+                    /* this might be truncated remains before end of file */
+                    ctx->remaining_tag_size = ctx->remaining_file_size;
+                }
+                ctx->pc.frame_start_found = 0;
+                state = 0;
+            }
         }
     }
-
 flush:
-    if (ff_combine_frame(&ctx->pc, next, &buf, &buf_size) < 0)
+    ctx->pc.state64 = state;
+
+    if (ff_combine_frame(&ctx->pc, next, &buf, &buf_size) < 0) {
+        *poutbuf      = NULL;
+        *poutbuf_size = 0;
         return buf_size;
+    }
 
-    if (next != END_NOT_FOUND && next < 0)
-        ctx->pc.frame_start_found = FFMAX(ctx->pc.frame_start_found - i - 1, 0);
-    else
-        ctx->pc.frame_start_found = 0;
+    // Extremely simplified key frame detection:
+    // - the first frame (containing headers) is marked as a key frame
+    // - other frames are marked as non-key frames
+    if (ctx->first_frame) {
+        ctx->first_frame = 0;
+        s->pict_type = AV_PICTURE_TYPE_I;
+        s->key_frame = 1;
+    } else {
+        s->pict_type = AV_PICTURE_TYPE_P;
+        s->key_frame = 0;
+    }
 
     *poutbuf      = buf;
     *poutbuf_size = buf_size;