From patchwork Fri Sep 11 06:36:12 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Zlomek, Josef" <josef@pex.com>
X-Patchwork-Id: 22283
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
X-Original-To: patchwork@ffaux-bg.ffmpeg.org
Delivered-To: patchwork@ffaux-bg.ffmpeg.org
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by ffaux.localdomain (Postfix) with ESMTP id A1FEB449560
	for <patchwork@ffaux-bg.ffmpeg.org>; Fri, 11 Sep 2020 09:36:30 +0300 (EEST)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 6F9B968B9ED;
	Fri, 11 Sep 2020 09:36:30 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from vps.zlomek.net (vps.zlomek.net [195.181.211.90])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 23AB568B95C
 for <ffmpeg-devel@ffmpeg.org>; Fri, 11 Sep 2020 09:36:23 +0300 (EEST)
Received: by vps.zlomek.net (Postfix, from userid 1000)
 id 73DD95FE24; Fri, 11 Sep 2020 08:36:22 +0200 (CEST)
From: Josef Zlomek <josef@pex.com>
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 11 Sep 2020 08:36:12 +0200
Message-Id: <20200911063613.4475-1-josef@pex.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: 
 <CAJFp-Et_-oZ=7VddUG6cCGtiQ7Rf3GZUW0HPR6v5pCjVveRVow@mail.gmail.com>
References: 
 <CAJFp-Et_-oZ=7VddUG6cCGtiQ7Rf3GZUW0HPR6v5pCjVveRVow@mail.gmail.com>
Subject: [FFmpeg-devel] [PATCH v5 1/2] libavcodec/webp: add support for
	animated WebP decoding
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Josef Zlomek <josef@pex.com>
MIME-Version: 1.0
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

Fixes: 4907

Adds support for decoding of animated WebP.

The WebP parser now splits the input stream into packets containing one frame.

The WebP decoder adds the animation related features according to the specs:
https://developers.google.com/speed/webp/docs/riff_container#animation
The frames of the animation may be smaller than the image canvas.
Therefore, the frame is decoded to a temporary frame,
then it is blended into the canvas, the canvas is copied to the output frame,
and finally the frame is disposed from the canvas.

The output to AV_PIX_FMT_YUVA420P/AV_PIX_FMT_YUV420P is still supported.
The background color is specified only as BGRA in the WebP file
so it is converted to YUVA if YUV formats are output.

Signed-off-by: Josef Zlomek <josef@pex.com>
---
 Changelog                |   1 +
 libavcodec/codec_desc.c  |   3 +-
 libavcodec/version.h     |   2 +-
 libavcodec/webp.c        | 710 +++++++++++++++++++++++++++++++++++----
 libavcodec/webp.h        |  44 +++
 libavcodec/webp_parser.c | 132 +++++---
 6 files changed, 785 insertions(+), 107 deletions(-)
 create mode 100644 libavcodec/webp.h
diff --git a/Changelog b/Changelog
index cd8be931ef..886f91c5c8 100644
--- a/Changelog
+++ b/Changelog
@@ -22,6 +22,7 @@ version <next>:
 - MODS demuxer
 - PhotoCD decoder
 - MCA demuxer
+- animated WebP parser/decoder
 
 
 version 4.3:
diff --git a/libavcodec/codec_desc.c b/libavcodec/codec_desc.c
index 9e73dcba27..4685169b05 100644
--- a/libavcodec/codec_desc.c
+++ b/libavcodec/codec_desc.c
@@ -1251,8 +1251,7 @@ static const AVCodecDescriptor codec_descriptors[] = {
         .type      = AVMEDIA_TYPE_VIDEO,
         .name      = "webp",
         .long_name = NULL_IF_CONFIG_SMALL("WebP"),
-        .props     = AV_CODEC_PROP_INTRA_ONLY | AV_CODEC_PROP_LOSSY |
-                     AV_CODEC_PROP_LOSSLESS,
+        .props     = AV_CODEC_PROP_LOSSY | AV_CODEC_PROP_LOSSLESS,
         .mime_types= MT("image/webp"),
     },
     {
diff --git a/libavcodec/version.h b/libavcodec/version.h
index 4b221f96ad..547adb93cc 100644
--- a/libavcodec/version.h
+++ b/libavcodec/version.h
@@ -29,7 +29,7 @@
 
 #define LIBAVCODEC_VERSION_MAJOR  58
 #define LIBAVCODEC_VERSION_MINOR 105
-#define LIBAVCODEC_VERSION_MICRO 100
+#define LIBAVCODEC_VERSION_MICRO 101
 
 #define LIBAVCODEC_VERSION_INT  AV_VERSION_INT(LIBAVCODEC_VERSION_MAJOR, \
                                                LIBAVCODEC_VERSION_MINOR, \
diff --git a/libavcodec/webp.c b/libavcodec/webp.c
index c6d0206846..015d14e297 100644
--- a/libavcodec/webp.c
+++ b/libavcodec/webp.c
@@ -35,12 +35,15 @@
  * Exif metadata
  * ICC profile
  *
+ * @author Josef Zlomek, Pexeso Inc. <josef@pex.com>
+ * Animation
+ *
  * Unimplemented:
- *   - Animation
  *   - XMP metadata
  */
 
 #include "libavutil/imgutils.h"
+#include "libavutil/colorspace.h"
 
 #define BITSTREAM_READER_LE
 #include "avcodec.h"
@@ -50,12 +53,7 @@
 #include "internal.h"
 #include "thread.h"
 #include "vp8.h"
-
-#define VP8X_FLAG_ANIMATION             0x02
-#define VP8X_FLAG_XMP_METADATA          0x04
-#define VP8X_FLAG_EXIF_METADATA         0x08
-#define VP8X_FLAG_ALPHA                 0x10
-#define VP8X_FLAG_ICC                   0x20
+#include "webp.h"
 
 #define MAX_PALETTE_SIZE                256
 #define MAX_CACHE_BITS                  11
@@ -188,6 +186,8 @@ typedef struct ImageContext {
 typedef struct WebPContext {
     VP8Context v;                       /* VP8 Context used for lossy decoding */
     GetBitContext gb;                   /* bitstream reader for main image chunk */
+    ThreadFrame canvas_frame;           /* ThreadFrame for canvas */
+    AVFrame *frame;                     /* AVFrame for decoded frame */
     AVFrame *alpha_frame;               /* AVFrame for alpha data decompressed from VP8L */
     AVCodecContext *avctx;              /* parent AVCodecContext */
     int initialized;                    /* set once the VP8 context is initialized */
@@ -198,9 +198,24 @@ typedef struct WebPContext {
     int alpha_data_size;                /* alpha chunk data size */
     int has_exif;                       /* set after an EXIF chunk has been processed */
     int has_iccp;                       /* set after an ICCP chunk has been processed */
-    int width;                          /* image width */
-    int height;                         /* image height */
-    int lossless;                       /* indicates lossless or lossy */
+    int vp8x_flags;                     /* global flags from VP8X chunk */
+    int canvas_width;                   /* canvas width */
+    int canvas_height;                  /* canvas height */
+    int anmf_flags;                     /* frame flags from ANMF chunk */
+    int width;                          /* frame width */
+    int height;                         /* frame height */
+    int pos_x;                          /* frame position X */
+    int pos_y;                          /* frame position Y */
+    int prev_anmf_flags;                /* previous frame flags from ANMF chunk */
+    int prev_width;                     /* previous frame width */
+    int prev_height;                    /* previous frame height */
+    int prev_pos_x;                     /* previous frame position X */
+    int prev_pos_y;                     /* previous frame position Y */
+    int await_progress;                 /* value of progress to wait for */
+    uint8_t background_argb[4];         /* background color in ARGB format */
+    uint8_t background_yuva[4];         /* background color in YUVA format */
+    const uint8_t *background_data[4];  /* "planes" for background color in YUVA format */
+    uint8_t transparent_yuva[4];        /* transparent black in YUVA format */
 
     int nb_transforms;                  /* number of transforms */
     enum TransformType transforms[4];   /* transformations used in the image, in order */
@@ -605,8 +620,7 @@ static int decode_entropy_coded_image(WebPContext *s, enum ImageRole role,
     img->frame->height = h;
 
     if (role == IMAGE_ROLE_ARGB && !img->is_alpha_primary) {
-        ThreadFrame pt = { .f = img->frame };
-        ret = ff_thread_get_buffer(s->avctx, &pt, 0);
+        ret = ff_get_buffer(s->avctx, img->frame, 0);
     } else
         ret = av_frame_get_buffer(img->frame, 1);
     if (ret < 0)
@@ -1100,7 +1114,7 @@ static int apply_color_indexing_transform(WebPContext *s)
     return 0;
 }
 
-static void update_canvas_size(AVCodecContext *avctx, int w, int h)
+static void update_frame_size(AVCodecContext *avctx, int w, int h)
 {
     WebPContext *s = avctx->priv_data;
     if (s->width && s->width != w) {
@@ -1123,7 +1137,6 @@ static int vp8_lossless_decode_frame(AVCodecContext *avctx, AVFrame *p,
     int w, h, ret, i, used;
 
     if (!is_alpha_chunk) {
-        s->lossless = 1;
         avctx->pix_fmt = AV_PIX_FMT_ARGB;
     }
 
@@ -1140,7 +1153,7 @@ static int vp8_lossless_decode_frame(AVCodecContext *avctx, AVFrame *p,
         w = get_bits(&s->gb, 14) + 1;
         h = get_bits(&s->gb, 14) + 1;
 
-        update_canvas_size(avctx, w, h);
+        update_frame_size(avctx, w, h);
 
         ret = ff_set_dimensions(avctx, s->width, s->height);
         if (ret < 0)
@@ -1338,7 +1351,6 @@ static int vp8_lossy_decode_frame(AVCodecContext *avctx, AVFrame *p,
         s->v.actually_webp = 1;
     }
     avctx->pix_fmt = s->has_alpha ? AV_PIX_FMT_YUVA420P : AV_PIX_FMT_YUV420P;
-    s->lossless = 0;
 
     if (data_size > INT_MAX) {
         av_log(avctx, AV_LOG_ERROR, "unsupported chunk size\n");
@@ -1356,7 +1368,7 @@ static int vp8_lossy_decode_frame(AVCodecContext *avctx, AVFrame *p,
     if (!*got_frame)
         return AVERROR_INVALIDDATA;
 
-    update_canvas_size(avctx, avctx->width, avctx->height);
+    update_frame_size(avctx, avctx->width, avctx->height);
 
     if (s->has_alpha) {
         ret = vp8_lossy_decode_alpha(avctx, p, s->alpha_data,
@@ -1367,41 +1379,428 @@ static int vp8_lossy_decode_frame(AVCodecContext *avctx, AVFrame *p,
     return ret;
 }
 
-static int webp_decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
-                             AVPacket *avpkt)
+static int init_canvas_frame(WebPContext *s, int format, int key_frame)
 {
-    AVFrame * const p = data;
-    WebPContext *s = avctx->priv_data;
-    GetByteContext gb;
+    AVFrame *canvas = s->canvas_frame.f;
+    int height;
     int ret;
-    uint32_t chunk_type, chunk_size;
-    int vp8x_flags = 0;
 
-    s->avctx     = avctx;
-    s->width     = 0;
-    s->height    = 0;
-    *got_frame   = 0;
-    s->has_alpha = 0;
-    s->has_exif  = 0;
-    s->has_iccp  = 0;
-    bytestream2_init(&gb, avpkt->data, avpkt->size);
+    // canvas is needed only for animation
+    if (!(s->vp8x_flags & VP8X_FLAG_ANIMATION))
+        return 0;
 
-    if (bytestream2_get_bytes_left(&gb) < 12)
-        return AVERROR_INVALIDDATA;
+    // avoid init for non-key frames whose format and size did not change
+    if (!key_frame &&
+        canvas->data[0] &&
+        canvas->format == format &&
+        canvas->width  == s->canvas_width &&
+        canvas->height == s->canvas_height)
+        return 0;
 
-    if (bytestream2_get_le32(&gb) != MKTAG('R', 'I', 'F', 'F')) {
-        av_log(avctx, AV_LOG_ERROR, "missing RIFF tag\n");
-        return AVERROR_INVALIDDATA;
+    s->avctx->pix_fmt = format;
+    canvas->format    = format;
+    canvas->width     = s->canvas_width;
+    canvas->height    = s->canvas_height;
+
+    // VP8 decoder changed the width and height in AVCodecContext.
+    // Change it back to the canvas size.
+    ret = ff_set_dimensions(s->avctx, s->canvas_width, s->canvas_height);
+    if (ret < 0)
+        return ret;
+
+    ff_thread_release_buffer(s->avctx, &s->canvas_frame);
+    ret = ff_thread_get_buffer(s->avctx, &s->canvas_frame, AV_GET_BUFFER_FLAG_REF);
+    if (ret < 0)
+        return ret;
+
+    if (canvas->format == AV_PIX_FMT_ARGB) {
+        height = canvas->height;
+        memset(canvas->data[0], 0, height * canvas->linesize[0]);
+    } else /* if (canvas->format == AV_PIX_FMT_YUVA420P) */ {
+        const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(canvas->format);
+        for (int comp = 0; comp < desc->nb_components; comp++) {
+            int plane = desc->comp[comp].plane;
+
+            if (comp == 1 || comp == 2)
+                height = AV_CEIL_RSHIFT(canvas->height, desc->log2_chroma_h);
+            else
+                height = FFALIGN(canvas->height, 1 << desc->log2_chroma_h);
+
+            memset(canvas->data[plane], s->transparent_yuva[plane],
+                   height * canvas->linesize[plane]);
+        }
     }
 
-    chunk_size = bytestream2_get_le32(&gb);
-    if (bytestream2_get_bytes_left(&gb) < chunk_size)
-        return AVERROR_INVALIDDATA;
+    return 0;
+}
 
-    if (bytestream2_get_le32(&gb) != MKTAG('W', 'E', 'B', 'P')) {
-        av_log(avctx, AV_LOG_ERROR, "missing WEBP tag\n");
-        return AVERROR_INVALIDDATA;
+/*
+ * Blend src1 (foreground) and src2 (background) into dest, in ARGB format.
+ * width, height are the dimensions of src1
+ * pos_x, pos_y is the position in src2 and in dest
+ */
+static void blend_alpha_argb(uint8_t *dest_data[4], int dest_linesize[4],
+                             const uint8_t *src1_data[4], int src1_linesize[4],
+                             const uint8_t *src2_data[4], int src2_linesize[4],
+                             int src2_step[4],
+                             int width, int height, int pos_x, int pos_y)
+{
+    for (int y = 0; y < height; y++) {
+        const uint8_t *src1 = src1_data[0] + y * src1_linesize[0];
+        const uint8_t *src2 = src2_data[0] + (y + pos_y) * src2_linesize[0] + pos_x * src2_step[0];
+        uint8_t       *dest = dest_data[0] + (y + pos_y) * dest_linesize[0] + pos_x * sizeof(uint32_t);
+        for (int x = 0; x < width; x++) {
+            int src1_alpha = src1[0];
+            int src2_alpha = src2[0];
+
+            if (src1_alpha == 255) {
+                memcpy(dest, src1, sizeof(uint32_t));
+            } else if (src1_alpha + src2_alpha == 0) {
+                memset(dest, 0, sizeof(uint32_t));
+            } else {
+                int tmp_alpha = src2_alpha - ROUNDED_DIV(src1_alpha * src2_alpha, 255);
+                int blend_alpha = src1_alpha + tmp_alpha;
+
+                dest[0] = blend_alpha;
+                dest[1] = ROUNDED_DIV(src1[1] * src1_alpha + src2[1] * tmp_alpha, blend_alpha);
+                dest[2] = ROUNDED_DIV(src1[2] * src1_alpha + src2[2] * tmp_alpha, blend_alpha);
+                dest[3] = ROUNDED_DIV(src1[3] * src1_alpha + src2[3] * tmp_alpha, blend_alpha);
+            }
+            src1 += sizeof(uint32_t);
+            src2 += src2_step[0];
+            dest += sizeof(uint32_t);
+        }
     }
+}
+
+/*
+ * Blend src1 (foreground) and src2 (background) into dest, in YUVA format.
+ * width, height are the dimensions of src1
+ * pos_x, pos_y is the position in src2 and in dest
+ */
+static void blend_alpha_yuva(WebPContext *s,
+                             uint8_t *dest_data[4], int dest_linesize[4],
+                             const uint8_t *src1_data[4], int src1_linesize[4],
+                             int src1_format,
+                             const uint8_t *src2_data[4], int src2_linesize[4],
+                             int src2_step[4],
+                             int width, int height, int pos_x, int pos_y)
+{
+    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(src1_format);
+
+    int plane_y = desc->comp[0].plane;
+    int plane_u = desc->comp[1].plane;
+    int plane_v = desc->comp[2].plane;
+    int plane_a = desc->comp[3].plane;
+
+    // blend U & V planes first, because the later step may modify alpha plane
+    int w  = AV_CEIL_RSHIFT(width,  desc->log2_chroma_w);
+    int h  = AV_CEIL_RSHIFT(height, desc->log2_chroma_h);
+    int px = AV_CEIL_RSHIFT(pos_x,  desc->log2_chroma_w);
+    int py = AV_CEIL_RSHIFT(pos_y,  desc->log2_chroma_h);
+    int tile_w = 1 << desc->log2_chroma_w;
+    int tile_h = 1 << desc->log2_chroma_h;
+
+    for (int y = 0; y < h; y++) {
+        const uint8_t *src1_u = src1_data[plane_u] + y * src1_linesize[plane_u];
+        const uint8_t *src1_v = src1_data[plane_v] + y * src1_linesize[plane_v];
+        const uint8_t *src2_u = src2_data[plane_u] + (y + py) * src2_linesize[plane_u] + px * src2_step[plane_u];
+        const uint8_t *src2_v = src2_data[plane_v] + (y + py) * src2_linesize[plane_v] + px * src2_step[plane_v];
+        uint8_t       *dest_u = dest_data[plane_u] + (y + py) * dest_linesize[plane_u] + px;
+        uint8_t       *dest_v = dest_data[plane_v] + (y + py) * dest_linesize[plane_v] + px;
+        for (int x = 0; x < w; x++) {
+            // calculate the average alpha of the tile
+            int src1_alpha = 0;
+            int src2_alpha = 0;
+            for (int yy = 0; yy < tile_h; yy++) {
+                for (int xx = 0; xx < tile_w; xx++) {
+                    src1_alpha += src1_data[plane_a][(y * tile_h + yy) * src1_linesize[plane_a] +
+                                                     (x * tile_w + xx)];
+                    src2_alpha += src2_data[plane_a][((y + py) * tile_h + yy) * src2_linesize[plane_a] +
+                                                     ((x + px) * tile_w + xx) * src2_step[plane_a]];
+                }
+            }
+            src1_alpha = AV_CEIL_RSHIFT(src1_alpha, desc->log2_chroma_w + desc->log2_chroma_h);
+            src2_alpha = AV_CEIL_RSHIFT(src2_alpha, desc->log2_chroma_w + desc->log2_chroma_h);
+
+            if (src1_alpha == 255) {
+                *dest_u = *src1_u;
+                *dest_v = *src1_v;
+            } else if (src1_alpha + src2_alpha == 0) {
+                *dest_u = s->transparent_yuva[plane_u];
+                *dest_v = s->transparent_yuva[plane_v];
+            } else {
+                int tmp_alpha = src2_alpha - ROUNDED_DIV(src1_alpha * src2_alpha, 255);
+                int blend_alpha = src1_alpha + tmp_alpha;
+                *dest_u = ROUNDED_DIV(*src1_u * src1_alpha + *src2_u * tmp_alpha, blend_alpha);
+                *dest_v = ROUNDED_DIV(*src1_v * src1_alpha + *src2_v * tmp_alpha, blend_alpha);
+            }
+            src1_u++;
+            src1_v++;
+            src2_u += src2_step[plane_u];
+            src2_v += src2_step[plane_v];
+            dest_u++;
+            dest_v++;
+        }
+    }
+
+    // blend Y & A planes
+    for (int y = 0; y < height; y++) {
+        const uint8_t *src1_y = src1_data[plane_y] + y * src1_linesize[plane_y];
+        const uint8_t *src1_a = src1_data[plane_a] + y * src1_linesize[plane_a];
+        const uint8_t *src2_y = src2_data[plane_y] + (y + pos_y) * src2_linesize[plane_y] + pos_x * src2_step[plane_y];
+        const uint8_t *src2_a = src2_data[plane_a] + (y + pos_y) * src2_linesize[plane_a] + pos_x * src2_step[plane_a];
+        uint8_t       *dest_y = dest_data[plane_y] + (y + pos_y) * dest_linesize[plane_y] + pos_x;
+        uint8_t       *dest_a = dest_data[plane_a] + (y + pos_y) * dest_linesize[plane_a] + pos_x;
+        for (int x = 0; x < width; x++) {
+            int src1_alpha = *src1_a;
+            int src2_alpha = *src2_a;
+
+            if (src1_alpha == 255) {
+                *dest_y = *src1_y;
+                *dest_a = 255;
+            } else if (src1_alpha + src2_alpha == 0) {
+                *dest_y = s->transparent_yuva[plane_y];
+                *dest_a = 0;
+            } else {
+                int tmp_alpha = src2_alpha - ROUNDED_DIV(src1_alpha * src2_alpha, 255);
+                int blend_alpha = src1_alpha + tmp_alpha;
+                *dest_y = ROUNDED_DIV(*src1_y * src1_alpha + *src2_y * tmp_alpha, blend_alpha);
+                *dest_a = blend_alpha;
+            }
+            src1_y++;
+            src1_a++;
+            src2_y += src2_step[plane_y];
+            src2_a += src2_step[plane_a];
+            dest_y++;
+            dest_a++;
+        }
+    }
+}
+
+static int blend_frame_into_canvas(WebPContext *s)
+{
+    AVFrame *canvas = s->canvas_frame.f;
+    AVFrame *frame  = s->frame;
+    int ret;
+    int width, height;
+    int pos_x, pos_y;
+
+    ret = av_frame_copy_props(canvas, frame);
+    if (ret < 0)
+        return ret;
+
+    if ((s->anmf_flags & ANMF_BLENDING_METHOD) == ANMF_BLENDING_METHOD_OVERWRITE
+        || frame->format == AV_PIX_FMT_YUV420P) {
+        // do not blend, overwrite
+
+        if (canvas->format == AV_PIX_FMT_ARGB) {
+            width  = s->width;
+            height = s->height;
+            pos_x  = s->pos_x;
+            pos_y  = s->pos_y;
+
+            for (int y = 0; y < height; y++) {
+                const uint32_t *src = (uint32_t *) (frame->data[0] + y * frame->linesize[0]);
+                uint32_t *dst = (uint32_t *) (canvas->data[0] + (y + pos_y) * canvas->linesize[0]) + pos_x;
+                memcpy(dst, src, width * sizeof(uint32_t));
+            }
+        } else /* if (canvas->format == AV_PIX_FMT_YUVA420P) */ {
+            const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(frame->format);
+            int plane;
+
+            for (int comp = 0; comp < desc->nb_components; comp++) {
+                plane  = desc->comp[comp].plane;
+                width  = s->width;
+                height = s->height;
+                pos_x  = s->pos_x;
+                pos_y  = s->pos_y;
+                if (comp == 1 || comp == 2) {
+                    width  = AV_CEIL_RSHIFT(width,  desc->log2_chroma_w);
+                    height = AV_CEIL_RSHIFT(height, desc->log2_chroma_h);
+                    pos_x  = AV_CEIL_RSHIFT(pos_x,  desc->log2_chroma_w);
+                    pos_y  = AV_CEIL_RSHIFT(pos_y,  desc->log2_chroma_h);
+                }
+
+                for (int y = 0; y < height; y++) {
+                    const uint8_t *src = frame->data[plane] + y * frame->linesize[plane];
+                    uint8_t *dst = canvas->data[plane] + (y + pos_y) * canvas->linesize[plane] + pos_x;
+                    memcpy(dst, src, width);
+                }
+            }
+
+            if (desc->nb_components < 4) {
+                // frame does not have alpha, set alpha to 255
+                desc = av_pix_fmt_desc_get(canvas->format);
+                plane  = desc->comp[3].plane;
+                width  = s->width;
+                height = s->height;
+                pos_x  = s->pos_x;
+                pos_y  = s->pos_y;
+
+                for (int y = 0; y < height; y++) {
+                    uint8_t *dst = canvas->data[plane] + (y + pos_y) * canvas->linesize[plane] + pos_x;
+                    memset(dst, 255, width);
+                }
+            }
+        }
+    } else {
+        // alpha blending
+
+        if (canvas->format == AV_PIX_FMT_ARGB) {
+            int src2_step[4] = { sizeof(uint32_t) };
+            blend_alpha_argb(canvas->data, canvas->linesize,
+                             (const uint8_t **) frame->data, frame->linesize,
+                             (const uint8_t **) canvas->data, canvas->linesize,
+                             src2_step, s->width, s->height, s->pos_x, s->pos_y);
+        } else /* if (canvas->format == AV_PIX_FMT_YUVA420P) */ {
+            int src2_step[4] = { 1, 1, 1, 1 };
+            blend_alpha_yuva(s, canvas->data, canvas->linesize,
+                             (const uint8_t **) frame->data, frame->linesize,
+                             frame->format,
+                             (const uint8_t **) canvas->data, canvas->linesize,
+                             src2_step, s->width, s->height, s->pos_x, s->pos_y);
+        }
+    }
+
+    return 0;
+}
+
+static int copy_canvas_to_frame(WebPContext *s, AVFrame *frame, int key_frame)
+{
+    AVFrame *canvas = s->canvas_frame.f;
+    int ret;
+
+    // VP8 decoder changed the width and height in AVCodecContext.
+    // Change it back to the canvas size.
+    ret = ff_set_dimensions(s->avctx, canvas->width, canvas->height);
+    if (ret < 0)
+        return ret;
+
+    s->avctx->pix_fmt = canvas->format;
+    frame->format     = canvas->format;
+    frame->width      = canvas->width;
+    frame->height     = canvas->height;
+
+    ret = av_frame_get_buffer(frame, 0);
+    if (ret < 0)
+        return ret;
+
+    ret = av_frame_copy_props(frame, canvas);
+    if (ret < 0)
+        return ret;
+
+    // blend the canvas with the background color into the output frame
+    if (canvas->format == AV_PIX_FMT_ARGB) {
+        int src2_step[4] = { 0 };
+        const uint8_t *src2_data[4] = { &s->background_argb[0] };
+        blend_alpha_argb(frame->data, frame->linesize,
+                         (const uint8_t **) canvas->data, canvas->linesize,
+                         (const uint8_t **) src2_data, src2_step, src2_step,
+                         canvas->width, canvas->height, 0, 0);
+    } else /* if (canvas->format == AV_PIX_FMT_YUVA420P) */ {
+        int src2_step[4] = { 0, 0, 0, 0 };
+        blend_alpha_yuva(s, frame->data, frame->linesize,
+                         (const uint8_t **) canvas->data, canvas->linesize,
+                         canvas->format,
+                         s->background_data, src2_step, src2_step,
+                         canvas->width, canvas->height, 0, 0);
+    }
+
+    if (key_frame) {
+        frame->pict_type = AV_PICTURE_TYPE_I;
+        frame->key_frame = 1;
+    } else {
+        frame->pict_type = AV_PICTURE_TYPE_P;
+        frame->key_frame = 0;
+    }
+
+    return 0;
+}
+
+static int dispose_prev_frame_in_canvas(WebPContext *s)
+{
+    AVFrame *canvas = s->canvas_frame.f;
+    int width, height;
+    int pos_x, pos_y;
+
+    if ((s->prev_anmf_flags & ANMF_DISPOSAL_METHOD) == ANMF_DISPOSAL_METHOD_BACKGROUND) {
+        // dispose to background
+
+        if (canvas->format == AV_PIX_FMT_ARGB) {
+            width  = s->prev_width;
+            height = s->prev_height;
+            pos_x  = s->prev_pos_x;
+            pos_y  = s->prev_pos_y;
+
+            for (int y = 0; y < height; y++) {
+                uint32_t *dst = (uint32_t *) (canvas->data[0] + (y + pos_y) * canvas->linesize[0]) + pos_x;
+                memset(dst, 0, width * sizeof(uint32_t));
+            }
+        } else /* if (canvas->format == AV_PIX_FMT_YUVA420P) */ {
+            const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(canvas->format);
+            int plane;
+
+            for (int comp = 0; comp < desc->nb_components; comp++) {
+                plane  = desc->comp[comp].plane;
+                width  = s->prev_width;
+                height = s->prev_height;
+                pos_x  = s->prev_pos_x;
+                pos_y  = s->prev_pos_y;
+                if (comp == 1 || comp == 2) {
+                    width  = AV_CEIL_RSHIFT(width,  desc->log2_chroma_w);
+                    height = AV_CEIL_RSHIFT(height, desc->log2_chroma_h);
+                    pos_x  = AV_CEIL_RSHIFT(pos_x,  desc->log2_chroma_w);
+                    pos_y  = AV_CEIL_RSHIFT(pos_y,  desc->log2_chroma_h);
+                }
+
+                for (int y = 0; y < height; y++) {
+                    uint8_t *dst = canvas->data[plane] + (y + pos_y) * canvas->linesize[plane] + pos_x;
+                    memset(dst, s->transparent_yuva[plane], width);
+                }
+            }
+        }
+    }
+
+    return 0;
+}
+
+static av_cold int webp_decode_init(AVCodecContext *avctx)
+{
+    WebPContext *s = avctx->priv_data;
+    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(AV_PIX_FMT_YUVA420P);
+
+    s->avctx = avctx;
+    s->canvas_frame.f = av_frame_alloc();
+    s->frame = av_frame_alloc();
+    if (!s->canvas_frame.f || !s->frame) {
+        av_frame_free(&s->canvas_frame.f);
+        av_frame_free(&s->frame);
+        return AVERROR(ENOMEM);
+    }
+
+    // prepare data pointers for YUVA background
+    for (int i = 0; i < 4; i++)
+        s->background_data[i] = &s->background_yuva[i];
+
+    // convert transparent black from RGBA to YUVA
+    s->transparent_yuva[desc->comp[0].plane] = RGB_TO_Y_CCIR(0, 0, 0);
+    s->transparent_yuva[desc->comp[1].plane] = RGB_TO_U_CCIR(0, 0, 0, 0);
+    s->transparent_yuva[desc->comp[2].plane] = RGB_TO_V_CCIR(0, 0, 0, 0);
+    s->transparent_yuva[desc->comp[3].plane] = 0;
+
+    return 0;
+}
+
+static int decode_frame_common(AVCodecContext *avctx, uint8_t *data, int size,
+                               int *got_frame, int key_frame)
+{
+    WebPContext *s = avctx->priv_data;
+    GetByteContext gb;
+    int ret;
+    uint32_t chunk_type, chunk_size;
+
+    bytestream2_init(&gb, data, size);
 
     while (bytestream2_get_bytes_left(&gb) > 8) {
         char chunk_str[5] = { 0 };
@@ -1412,6 +1811,10 @@ static int webp_decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
             return AVERROR_INVALIDDATA;
         chunk_size += chunk_size & 1;
 
+        // we need to dive into RIFF chunk
+        if (chunk_type == MKTAG('R', 'I', 'F', 'F'))
+            chunk_size = 4;
+
         if (bytestream2_get_bytes_left(&gb) < chunk_size) {
            /* we seem to be running out of data, but it could also be that the
               bitstream has trailing junk leading to bogus chunk_size. */
@@ -1419,10 +1822,26 @@ static int webp_decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
         }
 
         switch (chunk_type) {
+        case MKTAG('R', 'I', 'F', 'F'):
+            if (bytestream2_get_le32(&gb) != MKTAG('W', 'E', 'B', 'P')) {
+                av_log(avctx, AV_LOG_ERROR, "missing WEBP tag\n");
+                return AVERROR_INVALIDDATA;
+            }
+            s->vp8x_flags    = 0;
+            s->canvas_width  = 0;
+            s->canvas_height = 0;
+            s->has_exif      = 0;
+            s->has_iccp      = 0;
+            ff_thread_release_buffer(avctx, &s->canvas_frame);
+            break;
         case MKTAG('V', 'P', '8', ' '):
             if (!*got_frame) {
-                ret = vp8_lossy_decode_frame(avctx, p, got_frame,
-                                             avpkt->data + bytestream2_tell(&gb),
+                ret = init_canvas_frame(s, AV_PIX_FMT_YUVA420P, key_frame);
+                if (ret < 0)
+                    return ret;
+
+                ret = vp8_lossy_decode_frame(avctx, s->frame, got_frame,
+                                             data + bytestream2_tell(&gb),
                                              chunk_size);
                 if (ret < 0)
                     return ret;
@@ -1431,8 +1850,13 @@ static int webp_decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
             break;
         case MKTAG('V', 'P', '8', 'L'):
             if (!*got_frame) {
-                ret = vp8_lossless_decode_frame(avctx, p, got_frame,
-                                                avpkt->data + bytestream2_tell(&gb),
+                ret = init_canvas_frame(s, AV_PIX_FMT_ARGB, key_frame);
+                if (ret < 0)
+                    return ret;
+                ff_thread_finish_setup(s->avctx);
+
+                ret = vp8_lossless_decode_frame(avctx, s->frame, got_frame,
+                                                data + bytestream2_tell(&gb),
                                                 chunk_size, 0);
                 if (ret < 0)
                     return ret;
@@ -1441,14 +1865,16 @@ static int webp_decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
             bytestream2_skip(&gb, chunk_size);
             break;
         case MKTAG('V', 'P', '8', 'X'):
-            if (s->width || s->height || *got_frame) {
+            if (s->canvas_width || s->canvas_height || *got_frame) {
                 av_log(avctx, AV_LOG_ERROR, "Canvas dimensions are already set\n");
                 return AVERROR_INVALIDDATA;
             }
-            vp8x_flags = bytestream2_get_byte(&gb);
+            s->vp8x_flags = bytestream2_get_byte(&gb);
             bytestream2_skip(&gb, 3);
             s->width  = bytestream2_get_le24(&gb) + 1;
             s->height = bytestream2_get_le24(&gb) + 1;
+            s->canvas_width  = s->width;
+            s->canvas_height = s->height;
             ret = av_image_check_size(s->width, s->height, 0, avctx);
             if (ret < 0)
                 return ret;
@@ -1456,7 +1882,7 @@ static int webp_decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
         case MKTAG('A', 'L', 'P', 'H'): {
             int alpha_header, filter_m, compression;
 
-            if (!(vp8x_flags & VP8X_FLAG_ALPHA)) {
+            if (!(s->vp8x_flags & VP8X_FLAG_ALPHA)) {
                 av_log(avctx, AV_LOG_WARNING,
                        "ALPHA chunk present, but alpha bit not set in the "
                        "VP8X header\n");
@@ -1465,8 +1891,9 @@ static int webp_decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
                 av_log(avctx, AV_LOG_ERROR, "invalid ALPHA chunk size\n");
                 return AVERROR_INVALIDDATA;
             }
+
             alpha_header       = bytestream2_get_byte(&gb);
-            s->alpha_data      = avpkt->data + bytestream2_tell(&gb);
+            s->alpha_data      = data + bytestream2_tell(&gb);
             s->alpha_data_size = chunk_size - 1;
             bytestream2_skip(&gb, s->alpha_data_size);
 
@@ -1493,14 +1920,13 @@ static int webp_decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
                 av_log(avctx, AV_LOG_VERBOSE, "Ignoring extra EXIF chunk\n");
                 goto exif_end;
             }
-            if (!(vp8x_flags & VP8X_FLAG_EXIF_METADATA))
+            if (!(s->vp8x_flags & VP8X_FLAG_EXIF_METADATA))
                 av_log(avctx, AV_LOG_WARNING,
                        "EXIF chunk present, but Exif bit not set in the "
                        "VP8X header\n");
 
             s->has_exif = 1;
-            bytestream2_init(&exif_gb, avpkt->data + exif_offset,
-                             avpkt->size - exif_offset);
+            bytestream2_init(&exif_gb, data + exif_offset, size - exif_offset);
             if (ff_tdecode_header(&exif_gb, &le, &ifd_offset) < 0) {
                 av_log(avctx, AV_LOG_ERROR, "invalid TIFF header "
                        "in Exif data\n");
@@ -1513,7 +1939,7 @@ static int webp_decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
                 goto exif_end;
             }
 
-            av_dict_copy(&((AVFrame *) data)->metadata, exif_metadata, 0);
+            av_dict_copy(&s->frame->metadata, exif_metadata, 0);
 
 exif_end:
             av_dict_free(&exif_metadata);
@@ -1528,21 +1954,64 @@ exif_end:
                 bytestream2_skip(&gb, chunk_size);
                 break;
             }
-            if (!(vp8x_flags & VP8X_FLAG_ICC))
+            if (!(s->vp8x_flags & VP8X_FLAG_ICC))
                 av_log(avctx, AV_LOG_WARNING,
                        "ICCP chunk present, but ICC Profile bit not set in the "
                        "VP8X header\n");
 
             s->has_iccp = 1;
-            sd = av_frame_new_side_data(p, AV_FRAME_DATA_ICC_PROFILE, chunk_size);
+            sd = av_frame_new_side_data(s->frame, AV_FRAME_DATA_ICC_PROFILE, chunk_size);
             if (!sd)
                 return AVERROR(ENOMEM);
 
             bytestream2_get_buffer(&gb, sd->data, chunk_size);
             break;
         }
-        case MKTAG('A', 'N', 'I', 'M'):
+        case MKTAG('A', 'N', 'I', 'M'): {
+            const AVPixFmtDescriptor *desc;
+            int a, r, g, b;
+            if (!(s->vp8x_flags & VP8X_FLAG_ANIMATION)) {
+                av_log(avctx, AV_LOG_WARNING,
+                       "ANIM chunk present, but animation bit not set in the "
+                       "VP8X header\n");
+            }
+            // background is stored as BGRA, we need ARGB
+            s->background_argb[3] = b = bytestream2_get_byte(&gb);
+            s->background_argb[2] = g = bytestream2_get_byte(&gb);
+            s->background_argb[1] = r = bytestream2_get_byte(&gb);
+            s->background_argb[0] = a = bytestream2_get_byte(&gb);
+
+            // convert the background color to YUVA
+            desc = av_pix_fmt_desc_get(AV_PIX_FMT_YUVA420P);
+            s->background_yuva[desc->comp[0].plane] = RGB_TO_Y_CCIR(r, g, b);
+            s->background_yuva[desc->comp[1].plane] = RGB_TO_U_CCIR(r, g, b, 0);
+            s->background_yuva[desc->comp[2].plane] = RGB_TO_V_CCIR(r, g, b, 0);
+            s->background_yuva[desc->comp[3].plane] = a;
+
+            bytestream2_skip(&gb, 2); // loop count is ignored
+            break;
+        }
         case MKTAG('A', 'N', 'M', 'F'):
+            if (!(s->vp8x_flags & VP8X_FLAG_ANIMATION)) {
+                av_log(avctx, AV_LOG_WARNING,
+                       "ANMF chunk present, but animation bit not set in the "
+                       "VP8X header\n");
+            }
+            s->pos_x      = bytestream2_get_le24(&gb) * 2;
+            s->pos_y      = bytestream2_get_le24(&gb) * 2;
+            s->width      = bytestream2_get_le24(&gb) + 1;
+            s->height     = bytestream2_get_le24(&gb) + 1;
+            bytestream2_skip(&gb, 3);   // duration
+            s->anmf_flags = bytestream2_get_byte(&gb);
+
+            if (s->width  + s->pos_x > s->canvas_width ||
+                s->height + s->pos_y > s->canvas_height) {
+                av_log(avctx, AV_LOG_ERROR,
+                       "frame does not fit into canvas\n");
+                return AVERROR_INVALIDDATA;
+            }
+            s->vp8x_flags |= VP8X_FLAG_ANIMATION;
+            break;
         case MKTAG('X', 'M', 'P', ' '):
             AV_WL32(chunk_str, chunk_type);
             av_log(avctx, AV_LOG_WARNING, "skipping unsupported chunk: %s\n",
@@ -1558,31 +2027,148 @@ exif_end:
         }
     }
 
-    if (!*got_frame) {
-        av_log(avctx, AV_LOG_ERROR, "image data not found\n");
-        return AVERROR_INVALIDDATA;
+    return size;
+}
+
+static int webp_decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
+                             AVPacket *avpkt)
+{
+    AVFrame * const p = data;
+    WebPContext *s = avctx->priv_data;
+    int ret;
+    int key_frame = avpkt->flags & AV_PKT_FLAG_KEY;
+
+    for (int i = 0; i < avpkt->side_data_elems; ++i) {
+        if (avpkt->side_data[i].type == AV_PKT_DATA_NEW_EXTRADATA) {
+            ret = decode_frame_common(avctx, avpkt->side_data[i].data,
+                                      avpkt->side_data[i].size,
+                                      got_frame, key_frame);
+            if (ret < 0)
+                goto end;
+        }
+    }
+
+    *got_frame   = 0;
+
+    if (key_frame) {
+        // The canvas is passed from one thread to another in a sequence
+        // starting with a key frame followed by non-key frames.
+        // The key frame reports progress 1,
+        // the N-th non-key frame awaits progress N = s->await_progress
+        // and reports progress N + 1.
+        s->await_progress = 0;
     }
 
-    return avpkt->size;
+    // reset the frame params
+    s->anmf_flags = 0;
+    s->width      = 0;
+    s->height     = 0;
+    s->pos_x      = 0;
+    s->pos_y      = 0;
+    s->has_alpha  = 0;
+
+    ret = decode_frame_common(avctx, avpkt->data, avpkt->size, got_frame, key_frame);
+    if (ret < 0)
+        goto end;
+
+    if (*got_frame) {
+        if (!(s->vp8x_flags & VP8X_FLAG_ANIMATION)) {
+            // no animation, output the decoded frame
+            av_frame_move_ref(p, s->frame);
+        } else {
+            if (!key_frame) {
+                ff_thread_await_progress(&s->canvas_frame, s->await_progress, 0);
+
+                ret = dispose_prev_frame_in_canvas(s);
+                if (ret < 0)
+                    goto end;
+            }
+
+            ret = blend_frame_into_canvas(s);
+            if (ret < 0)
+                goto end;
+
+            ret = copy_canvas_to_frame(s, p, key_frame);
+            if (ret < 0)
+                goto end;
+
+            ff_thread_report_progress(&s->canvas_frame, s->await_progress + 1, 0);
+        }
+
+        p->pts = avpkt->pts;
+    }
+
+    ret = avpkt->size;
+
+end:
+    av_frame_unref(s->frame);
+    return ret;
 }
 
 static av_cold int webp_decode_close(AVCodecContext *avctx)
 {
     WebPContext *s = avctx->priv_data;
 
+    ff_thread_release_buffer(avctx, &s->canvas_frame);
+    av_frame_free(&s->canvas_frame.f);
+    av_frame_free(&s->frame);
+
     if (s->initialized)
         return ff_vp8_decode_free(avctx);
 
     return 0;
 }
 
+static void webp_decode_flush(AVCodecContext *avctx)
+{
+    WebPContext *s = avctx->priv_data;
+
+    ff_thread_release_buffer(avctx, &s->canvas_frame);
+}
+
+#if HAVE_THREADS
+static int webp_update_thread_context(AVCodecContext *dst, const AVCodecContext *src)
+{
+    WebPContext *wsrc = src->priv_data;
+    WebPContext *wdst = dst->priv_data;
+    int ret;
+
+    if (dst == src)
+        return 0;
+
+    ff_thread_release_buffer(dst, &wdst->canvas_frame);
+    if (wsrc->canvas_frame.f->data[0] &&
+        (ret = ff_thread_ref_frame(&wdst->canvas_frame, &wsrc->canvas_frame)) < 0)
+        return ret;
+
+    wdst->vp8x_flags      = wsrc->vp8x_flags;
+    wdst->canvas_width    = wsrc->canvas_width;
+    wdst->canvas_height   = wsrc->canvas_height;
+    wdst->prev_anmf_flags = wsrc->anmf_flags;
+    wdst->prev_width      = wsrc->width;
+    wdst->prev_height     = wsrc->height;
+    wdst->prev_pos_x      = wsrc->pos_x;
+    wdst->prev_pos_y      = wsrc->pos_y;
+    wdst->await_progress  = wsrc->await_progress + 1;
+
+    memcpy(wdst->background_argb,  wsrc->background_argb,  sizeof(wsrc->background_argb));
+    memcpy(wdst->background_yuva,  wsrc->background_yuva,  sizeof(wsrc->background_yuva));
+
+    return 0;
+}
+#endif
+
 AVCodec ff_webp_decoder = {
     .name           = "webp",
     .long_name      = NULL_IF_CONFIG_SMALL("WebP image"),
     .type           = AVMEDIA_TYPE_VIDEO,
     .id             = AV_CODEC_ID_WEBP,
     .priv_data_size = sizeof(WebPContext),
+    .update_thread_context = ONLY_IF_THREADS_ENABLED(webp_update_thread_context),
+    .init           = webp_decode_init,
     .decode         = webp_decode_frame,
     .close          = webp_decode_close,
+    .flush          = webp_decode_flush,
     .capabilities   = AV_CODEC_CAP_DR1 | AV_CODEC_CAP_FRAME_THREADS,
+    .caps_internal  = FF_CODEC_CAP_ALLOCATE_PROGRESS,
 };
diff --git a/libavcodec/webp.h b/libavcodec/webp.h
new file mode 100644
index 0000000000..ad9f1e23b2
--- /dev/null
+++ b/libavcodec/webp.h
@@ -0,0 +1,44 @@
+/*
+ * WebP image format definitions
+ * Copyright (c) 2020 Pexeso Inc.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * WebP image format definitions.
+ */
+
+#ifndef AVCODEC_WEBP_H
+#define AVCODEC_WEBP_H
+
+#define VP8X_FLAG_ANIMATION             0x02
+#define VP8X_FLAG_XMP_METADATA          0x04
+#define VP8X_FLAG_EXIF_METADATA         0x08
+#define VP8X_FLAG_ALPHA                 0x10
+#define VP8X_FLAG_ICC                   0x20
+
+#define ANMF_DISPOSAL_METHOD            0x01
+#define ANMF_DISPOSAL_METHOD_UNCHANGED  0x00
+#define ANMF_DISPOSAL_METHOD_BACKGROUND 0x01
+
+#define ANMF_BLENDING_METHOD            0x02
+#define ANMF_BLENDING_METHOD_ALPHA      0x00
+#define ANMF_BLENDING_METHOD_OVERWRITE  0x02
+
+#endif /* AVCODEC_WEBP_H */
diff --git a/libavcodec/webp_parser.c b/libavcodec/webp_parser.c
index fdb7c38350..1cb45104a1 100644
--- a/libavcodec/webp_parser.c
+++ b/libavcodec/webp_parser.c
@@ -25,13 +25,17 @@
 
 #include "libavutil/bswap.h"
 #include "libavutil/common.h"
+#include "libavutil/intreadwrite.h"
 
 #include "parser.h"
 
 typedef struct WebPParseContext {
     ParseContext pc;
+    int frame;
+    int first_frame;
     uint32_t fsize;
-    uint32_t remaining_size;
+    uint32_t remaining_file_size;
+    uint32_t remaining_tag_size;
 } WebPParseContext;
 
 static int webp_parse(AVCodecParserContext *s, AVCodecContext *avctx,
@@ -41,62 +45,106 @@ static int webp_parse(AVCodecParserContext *s, AVCodecContext *avctx,
     WebPParseContext *ctx = s->priv_data;
     uint64_t state = ctx->pc.state64;
     int next = END_NOT_FOUND;
-    int i = 0;
+    int i, len;
 
-    *poutbuf      = NULL;
-    *poutbuf_size = 0;
-
-restart:
-    if (ctx->pc.frame_start_found <= 8) {
-        for (; i < buf_size; i++) {
+    for (i = 0; i < buf_size;) {
+        if (ctx->remaining_tag_size) {
+            /* consuming tag */
+            len = FFMIN(ctx->remaining_tag_size, buf_size - i);
+            i += len;
+            ctx->remaining_tag_size -= len;
+            ctx->remaining_file_size -= len;
+        } else {
+            /* scan for the next tag or file */
             state = (state << 8) | buf[i];
-            if (ctx->pc.frame_start_found == 0) {
-                if ((state >> 32) == MKBETAG('R', 'I', 'F', 'F')) {
-                    ctx->fsize = av_bswap32(state);
-                    if (ctx->fsize > 15 && ctx->fsize <= UINT32_MAX - 10) {
-                        ctx->pc.frame_start_found = 1;
-                        ctx->fsize += 8;
+            i++;
+
+            if (!ctx->remaining_file_size) {
+                /* scan for the next file */
+                if (ctx->pc.frame_start_found == 4) {
+                    ctx->pc.frame_start_found = 0;
+                    if ((uint32_t) state == MKBETAG('W', 'E', 'B', 'P')) {
+                        if (ctx->frame || i != 12) {
+                            ctx->frame = 0;
+                            next = i - 12;
+                            state = 0;
+                            ctx->pc.frame_start_found = 0;
+                            break;
+                        }
+                        ctx->remaining_file_size = ctx->fsize - 4;
+                        ctx->first_frame = 1;
+                        continue;
                     }
                 }
-            } else if (ctx->pc.frame_start_found == 8) {
-                if ((state >> 32) != MKBETAG('W', 'E', 'B', 'P')) {
+                if (ctx->pc.frame_start_found == 0) {
+                    if ((state >> 32) == MKBETAG('R', 'I', 'F', 'F')) {
+                        ctx->fsize = av_bswap32(state);
+                        if (ctx->fsize > 15 && ctx->fsize <= UINT32_MAX - 10) {
+                            ctx->fsize += (ctx->fsize & 1);
+                            ctx->pc.frame_start_found = 1;
+                        }
+                    }
+                } else
+                    ctx->pc.frame_start_found++;
+            } else {
+                /* read the next tag */
+                ctx->remaining_file_size--;
+                if (ctx->remaining_file_size == 0) {
                     ctx->pc.frame_start_found = 0;
                     continue;
                 }
                 ctx->pc.frame_start_found++;
-                ctx->remaining_size = ctx->fsize + i - 15;
-                if (ctx->pc.index + i > 15) {
-                    next = i - 15;
-                    state = 0;
-                    break;
-                } else {
-                    ctx->pc.state64 = 0;
-                    goto restart;
+                if (ctx->pc.frame_start_found < 8)
+                    continue;
+
+                switch (state >> 32) {
+                    case MKBETAG('A', 'N', 'M', 'F'):
+                    case MKBETAG('V', 'P', '8', ' '):
+                    case MKBETAG('V', 'P', '8', 'L'):
+                        if (ctx->frame) {
+                            ctx->frame = 0;
+                            next = i - 8;
+                            state = 0;
+                            ctx->pc.frame_start_found = 0;
+                            goto flush;
+                        }
+                        ctx->frame = 1;
+                        break;
+                    default:
+                        break;
                 }
-            } else if (ctx->pc.frame_start_found)
-                ctx->pc.frame_start_found++;
-        }
-        ctx->pc.state64 = state;
-    } else {
-        if (ctx->remaining_size) {
-            i = FFMIN(ctx->remaining_size, buf_size);
-            ctx->remaining_size -= i;
-            if (ctx->remaining_size)
-                goto flush;
 
-            ctx->pc.frame_start_found = 0;
-            goto restart;
+                ctx->remaining_tag_size = av_bswap32(state);
+                ctx->remaining_tag_size += ctx->remaining_tag_size & 1;
+                if (ctx->remaining_tag_size > ctx->remaining_file_size) {
+                    /* this is probably trash at the end of file */
+                    ctx->remaining_tag_size = ctx->remaining_file_size;
+                }
+                ctx->pc.frame_start_found = 0;
+                state = 0;
+            }
         }
     }
-
 flush:
-    if (ff_combine_frame(&ctx->pc, next, &buf, &buf_size) < 0)
+    ctx->pc.state64 = state;
+
+    if (ff_combine_frame(&ctx->pc, next, &buf, &buf_size) < 0) {
+        *poutbuf      = NULL;
+        *poutbuf_size = 0;
         return buf_size;
+    }
 
-    if (next != END_NOT_FOUND && next < 0)
-        ctx->pc.frame_start_found = FFMAX(ctx->pc.frame_start_found - i - 1, 0);
-    else
-        ctx->pc.frame_start_found = 0;
+    // Extremely simplified key frame detection:
+    // - the first frame (containing headers) is marked as a key frame
+    // - other frames are marked as non-key frames
+    if (ctx->first_frame) {
+        ctx->first_frame = 0;
+        s->pict_type = AV_PICTURE_TYPE_I;
+        s->key_frame = 1;
+    } else {
+        s->pict_type = AV_PICTURE_TYPE_P;
+        s->key_frame = 0;
+    }
 
     *poutbuf      = buf;
     *poutbuf_size = buf_size;

From patchwork Fri Sep 11 06:36:13 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Zlomek, Josef" <josef@pex.com>
X-Patchwork-Id: 22284
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
X-Original-To: patchwork@ffaux-bg.ffmpeg.org
Delivered-To: patchwork@ffaux-bg.ffmpeg.org
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by ffaux.localdomain (Postfix) with ESMTP id 05C4F449560
	for <patchwork@ffaux-bg.ffmpeg.org>; Fri, 11 Sep 2020 09:36:36 +0300 (EEST)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id DD45568BA02;
	Fri, 11 Sep 2020 09:36:35 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from vps.zlomek.net (vps.zlomek.net [195.181.211.90])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 4D38868B9D9
 for <ffmpeg-devel@ffmpeg.org>; Fri, 11 Sep 2020 09:36:29 +0300 (EEST)
Received: by vps.zlomek.net (Postfix, from userid 1000)
 id E75FE5FE24; Fri, 11 Sep 2020 08:36:28 +0200 (CEST)
From: Josef Zlomek <josef@pex.com>
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 11 Sep 2020 08:36:13 +0200
Message-Id: <20200911063613.4475-2-josef@pex.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: 
 <CAJFp-Et_-oZ=7VddUG6cCGtiQ7Rf3GZUW0HPR6v5pCjVveRVow@mail.gmail.com>
References: 
 <CAJFp-Et_-oZ=7VddUG6cCGtiQ7Rf3GZUW0HPR6v5pCjVveRVow@mail.gmail.com>
Subject: [FFmpeg-devel] [PATCH v5 2/2] libavformat/webp: add WebP demuxer
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Josef Zlomek <josef@pex.com>
MIME-Version: 1.0
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>

Adds the demuxer of animated WebP files.
It supports non-animated, animated, truncated, and concatenated files.
Reading from a pipe (and other non-seekable inputs) is also supported.

The WebP demuxer splits the input stream into packets containing one frame.
It also marks the key frames properly.
The loop count is ignored by default (same behaviour as animated PNG and GIF),
it may be enabled by the option '-ignore_loop 0'.

The frame rate is set according to the frame delay in the ANMF chunk.
If the delay is too low, or the image is not animated, the default frame rate
is set to 10 fps, similarly to other WebP libraries and browsers.
The fate suite was updated accordingly.

Signed-off-by: Josef Zlomek <josef@pex.com>
---
 Changelog                                   |   1 +
 doc/demuxers.texi                           |  28 +
 libavformat/Makefile                        |   1 +
 libavformat/allformats.c                    |   1 +
 libavformat/version.h                       |   2 +-
 libavformat/webpdec.c                       | 733 ++++++++++++++++++++
 tests/ref/fate/exif-image-webp              |   8 +-
 tests/ref/fate/webp-rgb-lena-lossless       |   2 +-
 tests/ref/fate/webp-rgb-lena-lossless-rgb24 |   2 +-
 tests/ref/fate/webp-rgb-lossless            |   2 +-
 tests/ref/fate/webp-rgb-lossy-q80           |   2 +-
 tests/ref/fate/webp-rgba-lossless           |   2 +-
 tests/ref/fate/webp-rgba-lossy-q80          |   2 +-
 13 files changed, 775 insertions(+), 11 deletions(-)
 create mode 100644 libavformat/webpdec.c

diff --git a/Changelog b/Changelog
index 886f91c5c8..9c7a26a80b 100644
--- a/Changelog
+++ b/Changelog
@@ -23,6 +23,7 @@ version <next>:
 - PhotoCD decoder
 - MCA demuxer
 - animated WebP parser/decoder
+- animated WebP demuxer
 
 
 version 4.3:
diff --git a/doc/demuxers.texi b/doc/demuxers.texi
index 3c15ab9eee..f935c49b48 100644
--- a/doc/demuxers.texi
+++ b/doc/demuxers.texi
@@ -832,4 +832,32 @@ which in turn, acts as a ceiling for the size of scripts that can be read.
 Default is 1 MiB.
 @end table
 
+@section webp
+
+Animated WebP demuxer.
+
+It accepts the following options:
+
+@table @option
+@item -min_delay @var{int}
+Set the minimum valid delay between frames in milliseconds.
+Range is 0 to 60000. Default value is 10.
+
+@item -max_webp_delay @var{int}
+Set the maximum valid delay between frames in milliseconds.
+Range is 0 to 16777215. Default value is 16777215 (over four hours),
+the maximum value allowed by the specification.
+
+@item -default_delay @var{int}
+Set the default delay between frames in milliseconds.
+Range is 0 to 60000. Default value is 100.
+
+@item -ignore_loop @var{bool}
+WebP files can contain information to loop a certain number of times
+(or infinitely). If @option{ignore_loop} is set to true, then the loop
+setting from the input will be ignored and looping will not occur.
+If set to false, then looping will occur and will cycle the number
+of times according to the WebP. Default value is true.
+@end table
+
 @c man end DEMUXERS
diff --git a/libavformat/Makefile b/libavformat/Makefile
index a1a2a7e0e1..3cd9d21171 100644
--- a/libavformat/Makefile
+++ b/libavformat/Makefile
@@ -564,6 +564,7 @@ OBJS-$(CONFIG_WEBM_MUXER)                += matroskaenc.o matroska.o \
                                             wv.o vorbiscomment.o
 OBJS-$(CONFIG_WEBM_DASH_MANIFEST_MUXER)  += webmdashenc.o
 OBJS-$(CONFIG_WEBM_CHUNK_MUXER)          += webm_chunk.o
+OBJS-$(CONFIG_WEBP_DEMUXER)              += webpdec.o
 OBJS-$(CONFIG_WEBP_MUXER)                += webpenc.o
 OBJS-$(CONFIG_WEBVTT_DEMUXER)            += webvttdec.o subtitles.o
 OBJS-$(CONFIG_WEBVTT_MUXER)              += webvttenc.o
diff --git a/libavformat/allformats.c b/libavformat/allformats.c
index 97e9be4269..bcb8d46ae6 100644
--- a/libavformat/allformats.c
+++ b/libavformat/allformats.c
@@ -461,6 +461,7 @@ extern AVOutputFormat ff_webm_muxer;
 extern AVInputFormat  ff_webm_dash_manifest_demuxer;
 extern AVOutputFormat ff_webm_dash_manifest_muxer;
 extern AVOutputFormat ff_webm_chunk_muxer;
+extern AVInputFormat  ff_webp_demuxer;
 extern AVOutputFormat ff_webp_muxer;
 extern AVInputFormat  ff_webvtt_demuxer;
 extern AVOutputFormat ff_webvtt_muxer;
diff --git a/libavformat/version.h b/libavformat/version.h
index 7771a6abf2..ec3286e73a 100644
--- a/libavformat/version.h
+++ b/libavformat/version.h
@@ -32,7 +32,7 @@
 // Major bumping may affect Ticket5467, 5421, 5451(compatibility with Chromium)
 // Also please add any ticket numbers that you believe might be affected here
 #define LIBAVFORMAT_VERSION_MAJOR  58
-#define LIBAVFORMAT_VERSION_MINOR  54
+#define LIBAVFORMAT_VERSION_MINOR  55
 #define LIBAVFORMAT_VERSION_MICRO 100
 
 #define LIBAVFORMAT_VERSION_INT AV_VERSION_INT(LIBAVFORMAT_VERSION_MAJOR, \
diff --git a/libavformat/webpdec.c b/libavformat/webpdec.c
new file mode 100644
index 0000000000..244042f2db
--- /dev/null
+++ b/libavformat/webpdec.c
@@ -0,0 +1,733 @@
+/*
+ * WebP demuxer
+ * Copyright (c) 2020 Pexeso Inc.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * WebP demuxer.
+ */
+
+#include "avformat.h"
+#include "avio_internal.h"
+#include "internal.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/opt.h"
+#include "libavcodec/webp.h"
+
+/**
+ * WebP headers (chunks before the first frame) and important info from them.
+ */
+typedef struct WebPHeaders {
+    int64_t offset;                 ///< offset in the (concatenated) file
+    uint8_t *data;                  ///< raw data
+    uint32_t size;                  ///< size of data
+    uint32_t webp_size;             ///< size of the WebP file
+    int canvas_width;               ///< width of the canvas
+    int canvas_height;              ///< height of the canvas
+    int num_loop;                   ///< number of times to loop the animation
+} WebPHeaders;
+
+typedef struct WebPDemuxContext {
+    const AVClass *class;
+    /**
+     * Time span in milliseconds before the next frame
+     * should be drawn on screen.
+     */
+    int delay;
+    /**
+     * Minimum allowed delay between frames in milliseconds.
+     * Values below this threshold are considered to be invalid
+     * and set to value of default_delay.
+     */
+    int min_delay;
+    int max_delay;
+    int default_delay;
+
+    /*
+     * loop options
+     */
+    int ignore_loop;                ///< ignore loop setting
+    int num_loop;                   ///< number of times to loop the animation
+    int cur_loop;                   ///< current loop counter
+    int64_t file_start;             ///< start position of the current animation file
+    int64_t infinite_loop_start;    ///< start position of the infinite loop
+
+    uint32_t remaining_size;        ///< remaining size of the current animation file
+    int64_t seekback_buffer_end;    ///< position of the end of the seek back buffer
+    int64_t prev_end_position;      ///< position after the previous packet
+    size_t num_webp_headers;        ///< number of (concatenated) WebP files' headers
+    WebPHeaders *webp_headers;      ///< (concatenated) WebP files' headers
+
+    /*
+     * variables for the key frame detection
+     */
+    int nb_frames;                  ///< number of frames of the current animation file
+    int canvas_width;               ///< width of the canvas
+    int canvas_height;              ///< height of the canvas
+    int prev_width;                 ///< width of the previous frame
+    int prev_height;                ///< height of the previous frame
+    int prev_anmf_flags;            ///< flags of the previous frame
+    int prev_key_frame;             ///< flag if the previous frame was a key frame
+} WebPDemuxContext;
+
+/**
+ * Major web browsers display WebPs at ~10-15fps when rate is not
+ * explicitly set or have too low values. We assume default rate to be 10.
+ * Default delay = 1000 microseconds / 10fps = 100 milliseconds per frame.
+ */
+#define WEBP_DEFAULT_DELAY   100
+/**
+ * By default delay values less than this threshold considered to be invalid.
+ */
+#define WEBP_MIN_DELAY       10
+
+static int webp_probe(const AVProbeData *p)
+{
+    const uint8_t *b = p->buf;
+
+    if (AV_RB32(b)     == MKBETAG('R', 'I', 'F', 'F') &&
+        AV_RB32(b + 8) == MKBETAG('W', 'E', 'B', 'P'))
+        return AVPROBE_SCORE_MAX;
+
+    return 0;
+}
+
+static int ensure_seekback(AVFormatContext *s, int64_t bytes)
+{
+    WebPDemuxContext *wdc = s->priv_data;
+    AVIOContext      *pb  = s->pb;
+    int ret;
+
+    int64_t pos = avio_tell(pb);
+    if (pos < 0)
+        return pos;
+
+    if (pos + bytes <= wdc->seekback_buffer_end)
+        return 0;
+
+    if ((ret = ffio_ensure_seekback(pb, bytes)) < 0)
+        return ret;
+
+    wdc->seekback_buffer_end = pos + bytes;
+    return 0;
+}
+
+static int resync(AVFormatContext *s, int seek_to_start)
+{
+    WebPDemuxContext *wdc = s->priv_data;
+    AVIOContext      *pb  = s->pb;
+    int ret;
+    int i;
+    uint64_t state = 0;
+
+    // ensure seek back for the file header and the first chunk header
+    if ((ret = ensure_seekback(s, 12 + 8)) < 0)
+        return ret;
+
+    for (i = 0; i < 12; i++) {
+        state = (state << 8) | avio_r8(pb);
+        if (i == 11) {
+            if ((uint32_t) state == MKBETAG('W', 'E', 'B', 'P'))
+                break;
+            i -= 4;
+        }
+        if (i == 7) {
+            // ensure seek back for the rest of file header and the chunk header
+            if ((ret = ensure_seekback(s, 4 + 8)) < 0)
+                return ret;
+
+            if ((state >> 32) != MKBETAG('R', 'I', 'F', 'F'))
+                i--;
+            else {
+                uint32_t fsize = av_bswap32(state);
+                if (!(fsize > 15 && fsize <= UINT32_MAX - 10))
+                    i -= 4;
+                else
+                    wdc->remaining_size = fsize - 4;
+            }
+        }
+        if (avio_feof(pb))
+            return AVERROR_EOF;
+    }
+
+    wdc->file_start = avio_tell(pb) - 12;
+
+    if (seek_to_start) {
+        if ((ret = avio_seek(pb, -12, SEEK_CUR)) < 0)
+            return ret;
+        wdc->remaining_size += 12;
+    }
+
+    return 0;
+}
+
+static int is_key_frame(AVFormatContext *s, int has_alpha, int anmf_flags,
+                        int width, int height)
+{
+    WebPDemuxContext *wdc = s->priv_data;
+
+    if (wdc->nb_frames == 1)
+        return 1;
+
+    if (width  == wdc->canvas_width &&
+        height == wdc->canvas_height &&
+        (!has_alpha || (anmf_flags & ANMF_BLENDING_METHOD) == ANMF_BLENDING_METHOD_OVERWRITE))
+        return 1;
+
+    if ((wdc->prev_anmf_flags & ANMF_DISPOSAL_METHOD) == ANMF_DISPOSAL_METHOD_BACKGROUND &&
+        (wdc->prev_key_frame || (wdc->prev_width  == wdc->canvas_width &&
+                                 wdc->prev_height == wdc->canvas_height)))
+        return 1;
+
+    return 0;
+}
+
+static int webp_read_header(AVFormatContext *s)
+{
+    WebPDemuxContext *wdc = s->priv_data;
+    AVIOContext      *pb  = s->pb;
+    AVStream         *st;
+    int ret, n;
+    uint32_t chunk_type, chunk_size;
+    int canvas_width  = 0;
+    int canvas_height = 0;
+    int width         = 0;
+    int height        = 0;
+    int is_frame      = 0;
+
+    wdc->delay = wdc->default_delay;
+    wdc->num_loop = 1;
+    wdc->infinite_loop_start = -1;
+
+    if ((ret = resync(s, 0)) < 0)
+        return ret;
+
+    st = avformat_new_stream(s, NULL);
+    if (!st)
+        return AVERROR(ENOMEM);
+
+    while (!is_frame && wdc->remaining_size > 0 && !avio_feof(pb)) {
+        chunk_type = avio_rl32(pb);
+        chunk_size = avio_rl32(pb);
+        if (chunk_size == UINT32_MAX)
+            return AVERROR_INVALIDDATA;
+        chunk_size += chunk_size & 1;
+        if (avio_feof(pb))
+            break;
+
+        if (wdc->remaining_size < 8 + chunk_size)
+            return AVERROR_INVALIDDATA;
+        wdc->remaining_size -= 8 + chunk_size;
+
+        // ensure seek back for the chunk body and the next chunk header
+        if ((ret = ensure_seekback(s, chunk_size + 8)) < 0)
+            return ret;
+
+        switch (chunk_type) {
+        case MKTAG('V', 'P', '8', 'X'):
+            if (chunk_size >= 10) {
+                avio_skip(pb, 4);
+                canvas_width  = avio_rl24(pb) + 1;
+                canvas_height = avio_rl24(pb) + 1;
+                ret = avio_skip(pb, chunk_size - 10);
+            } else
+                ret = avio_skip(pb, chunk_size);
+            break;
+        case MKTAG('V', 'P', '8', ' '):
+            if (chunk_size >= 10) {
+                avio_skip(pb, 6);
+                width  = avio_rl16(pb) & 0x3fff;
+                height = avio_rl16(pb) & 0x3fff;
+                is_frame = 1;
+                ret = avio_skip(pb, chunk_size - 10);
+            } else
+                ret = avio_skip(pb, chunk_size);
+            break;
+        case MKTAG('V', 'P', '8', 'L'):
+            if (chunk_size >= 5) {
+                avio_skip(pb, 1);
+                n = avio_rl32(pb);
+                width  = (n & 0x3fff) + 1;          // first 14 bits
+                height = ((n >> 14) & 0x3fff) + 1;  // next 14 bits
+                is_frame = 1;
+                ret = avio_skip(pb, chunk_size - 5);
+            } else
+                ret = avio_skip(pb, chunk_size);
+            break;
+        case MKTAG('A', 'N', 'M', 'F'):
+            if (chunk_size >= 12) {
+                avio_skip(pb, 6);
+                width  = avio_rl24(pb) + 1;
+                height = avio_rl24(pb) + 1;
+                is_frame = 1;
+                ret = avio_skip(pb, chunk_size - 12);
+            } else
+                ret = avio_skip(pb, chunk_size);
+            break;
+        default:
+            ret = avio_skip(pb, chunk_size);
+            break;
+        }
+
+        if (ret < 0)
+            return ret;
+
+        // fallback if VP8X chunk was not present
+        if (!canvas_width && width > 0)
+            canvas_width = width;
+        if (!canvas_height && height > 0)
+            canvas_height = height;
+    }
+
+    // WebP format operates with time in "milliseconds", therefore timebase is 1/100
+    avpriv_set_pts_info(st, 64, 1, 1000);
+    st->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;
+    st->codecpar->codec_id   = AV_CODEC_ID_WEBP;
+    st->codecpar->codec_tag  = MKTAG('W', 'E', 'B', 'P');
+    st->codecpar->width      = canvas_width;
+    st->codecpar->height     = canvas_height;
+    st->start_time           = 0;
+
+    // jump to start because WebP decoder needs header data too
+    if ((ret = avio_seek(pb, wdc->file_start, SEEK_SET)) < 0)
+        return ret;
+    wdc->remaining_size = 0;
+
+    return 0;
+}
+
+static WebPHeaders *webp_headers_lower_or_equal(WebPHeaders *headers, size_t n,
+                                                int64_t offset)
+{
+    size_t s, e;
+
+    if (n == 0)
+        return NULL;
+    if (headers[0].offset > offset)
+        return NULL;
+
+    s = 0;
+    e = n - 1;
+    while (s < e) {
+        size_t mid = (s + e + 1) / 2;
+        if (headers[mid].offset == offset)
+            return &headers[mid];
+        else if (headers[mid].offset > offset)
+            e = mid - 1;
+        else
+            s = mid;
+    }
+
+    return &headers[s];
+}
+
+static int append_chunk(WebPHeaders *headers, AVIOContext *pb,
+                        uint32_t chunk_size)
+{
+    uint32_t previous_size = headers->size;
+    uint8_t *new_data;
+
+    if (headers->size > UINT32_MAX - chunk_size)
+        return AVERROR_INVALIDDATA;
+
+    new_data = av_realloc(headers->data, headers->size + chunk_size);
+    if (!new_data)
+        return AVERROR(ENOMEM);
+
+    headers->data = new_data;
+    headers->size += chunk_size;
+
+    return avio_read(pb, headers->data + previous_size, chunk_size);
+}
+
+static int webp_read_packet(AVFormatContext *s, AVPacket *pkt)
+{
+    WebPDemuxContext *wdc = s->priv_data;
+    AVIOContext      *pb  = s->pb;
+    int ret, n;
+    int64_t packet_start = avio_tell(pb), packet_end;
+    uint32_t chunk_type, chunk_size;
+    int width = 0, height = 0;
+    int is_frame = 0;
+    int key_frame = 0;
+    int anmf_flags = 0;
+    int has_alpha = 0;
+    int reading_headers = 0;
+    int reset_key_frame = 0;
+    WebPHeaders *headers = NULL;
+
+    if (packet_start != wdc->prev_end_position) {
+        // seek occurred, find the corresponding WebP headers
+        headers = webp_headers_lower_or_equal(wdc->webp_headers, wdc->num_webp_headers,
+                                              packet_start);
+        if (!headers)
+            return AVERROR_BUG;
+
+        wdc->file_start     = headers->offset;
+        wdc->remaining_size = headers->webp_size - (packet_start - headers->offset);
+        wdc->canvas_width   = headers->canvas_width;
+        wdc->canvas_height  = headers->canvas_height;
+        wdc->num_loop       = headers->num_loop;
+        wdc->cur_loop       = 0;
+        reset_key_frame     = 1;
+    }
+
+    if (wdc->remaining_size == 0) {
+        // if the loop count is finite, loop the current animation
+        if (avio_tell(pb) != wdc->file_start &&
+            !wdc->ignore_loop && wdc->num_loop > 1 && ++wdc->cur_loop < wdc->num_loop) {
+            if ((ret = avio_seek(pb, wdc->file_start, SEEK_SET)) < 0)
+                return ret;
+            packet_start = avio_tell(pb);
+        } else {
+            // start of a new animation file
+            wdc->delay = wdc->default_delay;
+            if (wdc->num_loop)
+                wdc->num_loop = 1;
+        }
+
+        // resync to the start of the next file
+        ret = resync(s, 1);
+        if (ret == AVERROR_EOF) {
+            // we reached EOF, if the loop count is infinite, loop the whole input
+            if (!wdc->ignore_loop && !wdc->num_loop) {
+                if ((ret = avio_seek(pb, wdc->infinite_loop_start, SEEK_SET)) < 0)
+                    return ret;
+                ret = resync(s, 1);
+            } else {
+                wdc->prev_end_position = avio_tell(pb);
+                return AVERROR_EOF;
+            }
+        }
+        if (ret < 0)
+            return ret;
+        packet_start = avio_tell(pb);
+
+        reset_key_frame = 1;
+    }
+
+    if (reset_key_frame) {
+        // reset variables used for key frame detection
+        wdc->nb_frames       = 0;
+        wdc->canvas_width    = 0;
+        wdc->canvas_height   = 0;
+        wdc->prev_width      = 0;
+        wdc->prev_height     = 0;
+        wdc->prev_anmf_flags = 0;
+        wdc->prev_key_frame  = 0;
+    }
+
+    if (packet_start == wdc->file_start) {
+        headers = webp_headers_lower_or_equal(wdc->webp_headers, wdc->num_webp_headers,
+                                              packet_start);
+        if (!headers || headers->offset != wdc->file_start) {
+            // grow the array of WebP files' headers
+            wdc->num_webp_headers++;
+            wdc->webp_headers = av_realloc_f(wdc->webp_headers,
+                                             wdc->num_webp_headers,
+                                             sizeof(WebPHeaders));
+            if (!wdc->webp_headers)
+                return AVERROR(ENOMEM);
+
+            headers = &wdc->webp_headers[wdc->num_webp_headers - 1];
+            memset(headers, 0, sizeof(*headers));
+            headers->offset = wdc->file_start;
+        } else {
+            // headers for this WebP file have been already read, skip them
+            if ((ret = avio_seek(pb, headers->size, SEEK_CUR)) < 0)
+                return ret;
+            packet_start = avio_tell(pb);
+
+            wdc->remaining_size = headers->webp_size - headers->size;
+            wdc->canvas_width   = headers->canvas_width;
+            wdc->canvas_height  = headers->canvas_height;
+
+            if (wdc->cur_loop >= wdc->num_loop)
+                wdc->cur_loop = 0;
+            wdc->num_loop = headers->num_loop;
+        }
+    }
+
+    while (wdc->remaining_size > 0 && !avio_feof(pb)) {
+        chunk_type = avio_rl32(pb);
+        chunk_size = avio_rl32(pb);
+        if (chunk_size == UINT32_MAX)
+            return AVERROR_INVALIDDATA;
+        chunk_size += chunk_size & 1;
+
+        if (avio_feof(pb))
+            break;
+
+        // dive into RIFF chunk and do not ensure seek back for the whole file
+        if (chunk_type == MKTAG('R', 'I', 'F', 'F') && chunk_size > 4)
+            chunk_size = 4;
+
+        // ensure seek back for the chunk body and the next chunk header
+        if ((ret = ensure_seekback(s, chunk_size + 8)) < 0)
+            return ret;
+
+        switch (chunk_type) {
+        case MKTAG('R', 'I', 'F', 'F'):
+            if (avio_tell(pb) != wdc->file_start + 8) {
+                // premature RIFF found, shorten the file size
+                WebPHeaders *tmp = webp_headers_lower_or_equal(wdc->webp_headers,
+                                                               wdc->num_webp_headers,
+                                                               avio_tell(pb));
+                tmp->webp_size -= wdc->remaining_size;
+                wdc->remaining_size = 0;
+                goto flush;
+            }
+
+            reading_headers = 1;
+            if ((ret = avio_seek(pb, -8, SEEK_CUR)) < 0 ||
+                (ret = append_chunk(headers, pb, 8 + chunk_size)) < 0)
+                return ret;
+            packet_start = avio_tell(pb);
+
+            headers->offset = wdc->file_start;
+            headers->webp_size = 8 + AV_RL32(headers->data + headers->size - chunk_size - 4);
+            break;
+        case MKTAG('V', 'P', '8', 'X'):
+            reading_headers = 1;
+            if ((ret = avio_seek(pb, -8, SEEK_CUR)) < 0 ||
+                (ret = append_chunk(headers, pb, 8 + chunk_size)) < 0)
+                return ret;
+            packet_start = avio_tell(pb);
+
+            if (chunk_size >= 10) {
+                headers->canvas_width  = AV_RL24(headers->data + headers->size - chunk_size + 4) + 1;
+                headers->canvas_height = AV_RL24(headers->data + headers->size - chunk_size + 7) + 1;
+            }
+            break;
+        case MKTAG('A', 'N', 'I', 'M'):
+            reading_headers = 1;
+            if ((ret = avio_seek(pb, -8, SEEK_CUR)) < 0 ||
+                (ret = append_chunk(headers, pb, 8 + chunk_size)) < 0)
+                return ret;
+            packet_start = avio_tell(pb);
+
+            if (chunk_size >= 6) {
+                headers->num_loop = AV_RL16(headers->data + headers->size - chunk_size + 4);
+                wdc->num_loop = headers->num_loop;
+                wdc->cur_loop = 0;
+                if (!wdc->ignore_loop && wdc->num_loop != 1) {
+                    // ensure seek back for the rest of the file
+                    // and for the header of the next concatenated file
+                    uint32_t loop_end = wdc->remaining_size - chunk_size + 12;
+                    if ((ret = ensure_seekback(s, loop_end)) < 0)
+                        return ret;
+
+                    if (!wdc->num_loop && wdc->infinite_loop_start < 0)
+                        wdc->infinite_loop_start = wdc->file_start;
+                }
+            }
+            break;
+        case MKTAG('V', 'P', '8', ' '):
+            if (is_frame)
+                // found a start of the next non-animated frame
+                goto flush;
+            is_frame = 1;
+
+            reading_headers = 0;
+            if (chunk_size >= 10) {
+                avio_skip(pb, 6);
+                width  = avio_rl16(pb) & 0x3fff;
+                height = avio_rl16(pb) & 0x3fff;
+                wdc->nb_frames++;
+                ret = avio_skip(pb, chunk_size - 10);
+            } else
+                ret = avio_skip(pb, chunk_size);
+            break;
+        case MKTAG('V', 'P', '8', 'L'):
+            if (is_frame)
+                // found a start of the next non-animated frame
+                goto flush;
+            is_frame = 1;
+
+            reading_headers = 0;
+            if (chunk_size >= 5) {
+                avio_skip(pb, 1);
+                n = avio_rl32(pb);
+                width     = (n & 0x3fff) + 1;           // first 14 bits
+                height    = ((n >> 14) & 0x3fff) + 1;   ///next 14 bits
+                has_alpha = (n >> 28) & 1;              // next 1 bit
+                wdc->nb_frames++;
+                ret = avio_skip(pb, chunk_size - 5);
+            } else
+                ret = avio_skip(pb, chunk_size);
+            break;
+        case MKTAG('A', 'N', 'M', 'F'):
+            if (is_frame)
+                // found a start of the next animated frame
+                goto flush;
+
+            reading_headers = 0;
+            if (chunk_size >= 16) {
+                avio_skip(pb, 6);
+                width      = avio_rl24(pb) + 1;
+                height     = avio_rl24(pb) + 1;
+                wdc->delay = avio_rl24(pb);
+                anmf_flags = avio_r8(pb);
+                if (wdc->delay < wdc->min_delay)
+                    wdc->delay = wdc->default_delay;
+                wdc->delay = FFMIN(wdc->delay, wdc->max_delay);
+                // dive into the chunk to set the has_alpha flag
+                chunk_size = 16;
+                ret = 0;
+            } else
+                ret = avio_skip(pb, chunk_size);
+            break;
+        case MKTAG('A', 'L', 'P', 'H'):
+            reading_headers = 0;
+            has_alpha = 1;
+            ret = avio_skip(pb, chunk_size);
+            break;
+        default:
+            if (reading_headers) {
+                if ((ret = avio_seek(pb, -8, SEEK_CUR)) < 0 ||
+                    (ret = append_chunk(headers, pb, 8 + chunk_size)) < 0)
+                    return ret;
+                packet_start = avio_tell(pb);
+            } else
+                ret = avio_skip(pb, chunk_size);
+            break;
+        }
+        if (ret == AVERROR_EOF) {
+            // EOF was reached but the position may still be in the middle
+            // of the buffer. Seek to the end of the buffer so that EOF is
+            // handled properly in the next invocation of webp_read_packet.
+            if ((ret = avio_seek(pb, pb->buf_end - pb->buf_ptr, SEEK_CUR) < 0))
+                return ret;
+            wdc->prev_end_position = avio_tell(pb);
+            wdc->remaining_size = 0;
+            return AVERROR_EOF;
+        }
+        if (ret < 0)
+            return ret;
+
+        // fallback if VP8X chunk was not present
+        if (headers) {
+            if (!headers->canvas_width && width > 0)
+                headers->canvas_width = width;
+            if (!headers->canvas_height && height > 0)
+                headers->canvas_height = height;
+        }
+
+        if (wdc->remaining_size < 8 + chunk_size)
+            return AVERROR_INVALIDDATA;
+        wdc->remaining_size -= 8 + chunk_size;
+
+        packet_end = avio_tell(pb);
+    }
+
+    if (wdc->remaining_size > 0 && avio_feof(pb)) {
+        // premature EOF, shorten the file size
+        WebPHeaders *tmp = webp_headers_lower_or_equal(wdc->webp_headers,
+                                                       wdc->num_webp_headers,
+                                                       avio_tell(pb));
+        tmp->webp_size -= wdc->remaining_size;
+        wdc->remaining_size = 0;
+    }
+
+flush:
+    if ((ret = avio_seek(pb, packet_start, SEEK_SET)) < 0)
+        return ret;
+
+    if ((ret = av_get_packet(pb, pkt, packet_end - packet_start)) < 0)
+        return ret;
+
+    wdc->prev_end_position = packet_end;
+
+    if (headers && headers->data) {
+        uint8_t *data = av_packet_new_side_data(pkt, AV_PKT_DATA_NEW_EXTRADATA,
+                                                headers->size);
+        if (!data)
+            return AVERROR(ENOMEM);
+        memcpy(data, headers->data, headers->size);
+
+        s->streams[0]->internal->need_context_update = 1;
+        s->streams[0]->codecpar->width  = headers->canvas_width;
+        s->streams[0]->codecpar->height = headers->canvas_height;
+
+        // copy the fields needed for the key frame detection
+        wdc->canvas_width  = headers->canvas_width;
+        wdc->canvas_height = headers->canvas_height;
+    }
+
+    key_frame = is_frame && is_key_frame(s, has_alpha, anmf_flags, width, height);
+    if (key_frame)
+        pkt->flags |= AV_PKT_FLAG_KEY;
+    else
+        pkt->flags &= ~AV_PKT_FLAG_KEY;
+
+    wdc->prev_width      = width;
+    wdc->prev_height     = height;
+    wdc->prev_anmf_flags = anmf_flags;
+    wdc->prev_key_frame  = key_frame;
+
+    pkt->stream_index = 0;
+    pkt->duration = is_frame ? wdc->delay : 0;
+    pkt->pts = pkt->dts = AV_NOPTS_VALUE;
+
+    if (is_frame && wdc->nb_frames == 1)
+        s->streams[0]->r_frame_rate = (AVRational) {1000, pkt->duration};
+
+    return ret;
+}
+
+static int webp_read_close(AVFormatContext *s)
+{
+    WebPDemuxContext *wdc = s->priv_data;
+
+    for (size_t i = 0; i < wdc->num_webp_headers; ++i)
+        av_freep(&wdc->webp_headers[i].data);
+    av_freep(&wdc->webp_headers);
+    wdc->num_webp_headers = 0;
+
+    return 0;
+}
+
+static const AVOption options[] = {
+    { "min_delay"     , "minimum valid delay between frames (in milliseconds)", offsetof(WebPDemuxContext, min_delay)    , AV_OPT_TYPE_INT, {.i64 = WEBP_MIN_DELAY}    , 0, 1000 * 60, AV_OPT_FLAG_DECODING_PARAM },
+    { "max_webp_delay", "maximum valid delay between frames (in milliseconds)", offsetof(WebPDemuxContext, max_delay)    , AV_OPT_TYPE_INT, {.i64 = 0xffffff}          , 0, 0xffffff , AV_OPT_FLAG_DECODING_PARAM },
+    { "default_delay" , "default delay between frames (in milliseconds)"      , offsetof(WebPDemuxContext, default_delay), AV_OPT_TYPE_INT, {.i64 = WEBP_DEFAULT_DELAY}, 0, 1000 * 60, AV_OPT_FLAG_DECODING_PARAM },
+    { "ignore_loop"   , "ignore loop setting"                                 , offsetof(WebPDemuxContext, ignore_loop)  , AV_OPT_TYPE_BOOL,{.i64 = 1}                 , 0, 1        , AV_OPT_FLAG_DECODING_PARAM },
+    { NULL },
+};
+
+static const AVClass demuxer_class = {
+    .class_name = "WebP demuxer",
+    .item_name  = av_default_item_name,
+    .option     = options,
+    .version    = LIBAVUTIL_VERSION_INT,
+    .category   = AV_CLASS_CATEGORY_DEMUXER,
+};
+
+AVInputFormat ff_webp_demuxer = {
+    .name           = "webp",
+    .long_name      = NULL_IF_CONFIG_SMALL("WebP image"),
+    .priv_data_size = sizeof(WebPDemuxContext),
+    .read_probe     = webp_probe,
+    .read_header    = webp_read_header,
+    .read_packet    = webp_read_packet,
+    .read_close     = webp_read_close,
+    .flags          = AVFMT_GENERIC_INDEX,
+    .priv_class     = &demuxer_class,
+};
diff --git a/tests/ref/fate/exif-image-webp b/tests/ref/fate/exif-image-webp
index d520ecd0db..4f97862ff4 100644
--- a/tests/ref/fate/exif-image-webp
+++ b/tests/ref/fate/exif-image-webp
@@ -8,10 +8,10 @@ pkt_dts=0
 pkt_dts_time=0.000000
 best_effort_timestamp=0
 best_effort_timestamp_time=0.000000
-pkt_duration=1
-pkt_duration_time=0.040000
-pkt_pos=0
-pkt_size=39276
+pkt_duration=100
+pkt_duration_time=0.100000
+pkt_pos=30
+pkt_size=39246
 width=400
 height=225
 pix_fmt=yuv420p
diff --git a/tests/ref/fate/webp-rgb-lena-lossless b/tests/ref/fate/webp-rgb-lena-lossless
index c00715a5e7..e784c501eb 100644
--- a/tests/ref/fate/webp-rgb-lena-lossless
+++ b/tests/ref/fate/webp-rgb-lena-lossless
@@ -1,4 +1,4 @@
-#tb 0: 1/25
+#tb 0: 1/10
 #media_type 0: video
 #codec_id 0: rawvideo
 #dimensions 0: 128x128
diff --git a/tests/ref/fate/webp-rgb-lena-lossless-rgb24 b/tests/ref/fate/webp-rgb-lena-lossless-rgb24
index 7f8f550afe..395a01fa1b 100644
--- a/tests/ref/fate/webp-rgb-lena-lossless-rgb24
+++ b/tests/ref/fate/webp-rgb-lena-lossless-rgb24
@@ -1,4 +1,4 @@
-#tb 0: 1/25
+#tb 0: 1/10
 #media_type 0: video
 #codec_id 0: rawvideo
 #dimensions 0: 128x128
diff --git a/tests/ref/fate/webp-rgb-lossless b/tests/ref/fate/webp-rgb-lossless
index 8dbdfd6887..1db3ce70f7 100644
--- a/tests/ref/fate/webp-rgb-lossless
+++ b/tests/ref/fate/webp-rgb-lossless
@@ -1,4 +1,4 @@
-#tb 0: 1/25
+#tb 0: 1/10
 #media_type 0: video
 #codec_id 0: rawvideo
 #dimensions 0: 12x8
diff --git a/tests/ref/fate/webp-rgb-lossy-q80 b/tests/ref/fate/webp-rgb-lossy-q80
index f61d75ac13..cd43415b95 100644
--- a/tests/ref/fate/webp-rgb-lossy-q80
+++ b/tests/ref/fate/webp-rgb-lossy-q80
@@ -1,4 +1,4 @@
-#tb 0: 1/25
+#tb 0: 1/10
 #media_type 0: video
 #codec_id 0: rawvideo
 #dimensions 0: 12x8
diff --git a/tests/ref/fate/webp-rgba-lossless b/tests/ref/fate/webp-rgba-lossless
index bb654ae442..2f763c6c46 100644
--- a/tests/ref/fate/webp-rgba-lossless
+++ b/tests/ref/fate/webp-rgba-lossless
@@ -1,4 +1,4 @@
-#tb 0: 1/25
+#tb 0: 1/10
 #media_type 0: video
 #codec_id 0: rawvideo
 #dimensions 0: 12x8
diff --git a/tests/ref/fate/webp-rgba-lossy-q80 b/tests/ref/fate/webp-rgba-lossy-q80
index d2c2aa3fce..6b114f772e 100644
--- a/tests/ref/fate/webp-rgba-lossy-q80
+++ b/tests/ref/fate/webp-rgba-lossy-q80
@@ -1,4 +1,4 @@
-#tb 0: 1/25
+#tb 0: 1/10
 #media_type 0: video
 #codec_id 0: rawvideo
 #dimensions 0: 12x8