Message ID | 20230725085846.93593-4-thilo.borgmann@mail.de |
---|---|
State | New |
Series | webp: add support for animated WebP decoding |
Context | Check | Description |
---|---|---|
yinshiyou/make_loongarch64 | success | Make finished |
yinshiyou/make_fate_loongarch64 | success | Make fate finished |
andriy/make_x86 | success | Make finished |
andriy/make_fate_x86 | success | Make fate finished |
> + // Extremely simplified key frame detection:
> + // - the first frame (containing headers) is marked as a key frame
> + // - other frames are marked as non-key frames

Is there a more proper way of doing this? Looking briefly at the spec
one wonders why they didn't just use regular VP* inter frames..

/Tomas
On 25.07.23 at 14:24, Tomas Härdin wrote:
>> + // Extremely simplified key frame detection:
>> + // - the first frame (containing headers) is marked as a key frame
>> + // - other frames are marked as non-key frames
>
> Is there a more proper way of doing this?

All frames (except the ANMF chunks) are INTRA, and all of them have a WEBP tag.
Whereas all ANMF chunks are in the same WEBP chunk as their reference frame.
So it should really be as simple as it is to mark all WEBP frames as key frames as the code does.
What more dedicated do you have in mind?

The logic as-is works with all samples I have, animated and not.
Seems to also align well with their example file layouts.
You have a more weird one?

> Looking briefly at the spec
> one wonders why they didn't just use regular VP* inter frames..

I assume the whole canvas idea could be more beneficial than vp8 inter - but otoh I don't know about vp8 compositing capabilities, if any..

-Thilo
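The rule discussed above (the first frame after the WEBP tag is a key frame, every later frame chunk is not) can be illustrated with a small chunk walker. This is only a sketch under simplifying assumptions: the whole file is in memory, whereas the real parser works incrementally on arbitrary buffer splits, and `rd_le32` and `mark_key_frames` are hypothetical names that do not exist in FFmpeg.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit little-endian value. */
static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical sketch: walk the top-level chunks of one RIFF/WEBP file and
 * set key[n] to 1 for the first frame chunk and to 0 for all later ones.
 * Returns the number of frame chunks found, or -1 if the header is invalid. */
static int mark_key_frames(const uint8_t *buf, size_t size, int *key, int max)
{
    size_t pos = 12; /* skip "RIFF", the 32-bit file size, and "WEBP" */
    int n = 0;

    if (size < 12 || memcmp(buf, "RIFF", 4) || memcmp(buf + 8, "WEBP", 4))
        return -1;

    while (size - pos >= 8 && n < max) {
        uint64_t payload = rd_le32(buf + pos + 4);

        /* Same three tags the patch treats as the start of a frame. */
        if (!memcmp(buf + pos, "ANMF", 4) ||
            !memcmp(buf + pos, "VP8 ", 4) ||
            !memcmp(buf + pos, "VP8L", 4)) {
            key[n] = (n == 0);
            n++;
        }

        payload += payload & 1;          /* odd payloads carry one pad byte */
        if (payload > size - pos - 8)    /* truncated chunk: stop walking */
            break;
        pos += 8 + (size_t)payload;      /* ANMF payloads are skipped whole */
    }
    return n;
}
```

Because the ANMF payload is skipped as a unit, the VP8/VP8L bitstream nested inside it is never counted as a separate frame.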
On Tue, Jul 25, 2023 at 7:18 AM Thilo Borgmann <thilo.borgmann@mail.de> wrote:
>
> On 25.07.23 at 14:24, Tomas Härdin wrote:
> >> + // Extremely simplified key frame detection:
> >> + // - the first frame (containing headers) is marked as a key frame
> >> + // - other frames are marked as non-key frames
> >
> > Is there a more proper way of doing this?
>
> All frames (except the ANMF chunks) are INTRA, and all of them have a WEBP tag.
> Whereas all ANMF chunks are in the same WEBP chunk as their reference frame.
> So it should really be as simple as it is to mark all WEBP frames as key frames as the code does.
> What more dedicated do you have in mind?
>
> The logic as-is works with all samples I have, animated and not.
> Seems to also align well with their example file layouts.
> You have a more weird one?
>
> > Looking briefly at the spec
> > one wonders why they didn't just use regular VP* inter frames..
>
> I assume the whole canvas idea could be more beneficial than vp8 inter - but otoh I don't know about vp8 compositing capabilities, if any..
>

This was more in alignment with gif and allowed for a simpler reference structure. WebP also supports lossless and mixing lossy/lossless so this method makes the behavior consistent between VP8 and VP8L.
On Tue 2023-07-25 at 16:18 +0200, Thilo Borgmann wrote:
> On 25.07.23 at 14:24, Tomas Härdin wrote:
> > > + // Extremely simplified key frame detection:
> > > + // - the first frame (containing headers) is marked as a key frame
> > > + // - other frames are marked as non-key frames
> >
> > Is there a more proper way of doing this?
>
> All frames (except the ANMF chunks) are INTRA, and all of them have a
> WEBP tag.
> Whereas all ANMF chunks are in the same WEBP chunk as their reference
> frame.
> So it should really be as simple as it is to mark all WEBP frames as
> key frames as the code does.
> What more dedicated do you have in mind?

Nah mostly just curious. It just feels so weird when VP8 intra already
exists. Maybe I'm missing something. Browsers already support VP8 after
all.

> The logic as-is works with all samples I have, animated and not.
> Seems to also align well with their example file layouts.
> You have a more weird one?

Nope

/Tomas
On Wed, Jul 26, 2023 at 2:36 PM Tomas Härdin <git@haerdin.se> wrote:
>
> On Tue 2023-07-25 at 16:18 +0200, Thilo Borgmann wrote:
> > On 25.07.23 at 14:24, Tomas Härdin wrote:
> > > > + // Extremely simplified key frame detection:
> > > > + // - the first frame (containing headers) is marked as a key frame
> > > > + // - other frames are marked as non-key frames
> > >
> > > Is there a more proper way of doing this?
> >
> > All frames (except the ANMF chunks) are INTRA, and all of them have a
> > WEBP tag.
> > Whereas all ANMF chunks are in the same WEBP chunk as their reference
> > frame.
> > So it should really be as simple as it is to mark all WEBP frames as
> > key frames as the code does.
> > What more dedicated do you have in mind?
>
> Nah mostly just curious. It just feels so weird when VP8 intra already
> exists. Maybe I'm missing something. Browsers already support VP8 after
> all.
>

We wanted something lighter weight (memory, cpu) for an image format rather than going full blown video. Lossless also factored into this.

> > The logic as-is works with all samples I have, animated and not.
> > Seems to also align well with their example file layouts.
> > You have a more weird one?
>
> Nope
diff --git a/libavcodec/webp_parser.c b/libavcodec/webp_parser.c
index bd5f94dac5..da853bb1f5 100644
--- a/libavcodec/webp_parser.c
+++ b/libavcodec/webp_parser.c
@@ -25,13 +25,17 @@
 
 #include "libavutil/bswap.h"
 #include "libavutil/common.h"
+#include "libavutil/intreadwrite.h"
 
 #include "parser.h"
 
 typedef struct WebPParseContext {
     ParseContext pc;
+    int frame;
+    int first_frame;
     uint32_t fsize;
-    uint32_t remaining_size;
+    uint32_t remaining_file_size;
+    uint32_t remaining_tag_size;
 } WebPParseContext;
 
 static int webp_parse(AVCodecParserContext *s, AVCodecContext *avctx,
@@ -41,62 +45,106 @@ static int webp_parse(AVCodecParserContext *s, AVCodecContext *avctx,
     WebPParseContext *ctx = s->priv_data;
     uint64_t state = ctx->pc.state64;
     int next = END_NOT_FOUND;
-    int i = 0;
+    int i, len;
 
-    *poutbuf = NULL;
-    *poutbuf_size = 0;
-
-restart:
-    if (ctx->pc.frame_start_found <= 8) {
-        for (; i < buf_size; i++) {
+    for (i = 0; i < buf_size;) {
+        if (ctx->remaining_tag_size) {
+            /* consuming tag */
+            len = FFMIN(ctx->remaining_tag_size, buf_size - i);
+            i += len;
+            ctx->remaining_tag_size -= len;
+            ctx->remaining_file_size -= len;
+        } else {
+            /* scan for the next tag or file */
             state = (state << 8) | buf[i];
-            if (ctx->pc.frame_start_found == 0) {
-                if ((state >> 32) == MKBETAG('R', 'I', 'F', 'F')) {
-                    ctx->fsize = av_bswap32(state);
-                    if (ctx->fsize > 15 && ctx->fsize <= UINT32_MAX - 10) {
-                        ctx->pc.frame_start_found = 1;
-                        ctx->fsize += 8;
+            i++;
+
+            if (!ctx->remaining_file_size) {
+                /* scan for the next file */
+                if (ctx->pc.frame_start_found == 4) {
+                    ctx->pc.frame_start_found = 0;
+                    if ((uint32_t) state == MKBETAG('W', 'E', 'B', 'P')) {
+                        if (ctx->frame || i != 12) {
+                            ctx->frame = 0;
+                            next = i - 12;
+                            state = 0;
+                            ctx->pc.frame_start_found = 0;
+                            break;
+                        }
+                        ctx->remaining_file_size = ctx->fsize - 4;
+                        ctx->first_frame = 1;
+                        continue;
                     }
                 }
-            } else if (ctx->pc.frame_start_found == 8) {
-                if ((state >> 32) != MKBETAG('W', 'E', 'B', 'P')) {
+                if (ctx->pc.frame_start_found == 0) {
+                    if ((state >> 32) == MKBETAG('R', 'I', 'F', 'F')) {
+                        ctx->fsize = av_bswap32(state);
+                        if (ctx->fsize > 15 && ctx->fsize <= UINT32_MAX - 10) {
+                            ctx->fsize += (ctx->fsize & 1);
+                            ctx->pc.frame_start_found = 1;
+                        }
+                    }
+                } else
+                    ctx->pc.frame_start_found++;
+            } else {
+                /* read the next tag */
+                ctx->remaining_file_size--;
+                if (ctx->remaining_file_size == 0) {
                     ctx->pc.frame_start_found = 0;
                     continue;
                 }
                 ctx->pc.frame_start_found++;
-                ctx->remaining_size = ctx->fsize + i - 15;
-                if (ctx->pc.index + i > 15) {
-                    next = i - 15;
-                    state = 0;
+                if (ctx->pc.frame_start_found < 8)
+                    continue;
+
+                switch (state >> 32) {
+                case MKBETAG('A', 'N', 'M', 'F'):
+                case MKBETAG('V', 'P', '8', ' '):
+                case MKBETAG('V', 'P', '8', 'L'):
+                    if (ctx->frame) {
+                        ctx->frame = 0;
+                        next = i - 8;
+                        state = 0;
+                        ctx->pc.frame_start_found = 0;
+                        goto flush;
+                    }
+                    ctx->frame = 1;
+                    break;
+                default:
                     break;
-                } else {
-                    ctx->pc.state64 = 0;
-                    goto restart;
                 }
-            } else if (ctx->pc.frame_start_found)
-                ctx->pc.frame_start_found++;
-        }
-        ctx->pc.state64 = state;
-    } else {
-        if (ctx->remaining_size) {
-            i = FFMIN(ctx->remaining_size, buf_size);
-            ctx->remaining_size -= i;
-            if (ctx->remaining_size)
-                goto flush;
 
-            ctx->pc.frame_start_found = 0;
-            goto restart;
+                ctx->remaining_tag_size = av_bswap32(state);
+                ctx->remaining_tag_size += ctx->remaining_tag_size & 1;
+                if (ctx->remaining_tag_size > ctx->remaining_file_size) {
+                    /* this might be truncated remains before end of file */
+                    ctx->remaining_tag_size = ctx->remaining_file_size;
+                }
+                ctx->pc.frame_start_found = 0;
+                state = 0;
+            }
         }
     }
-
 flush:
-    if (ff_combine_frame(&ctx->pc, next, &buf, &buf_size) < 0)
+    ctx->pc.state64 = state;
+
+    if (ff_combine_frame(&ctx->pc, next, &buf, &buf_size) < 0) {
+        *poutbuf = NULL;
+        *poutbuf_size = 0;
         return buf_size;
+    }
 
-    if (next != END_NOT_FOUND && next < 0)
-        ctx->pc.frame_start_found = FFMAX(ctx->pc.frame_start_found - i - 1, 0);
-    else
-        ctx->pc.frame_start_found = 0;
+    // Extremely simplified key frame detection:
+    // - the first frame (containing headers) is marked as a key frame
+    // - other frames are marked as non-key frames
+    if (ctx->first_frame) {
+        ctx->first_frame = 0;
+        s->pict_type = AV_PICTURE_TYPE_I;
+        s->key_frame = 1;
+    } else {
+        s->pict_type = AV_PICTURE_TYPE_P;
+        s->key_frame = 0;
+    }
 
     *poutbuf = buf;
     *poutbuf_size = buf_size;
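
The tag-size handling in the patch (byte-swap the low 32 bits of `state` to get the little-endian chunk size, round it up to an even size, then clamp it to the bytes left in the file) can be looked at in isolation. The following is a minimal sketch, not the patch's code: `rd_le32` and `chunk_skip_size` are hypothetical helpers, and the 8-byte chunk header is assumed to be available as contiguous bytes instead of being accumulated in a 64-bit shift register.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value. */
static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: given an 8-byte chunk header (FourCC + size) and the
 * number of file bytes remaining after that header, return how many payload
 * bytes the parser should consume before looking for the next tag. */
static uint32_t chunk_skip_size(const uint8_t *hdr, uint32_t remaining)
{
    uint32_t size = rd_le32(hdr + 4);

    if (size >= remaining)   /* truncated file: never run past its end */
        return remaining;
    size += size & 1;        /* odd-sized payloads are followed by a pad byte */
    if (size > remaining)
        size = remaining;
    return size;
}
```

Unlike the patch, the sketch clamps before adding the padding byte, so a size of `UINT32_MAX` cannot wrap around to zero; whether that matters for the real parser is left open here.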