diff mbox series

[FFmpeg-devel] Add picture attachment support to OGG Muxer

Message ID 20200606213754.53749-1-LRFLEW@aol.com
State New
Headers show
Series [FFmpeg-devel] Add picture attachment support to OGG Muxer | expand

Checks

Context Check Description
andriy/default pending
andriy/make_warn warning New warnings during build
andriy/make success Make finished
andriy/make_fate fail Make fate failed

Commit Message

LRFLEW June 6, 2020, 9:37 p.m. UTC
Signed-off-by: Lewis Fox <LRFLEW@aol.com>
---
This is an implementation for #4448, adding support for embedding attached cover art to OGG files. I had a need for the feature for a setup I was using, and had written a patch for it about half a year ago. I had forgotten to submit it back then, though. I remembered the patch recently, and spent some time rebasing against the latest master so I could finally put the patch on the mailing list.

Using the code for attachments on the FLAC container, I got an idea of what would be required to get it to work. I saw that, due to the attachment data being passed to the muxer as a stream, the actual stream data has to be buffered until the attachments are received and written. I figured I would have to implement this buffering for OGG to make attachments work. However, I saw that the OGG muxer already had OGGPageList, which did the same type of buffering (I think to make interlacing streams work). I ended up expanding the use of this class to make inserting attachment data work.

From what I've seen looking through the code for libavformat, OGG is rather unique in that attachments are stored on a per-stream basis. Everything else I've seen either stores attachments at a container scope or only supports one stream per container. As a result, FFmpeg was never designed around associating attachments to streams. This raised an interesting question of how to determine which stream to place the attachments in (when there are multiple streams). My solution was to add an optional metadata tag for attachments named "ogg_attach". When an attachment stream has this metadata tag, then the OGG muxer will place the attachment in the stream at that index. If the tag isn't present, or is set to an invalid value, it will default to the first stream in the output. To make the process symmetric, I added this metadata tag when demuxing an OGG container with attachments.

OGG FLAC turned out to be rather tricky to figure out. FLAC supports adding the attachments in the vorbis comments, but also supports directly adding the attachments to the stream (as done with the FLAC muxer). The existing demuxer only looked for attachements in OGG FLAC streams in the vorbis comments, so I investigated adding it there first. The challenge is that FLAC's vorbis comments is the only OGG-supported codec that adds a length value to the vorbis comments packet. All the other codecs just use the length indicated by the OGG structure. Since the size of the attachments isn't known until the muxer receives the attachment data, writing attachments to the vorbis comments would require adding extra complexity to update the length value. Additionally, the length written in the stream is only 24 bits. Since the image data is base64 encoded in vorbis comments, this means that having more than about 12.5 MB of attachments will overflow this value. This is a reasonably reachable size using things like lossless images (PNG), large image resolutions, and multiple images (front cover, back cover, etc.). I opted, for FLAC specifically, to add the attachments as their own metadata blocks, the same way as the FLAC muxer. Since the OGG demuxer didn't look for these metadata blocks, I also updated the demuxer to read them.

I tested all of the codecs supported by FFmpeg's OGG muxer: Vorbis, FLAC, Speex, Opus, Theora, and VP8 (the latter two with audio channels, with the attachment on either). I also tested having multiple images attached to a single file. Using VLC, I was able to see the cover art for Vorbis, Speex, and Opus, and was able to confirm that all the files except FLAC still played without problems. OGG FLAC appears to have some bugs in VLC, as even without this patch VLC skipped the first about 2 seconds of my 4 second test file. Attaching cover art also caused VLC to not parse the other metadata, such as title, album, and artist. However, due to VLC's buggy handling of OGG FLAC, it's hard to say that it is a problem with this patch. I confirmed that the audio did play normally in other players (including Firefox and Chrome), and FFmpeg can read the metadata without problems (even pre-patch). I tried other audio players, but OGG FLAC is poorly supported, and most that I tried had problems unrelated to the patch. The ones I found that worked well enough used libavcodec/libavformat, so it didn't make for a good test. At this point, I'm happy with how far I was able to take testing. I also looked at the resulting files in a hex editor, and I couldn't find any problems with any of the files I made with this patch.

 libavformat/flac_picture.c   | 112 +++++++++-
 libavformat/flac_picture.h   |   6 +-
 libavformat/flacdec.c        |   4 +-
 libavformat/flacenc.c        | 100 +--------
 libavformat/matroskadec.c    |   2 +-
 libavformat/matroskaenc.c    |   2 +-
 libavformat/oggdec.h         |   3 +-
 libavformat/oggenc.c         | 402 +++++++++++++++++++++++++++++------
 libavformat/oggparseflac.c   |   4 +
 libavformat/oggparsevorbis.c |   8 +-
 libavformat/vorbiscomment.c  |  32 +--
 libavformat/vorbiscomment.h  |   6 +-
 12 files changed, 503 insertions(+), 178 deletions(-)

Comments

Michael Niedermayer June 7, 2020, 8:17 p.m. UTC | #1
On Sat, Jun 06, 2020 at 04:37:54PM -0500, Lewis Fox wrote:
> Signed-off-by: Lewis Fox <LRFLEW@aol.com>
> ---
> This is an implementation for #4448, adding support for embedding attached cover art to OGG files. I had a need for the feature for a setup I was using, and had written a patch for it about half a year ago. I had forgotten to submit it back then, though. I remembered the patch recently, and spent some time rebasing against the latest master so I could finally put the patch on the mailing list.
> 
> Using the code for attachments on the FLAC container, I got an idea of what would be required to get it to work. I saw that, due to the attachment data being passed to the muxer as a stream, the actual stream data has to be buffered until the attachments are received and written. I figured I would have to implement this buffering for OGG to make attachments work. However, I saw that the OGG muxer already had OGGPageList, which did the same type of buffering (I think to make interlacing streams work). I ended up expanding the use of this class to make inserting attachment data work.
> 
> From what I've seen looking through the code for libavformat, OGG is rather unique in that attachments are stored on a per-stream basis. Everything else I've seen either stores attachments at a container scope or only supports one stream per container. As a result, FFmpeg was never designed around associating attachments to streams. This raised an interesting question of how to determine which stream to place the attachments in (when there are multiple streams). My solution was to add an optional metadata tag for attachments named "ogg_attach". When an attachment stream has this metadata tag, then the OGG muxer will place the attachment in the stream at that index. If the tag isn't present, or is set to an invalid value, it will default to the first stream in the output. To make the process symmetric, I added this metadata tag when demuxing an OGG container with attachments.
> 
> OGG FLAC turned out to be rather tricky to figure out. FLAC supports adding the attachments in the vorbis comments, but also supports directly adding the attachments to the stream (as done with the FLAC muxer). The existing demuxer only looked for attachements in OGG FLAC streams in the vorbis comments, so I investigated adding it there first. The challenge is that FLAC's vorbis comments is the only OGG-supported codec that adds a length value to the vorbis comments packet. All the other codecs just use the length indicated by the OGG structure. Since the size of the attachments isn't known until the muxer receives the attachment data, writing attachments to the vorbis comments would require adding extra complexity to update the length value. Additionally, the length written in the stream is only 24 bits. Since the image data is base64 encoded in vorbis comments, this means that having more than about 12.5 MB of attachments will overflow this value. This is a reasonably reachable size using things like lossless images (PNG), large image resolutions, and multiple images (front cover, back cover, etc.). I opted, for FLAC specifically, to add the attachments as their own metadata blocks, the same way as the FLAC muxer. Since the OGG demuxer didn't look for these metadata blocks, I also updated the demuxer to read them.
> 
> I tested all of the codecs supported by FFmpeg's OGG muxer: Vorbis, FLAC, Speex, Opus, Theora, and VP8 (the latter two with audio channels, with the attachment on either). I also tested having multiple images attached to a single file. Using VLC, I was able to see the cover art for Vorbis, Speex, and Opus, and was able to confirm that all the files except FLAC still played without problems. OGG FLAC appears to have some bugs in VLC, as even without this patch VLC skipped the first about 2 seconds of my 4 second test file. Attaching cover art also caused VLC to not parse the other metadata, such as title, album, and artist. However, due to VLC's buggy handling of OGG FLAC, it's hard to say that it is a problem with this patch. I confirmed that the audio did play normally in other players (including Firefox and Chrome), and FFmpeg can read the metadata without problems (even pre-patch). I tried other audio players, but OGG FLAC is poorly supported, and most that I tried had problems unrelated to the patch. The ones I found that worked well enough used libavcodec/libavformat, so it didn't make for a good test. At this point, I'm happy with how far I was able to take testing. I also looked at the resulting files in a hex editor, and I couldn't find any problems with any of the files I made with this patch.
> 
>  libavformat/flac_picture.c   | 112 +++++++++-
>  libavformat/flac_picture.h   |   6 +-
>  libavformat/flacdec.c        |   4 +-
>  libavformat/flacenc.c        | 100 +--------
>  libavformat/matroskadec.c    |   2 +-
>  libavformat/matroskaenc.c    |   2 +-
>  libavformat/oggdec.h         |   3 +-
>  libavformat/oggenc.c         | 402 +++++++++++++++++++++++++++++------
>  libavformat/oggparseflac.c   |   4 +
>  libavformat/oggparsevorbis.c |   8 +-
>  libavformat/vorbiscomment.c  |  32 +--
>  libavformat/vorbiscomment.h  |   6 +-
>  12 files changed, 503 insertions(+), 178 deletions(-)

seems to break fate

--- ./tests/ref/fate/limited_input_seek-copyts	2020-05-28 14:46:23.971547831 +0200
+++ tests/data/fate/limited_input_seek-copyts	2020-06-07 22:10:38.303050886 +0200
@@ -1 +1 @@
-ffe8a674bdf38e4f650f91963debc654
+aae5603508b268cbb456b633b84a48af
Test limited_input_seek-copyts failed. Look at tests/data/fate/limited_input_seek-copyts.err for details.
tests/Makefile:255: recipe for target 'fate-limited_input_seek-copyts' failed
make: *** [fate-limited_input_seek-copyts] Error 1


[...]
Carl Eugen Hoyos June 7, 2020, 8:31 p.m. UTC | #2
Am Sa., 6. Juni 2020 um 23:40 Uhr schrieb Lewis Fox
<LRFLEW-at-aol.com@ffmpeg.org>:

>  libavformat/flac_picture.c   | 112 +++++++++-
>  libavformat/flac_picture.h   |   6 +-
>  libavformat/flacdec.c        |   4 +-
>  libavformat/flacenc.c        | 100 +--------

Moving flac_write_picture() should be a separate patch.

Carl Eugen
LRFLEW June 7, 2020, 8:53 p.m. UTC | #3
>seems to break fate

>--- ./tests/ref/fate/limited_input_seek-copyts    2020-05-28 14:46:23.971547831 +0200
>+++ tests/data/fate/limited_input_seek-copyts    2020-06-07 22:10:38.303050886 +0200
>@@ -1 +1 @@
>-ffe8a674bdf38e4f650f91963debc654
>+aae5603508b268cbb456b633b84a48af
>Test limited_input_seek-copyts failed. Look at tests/data/fate/limited_input_seek-copyts.err for details.
>tests/Makefile:255: recipe for target 'fate-limited_input_seek-copyts' failed
>make: *** [fate-limited_input_seek-copyts] Error 1
I had noticed this on Patchwork. It shows errors for "limited_input_seek" and "limited_input_seek-copyts". https://patchwork.ffmpeg.org/check/11651/
However, I've run "make fate" multiple times locally now and I it always passed for me. Looking through the output, it appears that these test cases aren't being run when I run fate myself. Do I need to enable a configuration or use a specific platform to get this test to run? I tried looking through my local files to try and figure out what the test is doing, but the only files I can find are the "tests/ref/fate" ones, which don't tell me anything about what's being tested. What files actually define these tests? I am willing to look into this, but I can't figure out where to even start.
Andreas Rheinhardt June 7, 2020, 8:59 p.m. UTC | #4
LRFLEW:
>> seems to break fate
> 
>> --- ./tests/ref/fate/limited_input_seek-copyts    2020-05-28 14:46:23.971547831 +0200
>> +++ tests/data/fate/limited_input_seek-copyts    2020-06-07 22:10:38.303050886 +0200
>> @@ -1 +1 @@
>> -ffe8a674bdf38e4f650f91963debc654
>> +aae5603508b268cbb456b633b84a48af
>> Test limited_input_seek-copyts failed. Look at tests/data/fate/limited_input_seek-copyts.err for details.
>> tests/Makefile:255: recipe for target 'fate-limited_input_seek-copyts' failed
>> make: *** [fate-limited_input_seek-copyts] Error 1
> I had noticed this on Patchwork. It shows errors for "limited_input_seek" and "limited_input_seek-copyts". https://patchwork.ffmpeg.org/check/11651/
> However, I've run "make fate" multiple times locally now and I it always passed for me. Looking through the output, it appears that these test cases aren't being run when I run fate myself. Do I need to enable a configuration or use a specific platform to get this test to run? I tried looking through my local files to try and figure out what the test is doing, but the only files I can find are the "tests/ref/fate" ones, which don't tell me anything about what's being tested. What files actually define these tests? I am willing to look into this, but I can't figure out where to even start.

My guess: You don't have SAMPLES set (and maybe haven't downloaded the
fate-suite), so that tests that depend on samples aren't run.

This test is in tests/fate/ffmpeg.mak.

- Andreas
Carl Eugen Hoyos June 7, 2020, 9:06 p.m. UTC | #5
Am So., 7. Juni 2020 um 22:54 Uhr schrieb LRFLEW <lrflew-at-aol.com@ffmpeg.org>:

> I had noticed this on Patchwork. It shows errors for "limited_input_seek"
> and "limited_input_seek-copyts". https://patchwork.ffmpeg.org/check/11651/
> However, I've run "make fate" multiple times locally now and it always
> passed for me. Looking through the output, it appears that these test
> cases aren't being run when I run fate myself.

$ make SAMPLES=fate-suite fate-rsync
$ make SAMPLES=fate-suite fate

Carl Eugen
Andreas Rheinhardt June 7, 2020, 10:28 p.m. UTC | #6
Lewis Fox:
> Signed-off-by: Lewis Fox <LRFLEW@aol.com>
> ---
> This is an implementation for #4448, adding support for embedding attached cover art to OGG files. I had a need for the feature for a setup I was using, and had written a patch for it about half a year ago. I had forgotten to submit it back then, though. I remembered the patch recently, and spent some time rebasing against the latest master so I could finally put the patch on the mailing list.
> 
> Using the code for attachments on the FLAC container, I got an idea of what would be required to get it to work. I saw that, due to the attachment data being passed to the muxer as a stream, the actual stream data has to be buffered until the attachments are received and written. I figured I would have to implement this buffering for OGG to make attachments work. However, I saw that the OGG muxer already had OGGPageList, which did the same type of buffering (I think to make interlacing streams work). I ended up expanding the use of this class to make inserting attachment data work.
> 
> From what I've seen looking through the code for libavformat, OGG is rather unique in that attachments are stored on a per-stream basis. Everything else I've seen either stores attachments at a container scope or only supports one stream per container. As a result, FFmpeg was never designed around associating attachments to streams. This raised an interesting question of how to determine which stream to place the attachments in (when there are multiple streams). My solution was to add an optional metadata tag for attachments named "ogg_attach". When an attachment stream has this metadata tag, then the OGG muxer will place the attachment in the stream at that index. If the tag isn't present, or is set to an invalid value, it will default to the first stream in the output. To make the process symmetric, I added this metadata tag when demuxing an OGG container with attachments.
> 
> OGG FLAC turned out to be rather tricky to figure out. FLAC supports adding the attachments in the vorbis comments, but also supports directly adding the attachments to the stream (as done with the FLAC muxer). The existing demuxer only looked for attachements in OGG FLAC streams in the vorbis comments, so I investigated adding it there first. The challenge is that FLAC's vorbis comments is the only OGG-supported codec that adds a length value to the vorbis comments packet. All the other codecs just use the length indicated by the OGG structure. Since the size of the attachments isn't known until the muxer receives the attachment data, writing attachments to the vorbis comments would require adding extra complexity to update the length value. Additionally, the length written in the stream is only 24 bits. Since the image data is base64 encoded in vorbis comments, this means that having more than about 12.5 MB of attachments will overflow this value. This is a reasonably reachable size using things like lossless images (PNG), large image resolutions, and multiple images (front cover, back cover, etc.). I opted, for FLAC specifically, to add the attachments as their own metadata blocks, the same way as the FLAC muxer. Since the OGG demuxer didn't look for these metadata blocks, I also updated the demuxer to read them.
> 
> I tested all of the codecs supported by FFmpeg's OGG muxer: Vorbis, FLAC, Speex, Opus, Theora, and VP8 (the latter two with audio channels, with the attachment on either). I also tested having multiple images attached to a single file. Using VLC, I was able to see the cover art for Vorbis, Speex, and Opus, and was able to confirm that all the files except FLAC still played without problems. OGG FLAC appears to have some bugs in VLC, as even without this patch VLC skipped the first about 2 seconds of my 4 second test file. Attaching cover art also caused VLC to not parse the other metadata, such as title, album, and artist. However, due to VLC's buggy handling of OGG FLAC, it's hard to say that it is a problem with this patch. I confirmed that the audio did play normally in other players (including Firefox and Chrome), and FFmpeg can read the metadata without problems (even pre-patch). I tried other audio players, but OGG FLAC is poorly supported, and most that I tried had problems unrelated to the patch. The ones I found that worked well enough used libavcodec/libavformat, so it didn't make for a good test. At this point, I'm happy with how far I was able to take testing. I also looked at the resulting files in a hex editor, and I couldn't find any problems with any of the files I made with this patch.
> 
>  libavformat/flac_picture.c   | 112 +++++++++-
>  libavformat/flac_picture.h   |   6 +-
>  libavformat/flacdec.c        |   4 +-
>  libavformat/flacenc.c        | 100 +--------
>  libavformat/matroskadec.c    |   2 +-
>  libavformat/matroskaenc.c    |   2 +-
>  libavformat/oggdec.h         |   3 +-
>  libavformat/oggenc.c         | 402 +++++++++++++++++++++++++++++------
>  libavformat/oggparseflac.c   |   4 +
>  libavformat/oggparsevorbis.c |   8 +-
>  libavformat/vorbiscomment.c  |  32 +--
>  libavformat/vorbiscomment.h  |   6 +-
>  12 files changed, 503 insertions(+), 178 deletions(-)
> 
> diff --git a/libavformat/flac_picture.c b/libavformat/flac_picture.c
> index 53e24b28b7..73edc7e238 100644
> --- a/libavformat/flac_picture.c
> +++ b/libavformat/flac_picture.c
> @@ -20,6 +20,7 @@
>   */
>  
>  #include "libavutil/intreadwrite.h"
> +#include "libavutil/pixdesc.h"
>  #include "libavcodec/bytestream.h"
>  #include "libavcodec/png.h"
>  #include "avformat.h"
> @@ -29,7 +30,8 @@
>  
>  #define MAX_TRUNC_PICTURE_SIZE (500 * 1024 * 1024)
>  
> -int ff_flac_parse_picture(AVFormatContext *s, uint8_t *buf, int buf_size, int truncate_workaround)
> +int ff_flac_parse_picture(AVFormatContext *s, uint8_t *buf, int buf_size,
> +                          int ogg_index, int truncate_workaround)
>  {
>      const CodecMime *mime = ff_id3v2_mime_tags;
>      enum AVCodecID id = AV_CODEC_ID_NONE;
> @@ -180,6 +182,8 @@ int ff_flac_parse_picture(AVFormatContext *s, uint8_t *buf, int buf_size, int tr
>      av_dict_set(&st->metadata, "comment", ff_id3v2_picture_types[type], 0);
>      if (desc)
>          av_dict_set(&st->metadata, "title", desc, AV_DICT_DONT_STRDUP_VAL);
> +    if (ogg_index >= 0)
> +        av_dict_set_int(&st->metadata, "ogg_attach", ogg_index, 0);
>  
>      return 0;
>  
> @@ -189,3 +193,109 @@ fail:
>  
>      return ret;
>  }
> +
> +int ff_flac_picture_length(struct AVFormatContext *s, AVPacket *pkt) {

You can add const to both parameters. And remove the "struct".
Furthermore, the opening "{" should be on a separate line.

> +    const CodecMime *mime = ff_id3v2_mime_tags;
> +    AVDictionaryEntry *e;

Add const here as well.

> +    const char *mimetype = NULL, *desc = "";
> +    const AVStream *st = s->streams[pkt->stream_index];
> +    int mimelen, desclen;
> +
> +    if (!pkt->data)
> +        return 0;
> +
> +    /* get the mimetype */
> +    while (mime->id != AV_CODEC_ID_NONE) {
> +        if (mime->id == st->codecpar->codec_id) {
> +            mimetype = mime->str;
> +            break;
> +        }
> +        mime++;
> +    }
> +    if (!mimetype)
> +        return 0;
> +    mimelen = strlen(mimetype);
> +
> +    /* get the description */
> +    if ((e = av_dict_get(st->metadata, "title", NULL, 0)))
> +        desc = e->value;

The only thing you do with desc is use it in the next line. You could
actually initialize desclen to 0 and set desclen = strlen(e->value) if
you the if is true.

> +    desclen = strlen(desc);
> +
Furthermore, these are mismatching types. The assignment here might not
be value-preserving.


> +    return 4 + 4 + mimelen + 4 + desclen + 4 + 4 + 4 + 4 + 4 + pkt->size;

Missing overflow check. In fact, I wonder whether int is the right
return value.

> +}
> +
> +int ff_flac_write_picture(struct AVFormatContext *s, AVIOContext *pb,
> +                          AVPacket *pkt, unsigned *attached_types)
> +{
> +    const AVPixFmtDescriptor *pixdesc;
> +    const CodecMime *mime = ff_id3v2_mime_tags;
> +    AVDictionaryEntry *e;
> +    const char *mimetype = NULL, *desc = "";
> +    const AVStream *st = s->streams[pkt->stream_index];
> +    int i, mimelen, desclen, type = 0;
> +
> +    if (!pkt->data)
> +        return 0;
> +
> +    while (mime->id != AV_CODEC_ID_NONE) {
> +        if (mime->id == st->codecpar->codec_id) {
> +            mimetype = mime->str;
> +            break;
> +        }
> +        mime++;
> +    }
> +    if (!mimetype) {
> +        av_log(s, AV_LOG_ERROR, "No mimetype is known for stream %d, cannot "
> +               "write an attached picture.\n", st->index);
> +        return AVERROR(EINVAL);
> +    }
> +    mimelen = strlen(mimetype);
> +
> +    /* get the picture type */
> +    e = av_dict_get(st->metadata, "comment", NULL, 0);
> +    for (i = 0; e && i < FF_ARRAY_ELEMS(ff_id3v2_picture_types); i++) {
> +        if (!av_strcasecmp(e->value, ff_id3v2_picture_types[i])) {
> +            type = i;
> +            break;
> +        }
> +    }
> +
> +    if ((*attached_types & (1 << type)) & 0x6) {
> +        av_log(s, AV_LOG_ERROR, "Duplicate attachment for type '%s'\n", ff_id3v2_picture_types[type]);
> +        return AVERROR(EINVAL);
> +    }
> +
> +    if (type == 1 && (st->codecpar->codec_id != AV_CODEC_ID_PNG ||
> +                      st->codecpar->width != 32 ||
> +                      st->codecpar->height != 32)) {
> +        av_log(s, AV_LOG_ERROR, "File icon attachment must be a 32x32 PNG");
> +        return AVERROR(EINVAL);
> +    }
> +
> +    *attached_types |= (1 << type);
> +
> +    /* get the description */
> +    if ((e = av_dict_get(st->metadata, "title", NULL, 0)))
> +        desc = e->value;
> +    desclen = strlen(desc);
> +
> +    avio_wb32(pb, type);
> +
> +    avio_wb32(pb, mimelen);
> +    avio_write(pb, mimetype, mimelen);
> +
> +    avio_wb32(pb, desclen);
> +    avio_write(pb, desc, desclen);
> +
> +    avio_wb32(pb, st->codecpar->width);
> +    avio_wb32(pb, st->codecpar->height);
> +    if ((pixdesc = av_pix_fmt_desc_get(st->codecpar->format)))
> +        avio_wb32(pb, av_get_bits_per_pixel(pixdesc));
> +    else
> +        avio_wb32(pb, 0);
> +    avio_wb32(pb, 0);
> +
> +    avio_wb32(pb, pkt->size);
> +    avio_write(pb, pkt->data, pkt->size);
> +    return 0;
> +}
> diff --git a/libavformat/flac_picture.h b/libavformat/flac_picture.h
> index 61fd0c8806..9fc18b05ee 100644
> --- a/libavformat/flac_picture.h
> +++ b/libavformat/flac_picture.h
> @@ -26,6 +26,10 @@
>  
>  #define RETURN_ERROR(code) do { ret = (code); goto fail; } while (0)
>  
> -int ff_flac_parse_picture(AVFormatContext *s, uint8_t *buf, int buf_size, int truncate_workaround);
> +int ff_flac_parse_picture(AVFormatContext *s, uint8_t *buf, int buf_size,
> +                          int ogg_index, int truncate_workaround);
> +int ff_flac_picture_length(struct AVFormatContext *s, AVPacket *pkt);
> +int ff_flac_write_picture(struct AVFormatContext *s, AVIOContext *pb,
> +                          AVPacket *pkt, unsigned *attached_types);
>  
>  #endif /* AVFORMAT_FLAC_PICTURE_H */
> diff --git a/libavformat/flacdec.c b/libavformat/flacdec.c
> index 79c05f14bf..4e9676a71d 100644
> --- a/libavformat/flacdec.c
> +++ b/libavformat/flacdec.c
> @@ -146,7 +146,7 @@ static int flac_read_header(AVFormatContext *s)
>              }
>              av_freep(&buffer);
>          } else if (metadata_type == FLAC_METADATA_TYPE_PICTURE) {
> -            ret = ff_flac_parse_picture(s, buffer, metadata_size, 1);
> +            ret = ff_flac_parse_picture(s, buffer, metadata_size, 1, -1);
>              av_freep(&buffer);
>              if (ret < 0) {
>                  av_log(s, AV_LOG_ERROR, "Error parsing attached picture.\n");
> @@ -177,7 +177,7 @@ static int flac_read_header(AVFormatContext *s)
>              if (metadata_type == FLAC_METADATA_TYPE_VORBIS_COMMENT) {
>                  AVDictionaryEntry *chmask;
>  
> -                ret = ff_vorbis_comment(s, &s->metadata, buffer, metadata_size, 1);
> +                ret = ff_vorbis_comment(s, &s->metadata, buffer, metadata_size, -1);
>                  if (ret < 0) {
>                      av_log(s, AV_LOG_WARNING, "error parsing VorbisComment metadata\n");
>                  } else if (ret > 0) {
> diff --git a/libavformat/flacenc.c b/libavformat/flacenc.c
> index b947a3b067..db8972f56f 100644
> --- a/libavformat/flacenc.c
> +++ b/libavformat/flacenc.c
> @@ -21,11 +21,11 @@
>  
>  #include "libavutil/channel_layout.h"
>  #include "libavutil/opt.h"
> -#include "libavutil/pixdesc.h"
>  #include "libavcodec/flac.h"
>  #include "avformat.h"
>  #include "avio_internal.h"
>  #include "flacenc.h"
> +#include "flac_picture.h"
>  #include "id3v2.h"
>  #include "internal.h"
>  #include "vorbiscomment.h"
> @@ -70,100 +70,15 @@ static int flac_write_block_comment(AVIOContext *pb, AVDictionary **m,
>  
>      avio_w8(pb, last_block ? 0x84 : 0x04);
>      avio_wb24(pb, len);
> -    ff_vorbiscomment_write(pb, *m, vendor, NULL, 0);
> +    ff_vorbiscomment_write(pb, *m, vendor, NULL, 0, 0);
>  
>      return 0;
>  }
>  
> -static int flac_write_picture(struct AVFormatContext *s, AVPacket *pkt)
> -{
> -    FlacMuxerContext *c = s->priv_data;
> -    AVIOContext *pb = s->pb;
> -    const AVPixFmtDescriptor *pixdesc;
> -    const CodecMime *mime = ff_id3v2_mime_tags;
> -    AVDictionaryEntry *e;
> -    const char *mimetype = NULL, *desc = "";
> -    const AVStream *st = s->streams[pkt->stream_index];
> -    int i, mimelen, desclen, type = 0, blocklen;
> -
> -    if (!pkt->data)
> -        return 0;
> -
> -    while (mime->id != AV_CODEC_ID_NONE) {
> -        if (mime->id == st->codecpar->codec_id) {
> -            mimetype = mime->str;
> -            break;
> -        }
> -        mime++;
> -    }
> -    if (!mimetype) {
> -        av_log(s, AV_LOG_ERROR, "No mimetype is known for stream %d, cannot "
> -               "write an attached picture.\n", st->index);
> -        return AVERROR(EINVAL);
> -    }
> -    mimelen = strlen(mimetype);
> -
> -    /* get the picture type */
> -    e = av_dict_get(st->metadata, "comment", NULL, 0);
> -    for (i = 0; e && i < FF_ARRAY_ELEMS(ff_id3v2_picture_types); i++) {
> -        if (!av_strcasecmp(e->value, ff_id3v2_picture_types[i])) {
> -            type = i;
> -            break;
> -        }
> -    }
> -
> -    if ((c->attached_types & (1 << type)) & 0x6) {
> -        av_log(s, AV_LOG_ERROR, "Duplicate attachment for type '%s'\n", ff_id3v2_picture_types[type]);
> -        return AVERROR(EINVAL);
> -    }
> -
> -    if (type == 1 && (st->codecpar->codec_id != AV_CODEC_ID_PNG ||
> -                      st->codecpar->width != 32 ||
> -                      st->codecpar->height != 32)) {
> -        av_log(s, AV_LOG_ERROR, "File icon attachment must be a 32x32 PNG");
> -        return AVERROR(EINVAL);
> -    }
> -
> -    c->attached_types |= (1 << type);
> -
> -    /* get the description */
> -    if ((e = av_dict_get(st->metadata, "title", NULL, 0)))
> -        desc = e->value;
> -    desclen = strlen(desc);
> -
> -    blocklen = 4 + 4 + mimelen + 4 + desclen + 4 + 4 + 4 + 4 + 4 + pkt->size;
> -    if (blocklen >= 1<<24) {
> -        av_log(s, AV_LOG_ERROR, "Picture block too big %d >= %d\n", blocklen, 1<<24);
> -        return AVERROR(EINVAL);
> -    }
> -
> -    avio_w8(pb, 0x06);
> -    avio_wb24(pb, blocklen);
> -
> -    avio_wb32(pb, type);
> -
> -    avio_wb32(pb, mimelen);
> -    avio_write(pb, mimetype, mimelen);
> -
> -    avio_wb32(pb, desclen);
> -    avio_write(pb, desc, desclen);
> -
> -    avio_wb32(pb, st->codecpar->width);
> -    avio_wb32(pb, st->codecpar->height);
> -    if ((pixdesc = av_pix_fmt_desc_get(st->codecpar->format)))
> -        avio_wb32(pb, av_get_bits_per_pixel(pixdesc));
> -    else
> -        avio_wb32(pb, 0);
> -    avio_wb32(pb, 0);
> -
> -    avio_wb32(pb, pkt->size);
> -    avio_write(pb, pkt->data, pkt->size);
> -    return 0;
> -}
> -
>  static int flac_finish_header(struct AVFormatContext *s)
>  {
>      int i, ret, padding = s->metadata_header_padding;
> +    FlacMuxerContext *pd = s->priv_data;
>      if (padding < 0)
>          padding = 8192;
>      /* The FLAC specification states that 24 bits are used to represent the
> @@ -175,7 +90,14 @@ static int flac_finish_header(struct AVFormatContext *s)
>          AVPacket *pkt = st->priv_data;
>          if (!pkt)
>              continue;
> -        ret = flac_write_picture(s, pkt);
> +        int picture_len = ff_flac_picture_length(s, pkt);
> +        if (picture_len >= 1<<24) {
> +            av_log(s, AV_LOG_ERROR, "Picture block too big %d >= %d\n", picture_len, 1<<24);
> +            return AVERROR(EINVAL);
> +        }
> +        avio_w8(s->pb, 0x06);
> +        avio_wb24(s->pb, picture_len);
> +        ret = ff_flac_write_picture(s, s->pb, pkt, &pd->attached_types);
>          av_packet_unref(pkt);
>          if (ret < 0 && (s->error_recognition & AV_EF_EXPLODE))
>              return ret;
> diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
> index bb3a126c29..fcbf3e0497 100644
> --- a/libavformat/matroskadec.c
> +++ b/libavformat/matroskadec.c
> @@ -2005,7 +2005,7 @@ static int matroska_parse_flac(AVFormatContext *s,
>              AVDictionary *dict = NULL;
>              AVDictionaryEntry *chmask;
>  
> -            ff_vorbis_comment(s, &dict, p, block_size, 0);
> +            ff_vorbis_comment(s, &dict, p, block_size, -2);
>              chmask = av_dict_get(dict, "WAVEFORMATEXTENSIBLE_CHANNEL_MASK", NULL, 0);
>              if (chmask) {
>                  uint64_t mask = strtol(chmask->value, NULL, 0);
> diff --git a/libavformat/matroskaenc.c b/libavformat/matroskaenc.c
> index 1c1ea71f59..00bd769962 100644
> --- a/libavformat/matroskaenc.c
> +++ b/libavformat/matroskaenc.c
> @@ -666,7 +666,7 @@ static int put_flac_codecpriv(AVFormatContext *s, AVIOContext *pb,
>          avio_w8(pb, 0x84);
>          avio_wb24(pb, len);
>  
> -        ff_vorbiscomment_write(pb, dict, vendor, NULL, 0);
> +        ff_vorbiscomment_write(pb, dict, vendor, NULL, 0, 0);
>  
>          av_dict_free(&dict);
>      }
> diff --git a/libavformat/oggdec.h b/libavformat/oggdec.h
> index 629a1d6262..58d7c86a4e 100644
> --- a/libavformat/oggdec.h
> +++ b/libavformat/oggdec.h
> @@ -131,7 +131,8 @@ extern const struct ogg_codec ff_vorbis_codec;
>  extern const struct ogg_codec ff_vp8_codec;
>  
>  int ff_vorbis_comment(AVFormatContext *ms, AVDictionary **m,
> -                      const uint8_t *buf, int size, int parse_picture);
> +                      const uint8_t *buf, int size,
> +                      int parse_picture_index);
>  
>  int ff_vorbis_stream_comment(AVFormatContext *as, AVStream *st,
>                               const uint8_t *buf, int size);
> diff --git a/libavformat/oggenc.c b/libavformat/oggenc.c
> index f5032759a6..4223690956 100644
> --- a/libavformat/oggenc.c
> +++ b/libavformat/oggenc.c
> @@ -21,6 +21,7 @@
>  
>  #include <stdint.h>
>  
> +#include "libavutil/base64.h"
>  #include "libavutil/crc.h"
>  #include "libavutil/mathematics.h"
>  #include "libavutil/opt.h"
> @@ -30,6 +31,7 @@
>  #include "libavcodec/flac.h"
>  #include "avformat.h"
>  #include "avio_internal.h"
> +#include "flac_picture.h"
>  #include "internal.h"
>  #include "vorbiscomment.h"
>  
> @@ -46,6 +48,11 @@ typedef struct OGGPage {
>      uint16_t size;
>  } OGGPage;
>  
> +typedef struct OGGPageList {
> +    OGGPage page;
> +    struct OGGPageList *next;
> +} OGGPageList;
> +
>  typedef struct OGGStreamContext {
>      unsigned page_counter;
>      uint8_t *header[3];
> @@ -61,12 +68,16 @@ typedef struct OGGStreamContext {
>      OGGPage page; ///< current page
>      unsigned serial_num; ///< serial number
>      int64_t last_granule; ///< last packet granule
> +    /* for embeded pictures */
> +    OGGPageList *comments;
> +    unsigned int attach_count;
> +    unsigned attached_types;
>  } OGGStreamContext;
>  
> -typedef struct OGGPageList {
> -    OGGPage page;
> -    struct OGGPageList *next;
> -} OGGPageList;
> +typedef struct OGGAttachContext {
> +    int attach_to;
> +    unsigned int count;
> +} OGGAttachContext;
>  
>  typedef struct OGGContext {
>      const AVClass *class;
> @@ -74,6 +85,7 @@ typedef struct OGGContext {
>      int pref_size; ///< preferred page size (0 => fill all segments)
>      int64_t pref_duration;      ///< preferred page duration (0 => fill all segments)
>      int serial_offset;
> +    unsigned int attach_count;
>  } OGGContext;
>  
>  #define OFFSET(x) offsetof(OGGContext, x)
> @@ -169,7 +181,7 @@ static int ogg_reset_cur_page(OGGStreamContext *oggstream)
>      return 0;
>  }
>  
> -static int ogg_buffer_page(AVFormatContext *s, OGGStreamContext *oggstream)
> +static int ogg_buffer_page(AVFormatContext *s, OGGStreamContext *oggstream, OGGPageList **page)
>  {
>      OGGContext *ogg = s->priv_data;
>      OGGPageList **p = &ogg->page_list;
> @@ -190,6 +202,7 @@ static int ogg_buffer_page(AVFormatContext *s, OGGStreamContext *oggstream)
>      }
>      l->next = *p;
>      *p = l;
> +    if (page) *page = l;
>  
>      return 0;
>  }
> @@ -214,14 +227,14 @@ static int ogg_buffer_data(AVFormatContext *s, AVStream *st,
>           ogg_granule_to_timestamp(oggstream, oggstream->last_granule) + 1 ||
>           ogg_key_granule(oggstream, granule))) {
>          if (oggstream->page.granule != -1)
> -            ogg_buffer_page(s, oggstream);
> +            ogg_buffer_page(s, oggstream, NULL);
>          flush = 1;
>      }
>  
>      // avoid a continued page
>      if (!header && oggstream->page.size > 0 &&
>          MAX_PAGE_SIZE - oggstream->page.size < size) {
> -        ogg_buffer_page(s, oggstream);
> +        ogg_buffer_page(s, oggstream, NULL);
>      }
>  
>      for (i = 0; i < total_segments; ) {
> @@ -255,34 +268,87 @@ static int ogg_buffer_data(AVFormatContext *s, AVStream *st,
>                                           st->time_base, AV_TIME_BASE_Q);
>  
>              if (page->segments_count == 255) {
> -                ogg_buffer_page(s, oggstream);
> +                ogg_buffer_page(s, oggstream, NULL);
>              } else if (!header) {
>                  if ((ogg->pref_size     > 0 && page->size   >= ogg->pref_size) ||
>                      (ogg->pref_duration > 0 && next - start >= ogg->pref_duration)) {
> -                    ogg_buffer_page(s, oggstream);
> +                    ogg_buffer_page(s, oggstream, NULL);
>                  }
>              }
>          }
>      }
>  
>      if (flush && oggstream->page.granule != -1)
> -        ogg_buffer_page(s, oggstream);
> +        ogg_buffer_page(s, oggstream, NULL);
> +
> +    return 0;
> +}
> +
> +static int ogg_append_page(AVFormatContext *s, OGGPageList **page_list,
> +                           uint8_t *data, unsigned size, int new_packet)
> +{
> +    int seg_size = 255, started = 0;
> +    OGGPage *p = &(*page_list)->page;
> +    int64_t granule = p->granule;
> +
> +    if (!new_packet) {
> +        p->granule = -1;
> +        int append = FFMIN(255 - p->segments[p->segments_count - 1], size);
> +        memcpy(p->data + p->size, data, append);
> +        p->size += append;
> +        seg_size = (p->segments[p->segments_count - 1] += append);
> +        data += append;
> +        size -= append;
> +        started = 1;
> +    }
>  
> +    while (size > 0 || seg_size == 255) {
> +        if (p->segments_count == 255) {
> +            OGGPageList *new_page = av_malloc(sizeof(OGGPageList));
> +            if (!new_page)
> +                return AVERROR(ENOMEM);
> +            new_page->page = (OGGPage) {
> +                .start_granule = p->start_granule,
> +                .granule = -1,
> +                .stream_index = p->stream_index,
> +                .flags = started,
> +                .segments_count = 0,
> +                .size = 0,
> +            };
> +            new_page->next = (*page_list)->next;
> +            (*page_list)->next = new_page;
> +            (*page_list) = new_page;
> +            p = &new_page->page;
> +            ((OGGStreamContext *) s->streams[p->stream_index]->priv_data)->page_count++;
> +        }
> +
> +        seg_size = FFMIN(255, size);
> +        memcpy(p->data + p->size, data, seg_size);
> +        p->size += seg_size;
> +        p->segments[p->segments_count++] = seg_size;
> +        data += seg_size;
> +        size -= seg_size;
> +        started = 1;
> +    }
> +
> +    p->granule = granule;
>      return 0;
>  }
>  
>  static uint8_t *ogg_write_vorbiscomment(int64_t offset, int bitexact,
>                                          int *header_len, AVDictionary **m, int framing_bit,
> -                                        AVChapter **chapters, unsigned int nb_chapters)
> +                                        AVChapter **chapters, unsigned int nb_chapters,
> +                                        unsigned int nb_attach)
>  {
>      const char *vendor = bitexact ? "ffmpeg" : LIBAVFORMAT_IDENT;
>      AVIOContext pb;
> -    int64_t size;
> +    int64_t comment_size, size;
>      uint8_t *p;
>  
>      ff_metadata_conv(m, ff_vorbiscomment_metadata_conv, NULL);
>  
> -    size = offset + ff_vorbiscomment_length(*m, vendor, chapters, nb_chapters) + framing_bit;
> +    comment_size = ff_vorbiscomment_length(*m, vendor, chapters, nb_chapters);
> +    size = offset + comment_size + framing_bit;

Any reason for adding this comment_size variable? Because it makes the
diff bigger without any need or benefit.

>      if (size > INT_MAX)
>          return NULL;
>      p = av_mallocz(size);
> @@ -290,7 +356,7 @@ static uint8_t *ogg_write_vorbiscomment(int64_t offset, int bitexact,
>          return NULL;
>  
>      ffio_init_context(&pb, p + offset, size - offset, 1, NULL, NULL, NULL, NULL);
> -    ff_vorbiscomment_write(&pb, *m, vendor, chapters, nb_chapters);
> +    ff_vorbiscomment_write(&pb, *m, vendor, chapters, nb_chapters, nb_attach);
>      if (framing_bit)
>          avio_w8(&pb, 1);
>  
> @@ -317,18 +383,21 @@ static int ogg_build_flac_headers(AVCodecParameters *par,
>      bytestream_put_buffer(&p, "FLAC", 4);
>      bytestream_put_byte(&p, 1); // major version
>      bytestream_put_byte(&p, 0); // minor version
> -    bytestream_put_be16(&p, 1); // headers packets without this one
> +    bytestream_put_be16(&p, 1 + oggstream->attach_count); // headers packets without this one
>      bytestream_put_buffer(&p, "fLaC", 4);
>      bytestream_put_byte(&p, 0x00); // streaminfo
>      bytestream_put_be24(&p, 34);
>      bytestream_put_buffer(&p, par->extradata, FLAC_STREAMINFO_SIZE);
>  
>      // second packet: VorbisComment
> -    p = ogg_write_vorbiscomment(4, bitexact, &oggstream->header_len[1], m, 0, NULL, 0);
> +    p = ogg_write_vorbiscomment(4, bitexact, &oggstream->header_len[1], m, 0, NULL, 0, 0);
>      if (!p)
>          return AVERROR(ENOMEM);
>      oggstream->header[1] = p;
> -    bytestream_put_byte(&p, 0x84); // last metadata block and vorbis comment
> +    if (!oggstream->attach_count)
> +        bytestream_put_byte(&p, 0x84); // last metadata block and vorbis comment
> +    else
> +        bytestream_put_byte(&p, 0x04); // vorbis comment followed by attachments
>      bytestream_put_be24(&p, oggstream->header_len[1] - 4);
>  
>      return 0;
> @@ -355,7 +424,8 @@ static int ogg_build_speex_headers(AVCodecParameters *par,
>      AV_WL32(&oggstream->header[0][68], 0);  // set extra_headers to 0
>  
>      // second packet: VorbisComment
> -    p = ogg_write_vorbiscomment(0, bitexact, &oggstream->header_len[1], m, 0, NULL, 0);
> +    p = ogg_write_vorbiscomment(0, bitexact, &oggstream->header_len[1], m, 0,
> +                                NULL, 0, oggstream->attach_count);
>      if (!p)
>          return AVERROR(ENOMEM);
>      oggstream->header[1] = p;
> @@ -384,7 +454,8 @@ static int ogg_build_opus_headers(AVCodecParameters *par,
>      bytestream_put_buffer(&p, par->extradata, par->extradata_size);
>  
>      /* second packet: VorbisComment */
> -    p = ogg_write_vorbiscomment(8, bitexact, &oggstream->header_len[1], m, 0, chapters, nb_chapters);
> +    p = ogg_write_vorbiscomment(8, bitexact, &oggstream->header_len[1], m, 0,
> +                                chapters, nb_chapters, oggstream->attach_count);
>      if (!p)
>          return AVERROR(ENOMEM);
>      oggstream->header[1] = p;
> @@ -428,8 +499,9 @@ static int ogg_build_vp8_headers(AVFormatContext *s, AVStream *st,
>      bytestream_put_be32(&p, st->time_base.num);
>  
>      /* optional second packet: VorbisComment */
> -    if (av_dict_get(st->metadata, "", NULL, AV_DICT_IGNORE_SUFFIX)) {
> -        p = ogg_write_vorbiscomment(7, bitexact, &oggstream->header_len[1], &st->metadata, 0, NULL, 0);
> +    if (av_dict_get(st->metadata, "", NULL, AV_DICT_IGNORE_SUFFIX) || oggstream->attach_count) {
> +        p = ogg_write_vorbiscomment(7, bitexact, &oggstream->header_len[1], &st->metadata, 0,
> +                                    NULL, 0, oggstream->attach_count);
>          if (!p)
>              return AVERROR(ENOMEM);
>          oggstream->header[1] = p;
> @@ -470,22 +542,18 @@ static int ogg_init(AVFormatContext *s)
>  {
>      OGGContext *ogg = s->priv_data;
>      OGGStreamContext *oggstream = NULL;
> -    int i, j;
> +    OGGAttachContext *attachstream = NULL;
> +    AVStream *st;
> +    int i, j, k = 0;
>  
>      if (ogg->pref_size)
>          av_log(s, AV_LOG_WARNING, "The pagesize option is deprecated\n");
>  
> +    // init stream priv_data
>      for (i = 0; i < s->nb_streams; i++) {
> -        AVStream *st = s->streams[i];
> -        unsigned serial_num = i + ogg->serial_offset;
> -
> -        if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
> -            if (st->codecpar->codec_id == AV_CODEC_ID_OPUS)
> -                /* Opus requires a fixed 48kHz clock */
> -                avpriv_set_pts_info(st, 64, 1, 48000);
> -            else
> -                avpriv_set_pts_info(st, 64, 1, st->codecpar->sample_rate);
> -        }
> +        st = s->streams[i];
> +        if (st->disposition & AV_DISPOSITION_ATTACHED_PIC)
> +            continue;
>  
>          if (st->codecpar->codec_id != AV_CODEC_ID_VORBIS &&
>              st->codecpar->codec_id != AV_CODEC_ID_THEORA &&
> @@ -497,14 +565,82 @@ static int ogg_init(AVFormatContext *s)
>              return AVERROR(EINVAL);
>          }
>  
> +        oggstream = av_mallocz(sizeof(*oggstream));
> +        if (!oggstream)
> +            return AVERROR(ENOMEM);
> +        st->priv_data = oggstream;
> +    }
> +
> +    // setup attachment association
> +    for (i = 0; i < s->nb_streams; i++) {
> +        st = s->streams[i];
> +        if (!(st->disposition & AV_DISPOSITION_ATTACHED_PIC))
> +            continue;
> +        ogg->attach_count++;
> +
> +        if (st->codecpar->codec_id != AV_CODEC_ID_GIF &&
> +            st->codecpar->codec_id != AV_CODEC_ID_MJPEG &&
> +            st->codecpar->codec_id != AV_CODEC_ID_PNG &&
> +            st->codecpar->codec_id != AV_CODEC_ID_TIFF &&
> +            st->codecpar->codec_id != AV_CODEC_ID_BMP) {
> +            av_log(s, AV_LOG_ERROR, "Unsupported attachment codec id in stream %d\n", i);
> +            return AVERROR(EINVAL);
> +        }
> +
> +        attachstream = av_mallocz(sizeof(*attachstream));
> +        if (!attachstream)
> +            return AVERROR(ENOMEM);
> +        st->priv_data = attachstream;
> +
> +        int attached = 0;
> +        AVDictionaryEntry *e = av_dict_get(st->metadata, "ogg_attach", NULL, 0);
> +        if (e) {
> +            int ogg_attach = -1;
> +            int res = sscanf(e->value, "%d", &ogg_attach);
> +            if (res > 0 && ogg_attach >= 0 && ogg_attach < s->nb_streams &&
> +                !(s->streams[ogg_attach]->disposition & AV_DISPOSITION_ATTACHED_PIC)) {
> +                attachstream->attach_to = ogg_attach;
> +                oggstream = s->streams[ogg_attach]->priv_data;
> +                oggstream->attach_count++;
> +                attached = 1;
> +            }
> +        }
> +
> +        for (j = 0; !attached && j < s->nb_streams; ++j) {
> +            if (!(s->streams[j]->disposition & AV_DISPOSITION_ATTACHED_PIC)) {
> +                attachstream->attach_to = j;
> +                oggstream = s->streams[j]->priv_data;
> +                oggstream->attach_count++;
> +                attached = 1;
> +            }
> +        }
> +        if (!attached) {
> +            av_log(s, AV_LOG_ERROR, "Cannot add attachment %d without an audio/video stream\n", i);
> +            return AVERROR(EINVAL);
> +        }
> +    }
> +
> +    for (i = 0; i < s->nb_streams; i++) {
> +        st = s->streams[i];
> +        if (st->disposition & AV_DISPOSITION_ATTACHED_PIC)
> +            continue;
> +
> +        unsigned serial_num = k++ + ogg->serial_offset;
> +

This should give a compiler warning (mixed declaration and code). FFmpeg
uses the C90 rule that declarations are only allowed at the beginning of
a block. This is not the only instance of this.

> +        if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
> +            if (st->codecpar->codec_id == AV_CODEC_ID_OPUS)
> +                /* Opus requires a fixed 48kHz clock */
> +                avpriv_set_pts_info(st, 64, 1, 48000);
> +            else
> +                avpriv_set_pts_info(st, 64, 1, st->codecpar->sample_rate);
> +        }
> +
>          if ((!st->codecpar->extradata || !st->codecpar->extradata_size) &&
>              st->codecpar->codec_id != AV_CODEC_ID_VP8) {
>              av_log(s, AV_LOG_ERROR, "No extradata present\n");
>              return AVERROR_INVALIDDATA;
>          }
> -        oggstream = av_mallocz(sizeof(*oggstream));
> -        if (!oggstream)
> -            return AVERROR(ENOMEM);
> +        oggstream = st->priv_data;
>  
>          oggstream->page.stream_index = i;
>  
> @@ -521,7 +657,6 @@ static int ogg_init(AVFormatContext *s)
>  
>          av_dict_copy(&st->metadata, s->metadata, AV_DICT_DONT_OVERWRITE);
>  
> -        st->priv_data = oggstream;
>          if (st->codecpar->codec_id == AV_CODEC_ID_FLAC) {
>              int err = ogg_build_flac_headers(st->codecpar, oggstream,
>                                               s->flags & AVFMT_FLAG_BITEXACT,
> @@ -557,7 +692,8 @@ static int ogg_init(AVFormatContext *s)
>              uint8_t *p;
>              const char *cstr = st->codecpar->codec_id == AV_CODEC_ID_VORBIS ? "vorbis" : "theora";
>              int header_type = st->codecpar->codec_id == AV_CODEC_ID_VORBIS ? 3 : 0x81;
> -            int framing_bit = st->codecpar->codec_id == AV_CODEC_ID_VORBIS ? 1 : 0;
> +            int framing_bit = oggstream->attach_count > 0 ? 0 : 
> +                              st->codecpar->codec_id == AV_CODEC_ID_VORBIS ? 1 : 0;
>  
>              if (avpriv_split_xiph_headers(st->codecpar->extradata, st->codecpar->extradata_size,
>                                        st->codecpar->codec_id == AV_CODEC_ID_VORBIS ? 30 : 42,
> @@ -569,7 +705,7 @@ static int ogg_init(AVFormatContext *s)
>  
>              p = ogg_write_vorbiscomment(7, s->flags & AVFMT_FLAG_BITEXACT,
>                                          &oggstream->header_len[1], &st->metadata,
> -                                        framing_bit, NULL, 0);
> +                                        framing_bit, NULL, 0, oggstream->attach_count);
>              oggstream->header[1] = p;
>              if (!p)
>                  return AVERROR(ENOMEM);
> @@ -601,36 +737,129 @@ static int ogg_init(AVFormatContext *s)
>  
>  static int ogg_write_header(AVFormatContext *s)
>  {
> +    OGGContext *ogg = s->priv_data;
>      OGGStreamContext *oggstream = NULL;
> -    int i, j;
> +    int j;
>  
>      for (j = 0; j < s->nb_streams; j++) {
> +        if (s->streams[j]->disposition & AV_DISPOSITION_ATTACHED_PIC)
> +            continue;
>          oggstream = s->streams[j]->priv_data;
>          ogg_buffer_data(s, s->streams[j], oggstream->header[0],
>                          oggstream->header_len[0], 0, 1);
>          oggstream->page.flags |= 2; // bos
> -        ogg_buffer_page(s, oggstream);
> +        ogg_buffer_page(s, oggstream, NULL);
>      }
> +    ogg_write_pages(s, 2);
> +
>      for (j = 0; j < s->nb_streams; j++) {
> +        if (s->streams[j]->disposition & AV_DISPOSITION_ATTACHED_PIC)
> +            continue;
>          AVStream *st = s->streams[j];
>          oggstream = st->priv_data;
> -        for (i = 1; i < 3; i++) {
> -            if (oggstream->header_len[i])
> -                ogg_buffer_data(s, st, oggstream->header[i],
> -                                oggstream->header_len[i], 0, 1);
> +        if (oggstream->header_len[1]) {
> +            ogg_buffer_data(s, st, oggstream->header[1],
> +                            oggstream->header_len[1], 0, 1);
> +            ogg_buffer_page(s, oggstream, &oggstream->comments);
> +        }
> +        if (oggstream->header_len[2]) {
> +            ogg_buffer_data(s, st, oggstream->header[2],
> +                            oggstream->header_len[2], 0, 1);
> +            ogg_buffer_page(s, oggstream, NULL);
>          }
> -        ogg_buffer_page(s, oggstream);
>      }
>  
>      oggstream->page.start_granule = AV_NOPTS_VALUE;
>  
> -    ogg_write_pages(s, 2);
> +    if (!ogg->attach_count)
> +        ogg_write_pages(s, 2);
> +
> +    return 0;
> +}
> +
> +static int ogg_write_attached_packet_internal(AVFormatContext *s, AVPacket *pkt)
> +{
> +    OGGContext *ogg = s->priv_data;
> +    AVStream *st = s->streams[pkt->stream_index];
> +    OGGAttachContext *attachstream = st->priv_data;
> +    AVStream *base_st = s->streams[attachstream->attach_to];
> +    OGGStreamContext *oggstream = base_st->priv_data;
> +    uint8_t *data;
> +    int data_len;
> +    int ret;
> +
> +    attachstream->count++;
> +    // Only use the first packet received for each attached stream
> +    if (attachstream->count == 2)
> +        av_log(s, AV_LOG_WARNING, "Attached stream %d has too many packets, ignoring\n",
> +               pkt->stream_index);
> +    if (attachstream->count > 1)
> +        return 0;
> +
> +    if (!ogg->attach_count || !oggstream->attach_count) {
> +        av_log(s, AV_LOG_ERROR, "More attached streams than expected\n");
> +        return -1;

I think this check should be an assert if it can't really happen (if I
am not mistaken, it could only happen if someone added the attached pic
disposition flag to a stream after init and this is forbidden).

> +    }
> +
> +    AVIOContext *pb;

This should give a compiler warning (mixed declaration and code). FFmpeg
uses the C90 rule that declarations are only allowed at the beginning of
a block. This is not the only instance of this.

> +    avio_open_dyn_buf(&pb);
> +    if (!pb)
> +        return AVERROR(ENOMEM);

avio_open_dyn_buf returns an error code of its own; just return it.
(avio_open_dyn_buf does not guarantee that pb has been set on failure
and indeed it doesn't do so if the first allocation fails in which case
pb will still be uninitialized. (It probably should set it consistently,
but it is not documented, so you must not rely on this.))

>  
> +    if (base_st->codecpar->codec_id == AV_CODEC_ID_FLAC) {
> +        int picture_len = ff_flac_picture_length(s, pkt);
> +        if (picture_len >= 1<<24) {
> +            av_log(s, AV_LOG_ERROR, "Picture block too big %d >= %d\n", picture_len, 1<<24);
> +            return AVERROR(EINVAL);

You are leaking pb here.

> +        }
> +        if (oggstream->attach_count == 1)
> +            avio_w8(pb, 0x86); // last attachment
> +        else
> +            avio_w8(pb, 0x06);
> +        avio_wb24(pb, picture_len);
> +        ff_flac_write_picture(s, pb, pkt, &oggstream->attached_types);
> +        data_len = avio_close_dyn_buf(pb, &data);
> +        if (!data || data_len < 0)
> +            return AVERROR(ENOMEM);
> +    } else {
> +        uint8_t *bin_data;
> +        int bin_data_len;
> +        ff_flac_write_picture(s, pb, pkt, &oggstream->attached_types);
> +        bin_data_len = avio_close_dyn_buf(pb, &bin_data);
> +        if (!bin_data || bin_data_len < 0)
> +            return AVERROR(ENOMEM);
> +
> +        data_len = 4 + 23 + AV_BASE64_SIZE(bin_data_len) - 1;
> +        data = av_malloc(data_len + 1);
> +        if (!data)
> +            return AVERROR(ENOMEM);

You are leaking bin_data here.

> +        AV_WL32(data, data_len - 4);
> +        strcpy(data + 4, "METADATA_BLOCK_PICTURE=");
> +        av_base64_encode(data + 4 + 23, data_len - 4 - 23 + 1, bin_data, bin_data_len);
> +        av_free(bin_data);
> +
> +        // add framing bit when required
> +        if (oggstream->attach_count == 1 &&
> +            base_st->codecpar->codec_id == AV_CODEC_ID_VORBIS)
> +            data[data_len++] = 1;
> +    }
> +
> +    ret = ogg_append_page(s, &oggstream->comments, data, data_len,
> +                          base_st->codecpar->codec_id == AV_CODEC_ID_FLAC);
> +    av_free(data);
> +    if (ret < 0)
> +        return ret;
> +
> +    oggstream->attach_count--;
> +    ogg->attach_count--;
> +    if (!ogg->attach_count)
> +        ogg_write_pages(s, 0);
>      return 0;
>  }
>  
>  static int ogg_write_packet_internal(AVFormatContext *s, AVPacket *pkt)
>  {
> +    OGGContext *ogg = s->priv_data;
>      AVStream *st = s->streams[pkt->stream_index];
>      OGGStreamContext *oggstream = st->priv_data;
>      int ret;
> @@ -674,7 +903,9 @@ static int ogg_write_packet_internal(AVFormatContext *s, AVPacket *pkt)
>      if (ret < 0)
>          return ret;
>  
> -    ogg_write_pages(s, 0);
> +    if (!ogg->attach_count) {
> +        ogg_write_pages(s, 0);
> +    }
>  
>      oggstream->last_granule = granule;
>  
> @@ -683,15 +914,26 @@ static int ogg_write_packet_internal(AVFormatContext *s, AVPacket *pkt)
>  
>  static int ogg_write_packet(AVFormatContext *s, AVPacket *pkt)
>  {
> +    OGGContext *ogg = s->priv_data;
>      int i;
>  
> -    if (pkt)
> -        return ogg_write_packet_internal(s, pkt);
> +    if (pkt) {
> +        AVStream *st = s->streams[pkt->stream_index];
> +        if (st->disposition & AV_DISPOSITION_ATTACHED_PIC)
> +            return ogg_write_attached_packet_internal(s, pkt);
> +        else
> +            return ogg_write_packet_internal(s, pkt);
> +    }
> +
> +    if (ogg->attach_count)
> +        return 0;
>  
>      for (i = 0; i < s->nb_streams; i++) {
> +        if (s->streams[i]->disposition & AV_DISPOSITION_ATTACHED_PIC)
> +            continue;

This should give a compiler warning (mixed declaration and code). FFmpeg
uses the C90 rule that declarations are only allowed at the beginning of
a block. This is not the only instance of this.

>          OGGStreamContext *oggstream = s->streams[i]->priv_data;
>          if (oggstream->page.segments_count)
> -            ogg_buffer_page(s, oggstream);
> +            ogg_buffer_page(s, oggstream, NULL);
>      }
>  
>      ogg_write_pages(s, 2);
> @@ -700,14 +942,20 @@ static int ogg_write_packet(AVFormatContext *s, AVPacket *pkt)
>  
>  static int ogg_write_trailer(AVFormatContext *s)
>  {
> +    OGGContext *ogg = s->priv_data;
>      int i;
>  
> +    if (ogg->attach_count)
> +        av_log(s, AV_LOG_WARNING, "Some attached images were never written\n");
> +
>      /* flush current page if needed */
>      for (i = 0; i < s->nb_streams; i++) {
> +        if (s->streams[i]->disposition & AV_DISPOSITION_ATTACHED_PIC)
> +            continue;

This should give a compiler warning (mixed declaration and code). FFmpeg
uses the C90 rule that declarations are only allowed at the beginning of
a block. This is not the only instance of this.

>          OGGStreamContext *oggstream = s->streams[i]->priv_data;
>  
>          if (oggstream->page.size > 0)
> -            ogg_buffer_page(s, oggstream);
> +            ogg_buffer_page(s, oggstream, NULL);
>      }
>  
>      ogg_write_pages(s, 1);
> @@ -723,16 +971,18 @@ static void ogg_free(AVFormatContext *s)
>  
>      for (i = 0; i < s->nb_streams; i++) {
>          AVStream *st = s->streams[i];
> -        OGGStreamContext *oggstream = st->priv_data;
> -        if (!oggstream)
> -            continue;
> -        if (st->codecpar->codec_id == AV_CODEC_ID_FLAC ||
> -            st->codecpar->codec_id == AV_CODEC_ID_SPEEX ||
> -            st->codecpar->codec_id == AV_CODEC_ID_OPUS ||
> -            st->codecpar->codec_id == AV_CODEC_ID_VP8) {
> -            av_freep(&oggstream->header[0]);
> -        }
> -        av_freep(&oggstream->header[1]);
> +        if (!(s->streams[i]->disposition & AV_DISPOSITION_ATTACHED_PIC)) {
> +            OGGStreamContext *oggstream = st->priv_data;
> +            if (!oggstream)
> +                continue;
> +            if (st->codecpar->codec_id == AV_CODEC_ID_FLAC ||
> +                st->codecpar->codec_id == AV_CODEC_ID_SPEEX ||
> +                st->codecpar->codec_id == AV_CODEC_ID_OPUS ||
> +                st->codecpar->codec_id == AV_CODEC_ID_VP8) {
> +                av_freep(&oggstream->header[0]);
> +            }
> +            av_freep(&oggstream->header[1]);
> +         }

Just replacing "if (!oggstream)" by "if (st->disposition &
AV_DISPOSITION_ATTACHED_PIC || !oggstream)" would reduce the diff and
increase readability.

>      }
>  
>      while (p) {
> @@ -743,6 +993,28 @@ static void ogg_free(AVFormatContext *s)
>      ogg->page_list = NULL;
>  }
>  
> +static int ogg_query_codec(enum AVCodecID id, int std_compliance)
> +{
> +    switch (id) {
> +        case AV_CODEC_ID_VORBIS:
> +        case AV_CODEC_ID_THEORA:
> +        case AV_CODEC_ID_SPEEX:
> +        case AV_CODEC_ID_FLAC:
> +        case AV_CODEC_ID_OPUS:
> +        case AV_CODEC_ID_VP8:
> +            return 1;
> +        case AV_CODEC_ID_GIF:
> +        case AV_CODEC_ID_MJPEG:
> +        case AV_CODEC_ID_PNG:
> +        case AV_CODEC_ID_TIFF:
> +        case AV_CODEC_ID_BMP:
> +            return MKTAG('A', 'P', 'I', 'C');
> +        default:
> +            return 0;
> +    }
> +    return 0;
> +}
> +
>  #if CONFIG_OGG_MUXER
>  OGG_CLASS(ogg, Ogg)
>  AVOutputFormat ff_ogg_muxer = {
> @@ -771,6 +1043,7 @@ AVOutputFormat ff_ogg_muxer = {
>      .deinit            = ogg_free,
>      .flags             = AVFMT_TS_NEGATIVE | AVFMT_TS_NONSTRICT | AVFMT_ALLOW_FLUSH,
>      .priv_class        = &ogg_muxer_class,
> +    .query_codec       = ogg_query_codec,
>  };
>  #endif
>  
> @@ -783,6 +1056,7 @@ AVOutputFormat ff_oga_muxer = {
>      .extensions        = "oga",
>      .priv_data_size    = sizeof(OGGContext),
>      .audio_codec       = AV_CODEC_ID_FLAC,
> +    .video_codec       = AV_CODEC_ID_MJPEG,
>      .init              = ogg_init,
>      .write_header      = ogg_write_header,
>      .write_packet      = ogg_write_packet,
> @@ -790,6 +1064,7 @@ AVOutputFormat ff_oga_muxer = {
>      .deinit            = ogg_free,
>      .flags             = AVFMT_TS_NEGATIVE | AVFMT_ALLOW_FLUSH,
>      .priv_class        = &oga_muxer_class,
> +    .query_codec       = ogg_query_codec,
>  };
>  #endif
>  
> @@ -812,6 +1087,7 @@ AVOutputFormat ff_ogv_muxer = {
>      .deinit            = ogg_free,
>      .flags             = AVFMT_TS_NEGATIVE | AVFMT_TS_NONSTRICT | AVFMT_ALLOW_FLUSH,
>      .priv_class        = &ogv_muxer_class,
> +    .query_codec       = ogg_query_codec,
>  };
>  #endif
>  
> @@ -824,6 +1100,7 @@ AVOutputFormat ff_spx_muxer = {
>      .extensions        = "spx",
>      .priv_data_size    = sizeof(OGGContext),
>      .audio_codec       = AV_CODEC_ID_SPEEX,
> +    .video_codec       = AV_CODEC_ID_MJPEG,
>      .init              = ogg_init,
>      .write_header      = ogg_write_header,
>      .write_packet      = ogg_write_packet,
> @@ -831,6 +1108,7 @@ AVOutputFormat ff_spx_muxer = {
>      .deinit            = ogg_free,
>      .flags             = AVFMT_TS_NEGATIVE | AVFMT_ALLOW_FLUSH,
>      .priv_class        = &spx_muxer_class,
> +    .query_codec       = ogg_query_codec,
>  };
>  #endif
>  
> @@ -843,6 +1121,7 @@ AVOutputFormat ff_opus_muxer = {
>      .extensions        = "opus",
>      .priv_data_size    = sizeof(OGGContext),
>      .audio_codec       = AV_CODEC_ID_OPUS,
> +    .video_codec       = AV_CODEC_ID_MJPEG,
>      .init              = ogg_init,
>      .write_header      = ogg_write_header,
>      .write_packet      = ogg_write_packet,
> @@ -850,5 +1129,6 @@ AVOutputFormat ff_opus_muxer = {
>      .deinit            = ogg_free,
>      .flags             = AVFMT_TS_NEGATIVE | AVFMT_ALLOW_FLUSH,
>      .priv_class        = &opus_muxer_class,
> +    .query_codec       = ogg_query_codec,
>  };
>  #endif
> diff --git a/libavformat/oggparseflac.c b/libavformat/oggparseflac.c
> index 4e85b05c67..840d1b73ef 100644
> --- a/libavformat/oggparseflac.c
> +++ b/libavformat/oggparseflac.c
> @@ -22,6 +22,7 @@
>  #include "libavcodec/get_bits.h"
>  #include "libavcodec/flac.h"
>  #include "avformat.h"
> +#include "flac_picture.h"
>  #include "internal.h"
>  #include "oggdec.h"
>  
> @@ -72,6 +73,9 @@ flac_header (AVFormatContext * s, int idx)
>          avpriv_set_pts_info(st, 64, 1, samplerate);
>      } else if (mdt == FLAC_METADATA_TYPE_VORBIS_COMMENT) {
>          ff_vorbis_stream_comment(s, st, os->buf + os->pstart + 4, os->psize - 4);
> +    } else if (mdt == FLAC_METADATA_TYPE_PICTURE) {
> +        int ret = ff_flac_parse_picture(s, os->buf + os->pstart + 4, os->psize - 4, idx, 1);
> +        if (ret < 0) return ret;
>      }
>  
>      return 1;
> diff --git a/libavformat/oggparsevorbis.c b/libavformat/oggparsevorbis.c
> index 0e8c25c030..c400fe8587 100644
> --- a/libavformat/oggparsevorbis.c
> +++ b/libavformat/oggparsevorbis.c
> @@ -75,7 +75,7 @@ static int ogm_chapter(AVFormatContext *as, uint8_t *key, uint8_t *val)
>  int ff_vorbis_stream_comment(AVFormatContext *as, AVStream *st,
>                               const uint8_t *buf, int size)
>  {
> -    int updates = ff_vorbis_comment(as, &st->metadata, buf, size, 1);
> +    int updates = ff_vorbis_comment(as, &st->metadata, buf, size, st->index);
>  
>      if (updates > 0) {
>          st->event_flags |= AVSTREAM_EVENT_FLAG_METADATA_UPDATED;
> @@ -86,7 +86,7 @@ int ff_vorbis_stream_comment(AVFormatContext *as, AVStream *st,
>  
>  int ff_vorbis_comment(AVFormatContext *as, AVDictionary **m,
>                        const uint8_t *buf, int size,
> -                      int parse_picture)
> +                      int parse_picture_index)
>  {
>      const uint8_t *p   = buf;
>      const uint8_t *end = buf + size;
> @@ -151,7 +151,7 @@ int ff_vorbis_comment(AVFormatContext *as, AVDictionary **m,
>               * 'METADATA_BLOCK_PICTURE'. This is the preferred and
>               * recommended way of embedding cover art within VorbisComments."
>               */
> -            if (!av_strcasecmp(tt, "METADATA_BLOCK_PICTURE") && parse_picture) {
> +            if (!av_strcasecmp(tt, "METADATA_BLOCK_PICTURE") && parse_picture_index > -2) {
>                  int ret, len = AV_BASE64_DECODE_SIZE(vl);
>                  char *pict = av_malloc(len);
>  
> @@ -165,7 +165,7 @@ int ff_vorbis_comment(AVFormatContext *as, AVDictionary **m,
>                  av_freep(&tt);
>                  av_freep(&ct);
>                  if (ret > 0)
> -                    ret = ff_flac_parse_picture(as, pict, ret, 0);
> +                    ret = ff_flac_parse_picture(as, pict, ret, parse_picture_index, 0);
>                  av_freep(&pict);
>                  if (ret < 0) {
>                      av_log(as, AV_LOG_WARNING, "Failed to parse cover art block.\n");
> diff --git a/libavformat/vorbiscomment.c b/libavformat/vorbiscomment.c
> index a929634cc0..89127236fa 100644
> --- a/libavformat/vorbiscomment.c
> +++ b/libavformat/vorbiscomment.c
> @@ -21,6 +21,7 @@
>  
>  #include "avio.h"
>  #include "avformat.h"
> +#include "flac_picture.h"

Seems unused.

>  #include "metadata.h"
>  #include "vorbiscomment.h"
>  #include "libavutil/dict.h"
> @@ -43,6 +44,12 @@ int64_t ff_vorbiscomment_length(const AVDictionary *m, const char *vendor_string
>  {
>      int64_t len = 8;
>      len += strlen(vendor_string);
> +    if (m) {
> +        AVDictionaryEntry *tag = NULL;
> +        while ((tag = av_dict_get(m, "", tag, AV_DICT_IGNORE_SUFFIX))) {
> +            len += 4 +strlen(tag->key) + 1 + strlen(tag->value);
> +        }
> +    }
>      if (chapters && nb_chapters) {
>          for (int i = 0; i < nb_chapters; i++) {
>              AVDictionaryEntry *tag = NULL;
> @@ -53,18 +60,12 @@ int64_t ff_vorbiscomment_length(const AVDictionary *m, const char *vendor_string
>              }
>          }
>      }
> -    if (m) {
> -        AVDictionaryEntry *tag = NULL;
> -        while ((tag = av_dict_get(m, "", tag, AV_DICT_IGNORE_SUFFIX))) {
> -            len += 4 +strlen(tag->key) + 1 + strlen(tag->value);
> -        }
> -    }

Unnecessary noise.

>      return len;
>  }
>  
> -int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m,
> -                           const char *vendor_string,
> -                           AVChapter **chapters, unsigned int nb_chapters)
> +int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m, const char *vendor_string,
> +                           AVChapter **chapters, unsigned int nb_chapters,
> +                           int external_count)
>  {
>      int cm_count = 0;
>      avio_wl32(pb, strlen(vendor_string));
> @@ -74,10 +75,11 @@ int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m,
>              cm_count += av_dict_count(chapters[i]->metadata) + 1;
>          }
>      }
> +    int count = av_dict_count(m) + cm_count + external_count;
> +    avio_wl32(pb, count);
> +
>      if (m) {
> -        int count = av_dict_count(m) + cm_count;
>          AVDictionaryEntry *tag = NULL;
> -        avio_wl32(pb, count);
>          while ((tag = av_dict_get(m, "", tag, AV_DICT_IGNORE_SUFFIX))) {
>              int64_t len1 = strlen(tag->key);
>              int64_t len2 = strlen(tag->value);
> @@ -88,6 +90,8 @@ int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m,
>              avio_w8(pb, '=');
>              avio_write(pb, tag->value, len2);
>          }
> +    }
> +    if (chapters && nb_chapters) {
>          for (int i = 0; i < nb_chapters; i++) {
>              AVChapter *chp = chapters[i];
>              char chapter_time[13];
> @@ -107,7 +111,7 @@ int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m,
>              avio_w8(pb, '=');
>              avio_write(pb, chapter_time, 12);
>  
> -            tag = NULL;
> +            AVDictionaryEntry *tag = NULL;
>              while ((tag = av_dict_get(chapters[i]->metadata, "", tag, AV_DICT_IGNORE_SUFFIX))) {
>                  int64_t len1 = !strcmp(tag->key, "title") ? 4 : strlen(tag->key);
>                  int64_t len2 = strlen(tag->value);
> @@ -124,7 +128,7 @@ int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m,
>                  avio_write(pb, tag->value, len2);
>              }
>          }
> -    } else
> -        avio_wl32(pb, 0);
> +    }
> +

The changes to ff_vorbiscomment_write look good. You are actually fixing
a bug in ff_vorbiscomment_write: It doesn't write chapters in the
absence of a dictionary. I don't see a reason for this and in any case
it contradicts the behaviour of ff_vorbiscomment_length. But this fix
should be in a separate commit preceding the changes relating to
attachments. (external_count should not be in this commit.)

>      return 0;
>  }
> diff --git a/libavformat/vorbiscomment.h b/libavformat/vorbiscomment.h
> index 7cacd0b2a0..1dc9ed1da0 100644
> --- a/libavformat/vorbiscomment.h
> +++ b/libavformat/vorbiscomment.h
> @@ -48,9 +48,9 @@ int64_t ff_vorbiscomment_length(const AVDictionary *m, const char *vendor_string
>   * @param chapters The chapters to write.
>   * @param nb_chapters The number of chapters to write.
>   */
> -int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m,
> -                           const char *vendor_string,
> -                           AVChapter **chapters, unsigned int nb_chapters);
> +int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m, const char *vendor_string,
> +                           AVChapter **chapters, unsigned int nb_chapters,
> +                           int external_count);
>  
>  extern const AVMetadataConv ff_vorbiscomment_metadata_conv[];
>  
>
LRFLEW June 8, 2020, 12:06 a.m. UTC | #7
>$ make SAMPLES=fate-suite fate-rsync
>$ make SAMPLES=fate-suite fate

That explains it. The "Contribute" documentation just says "make fate", so I didn't realize I wasn't running the entire suite.
I re-ran the tests, and saw a different test fail, being lavf-fate-vp3.ogg. Looking at the hex data, it appears that my patch is resulting in the OGG pages being split differently. I'm not sure what's causing that, so I'll need to spend some time looking into it.
Thanks
LRFLEW June 8, 2020, 10:45 p.m. UTC | #8
>seems to break fate



Ok, I spent some time looking into what's happening, and I've figured what's causing the failures. The issue isn't that it's producing *invalid* output, but that it's just *different* output. As I had said, I wrote this patch some time ago, so I had forgotten a decision I had made to work around a problem I was facing. The way I implemented the delayed writing of the Vorbis Comments meant that the comments packet needed to end the OGG page. This was already the case for the audio codecs (which is what I was writing it to use it for at the time), but the video codecs have another header packet after the vorbis comments that was being written to the same page (if it fit). I ended up splitting the page so that the headers were placed in different pages. You can see this in the patch where, in the second loop in ogg_write_header, I replaced one call to ogg_buffer_page with two.


The way I see it, there are three options here:


* Accept that the new output is valid and update the test cases
* Rework the way the Vorbis Comments are written to avoid the issue

* Use an if statement to only split the headers into separate pages when there are attachments


The last one is the quickest to implement, as I have already done it to test that this is the only thing causing fate errors. However, it very much feels like a hack, changing the code specifically for a test case instead of either fixing the test case or solving the underlying problem. The first option is the next easiest, as it would just be a matter of updating the ref files. However, I can see that being an unpopular option, depending on how much the change in output is considered an "issue" (as apposed to just a change). The middle one is probably the most complete, but would be a real challenge. Having been thinking about it, I think it might be reasonable to implement if, instead of buffering the header pages when ogg_write_header is called, the buffering/writing of the pages is delayed until all the attachments are ready, using OGGStreamContext.header to store/buffer the data until it's ready to be written.


Does anybody have any feedback on this? I'll probably start experimenting with the second option, but it might take me a while before I have anything functional for it.
diff mbox series

Patch

diff --git a/libavformat/flac_picture.c b/libavformat/flac_picture.c
index 53e24b28b7..73edc7e238 100644
--- a/libavformat/flac_picture.c
+++ b/libavformat/flac_picture.c
@@ -20,6 +20,7 @@ 
  */
 
 #include "libavutil/intreadwrite.h"
+#include "libavutil/pixdesc.h"
 #include "libavcodec/bytestream.h"
 #include "libavcodec/png.h"
 #include "avformat.h"
@@ -29,7 +30,8 @@ 
 
 #define MAX_TRUNC_PICTURE_SIZE (500 * 1024 * 1024)
 
-int ff_flac_parse_picture(AVFormatContext *s, uint8_t *buf, int buf_size, int truncate_workaround)
+int ff_flac_parse_picture(AVFormatContext *s, uint8_t *buf, int buf_size,
+                          int ogg_index, int truncate_workaround)
 {
     const CodecMime *mime = ff_id3v2_mime_tags;
     enum AVCodecID id = AV_CODEC_ID_NONE;
@@ -180,6 +182,8 @@  int ff_flac_parse_picture(AVFormatContext *s, uint8_t *buf, int buf_size, int tr
     av_dict_set(&st->metadata, "comment", ff_id3v2_picture_types[type], 0);
     if (desc)
         av_dict_set(&st->metadata, "title", desc, AV_DICT_DONT_STRDUP_VAL);
+    if (ogg_index >= 0)
+        av_dict_set_int(&st->metadata, "ogg_attach", ogg_index, 0);
 
     return 0;
 
@@ -189,3 +193,109 @@  fail:
 
     return ret;
 }
+
+int ff_flac_picture_length(struct AVFormatContext *s, AVPacket *pkt) {
+    const CodecMime *mime = ff_id3v2_mime_tags;
+    AVDictionaryEntry *e;
+    const char *mimetype = NULL, *desc = "";
+    const AVStream *st = s->streams[pkt->stream_index];
+    int mimelen, desclen;
+
+    if (!pkt->data)
+        return 0;
+
+    /* get the mimetype */
+    while (mime->id != AV_CODEC_ID_NONE) {
+        if (mime->id == st->codecpar->codec_id) {
+            mimetype = mime->str;
+            break;
+        }
+        mime++;
+    }
+    if (!mimetype)
+        return 0;
+    mimelen = strlen(mimetype);
+
+    /* get the description */
+    if ((e = av_dict_get(st->metadata, "title", NULL, 0)))
+        desc = e->value;
+    desclen = strlen(desc);
+
+    return 4 + 4 + mimelen + 4 + desclen + 4 + 4 + 4 + 4 + 4 + pkt->size;
+}
+
+int ff_flac_write_picture(struct AVFormatContext *s, AVIOContext *pb,
+                          AVPacket *pkt, unsigned *attached_types)
+{
+    const AVPixFmtDescriptor *pixdesc;
+    const CodecMime *mime = ff_id3v2_mime_tags;
+    AVDictionaryEntry *e;
+    const char *mimetype = NULL, *desc = "";
+    const AVStream *st = s->streams[pkt->stream_index];
+    int i, mimelen, desclen, type = 0;
+
+    if (!pkt->data)
+        return 0;
+
+    while (mime->id != AV_CODEC_ID_NONE) {
+        if (mime->id == st->codecpar->codec_id) {
+            mimetype = mime->str;
+            break;
+        }
+        mime++;
+    }
+    if (!mimetype) {
+        av_log(s, AV_LOG_ERROR, "No mimetype is known for stream %d, cannot "
+               "write an attached picture.\n", st->index);
+        return AVERROR(EINVAL);
+    }
+    mimelen = strlen(mimetype);
+
+    /* get the picture type */
+    e = av_dict_get(st->metadata, "comment", NULL, 0);
+    for (i = 0; e && i < FF_ARRAY_ELEMS(ff_id3v2_picture_types); i++) {
+        if (!av_strcasecmp(e->value, ff_id3v2_picture_types[i])) {
+            type = i;
+            break;
+        }
+    }
+
+    if ((*attached_types & (1 << type)) & 0x6) {
+        av_log(s, AV_LOG_ERROR, "Duplicate attachment for type '%s'\n", ff_id3v2_picture_types[type]);
+        return AVERROR(EINVAL);
+    }
+
+    if (type == 1 && (st->codecpar->codec_id != AV_CODEC_ID_PNG ||
+                      st->codecpar->width != 32 ||
+                      st->codecpar->height != 32)) {
+        av_log(s, AV_LOG_ERROR, "File icon attachment must be a 32x32 PNG");
+        return AVERROR(EINVAL);
+    }
+
+    *attached_types |= (1 << type);
+
+    /* get the description */
+    if ((e = av_dict_get(st->metadata, "title", NULL, 0)))
+        desc = e->value;
+    desclen = strlen(desc);
+
+    avio_wb32(pb, type);
+
+    avio_wb32(pb, mimelen);
+    avio_write(pb, mimetype, mimelen);
+
+    avio_wb32(pb, desclen);
+    avio_write(pb, desc, desclen);
+
+    avio_wb32(pb, st->codecpar->width);
+    avio_wb32(pb, st->codecpar->height);
+    if ((pixdesc = av_pix_fmt_desc_get(st->codecpar->format)))
+        avio_wb32(pb, av_get_bits_per_pixel(pixdesc));
+    else
+        avio_wb32(pb, 0);
+    avio_wb32(pb, 0);
+
+    avio_wb32(pb, pkt->size);
+    avio_write(pb, pkt->data, pkt->size);
+    return 0;
+}
diff --git a/libavformat/flac_picture.h b/libavformat/flac_picture.h
index 61fd0c8806..9fc18b05ee 100644
--- a/libavformat/flac_picture.h
+++ b/libavformat/flac_picture.h
@@ -26,6 +26,10 @@ 
 
 #define RETURN_ERROR(code) do { ret = (code); goto fail; } while (0)
 
-int ff_flac_parse_picture(AVFormatContext *s, uint8_t *buf, int buf_size, int truncate_workaround);
+int ff_flac_parse_picture(AVFormatContext *s, uint8_t *buf, int buf_size,
+                          int ogg_index, int truncate_workaround);
+int ff_flac_picture_length(struct AVFormatContext *s, AVPacket *pkt);
+int ff_flac_write_picture(struct AVFormatContext *s, AVIOContext *pb,
+                          AVPacket *pkt, unsigned *attached_types);
 
 #endif /* AVFORMAT_FLAC_PICTURE_H */
diff --git a/libavformat/flacdec.c b/libavformat/flacdec.c
index 79c05f14bf..4e9676a71d 100644
--- a/libavformat/flacdec.c
+++ b/libavformat/flacdec.c
@@ -146,7 +146,7 @@  static int flac_read_header(AVFormatContext *s)
             }
             av_freep(&buffer);
         } else if (metadata_type == FLAC_METADATA_TYPE_PICTURE) {
-            ret = ff_flac_parse_picture(s, buffer, metadata_size, 1);
+            ret = ff_flac_parse_picture(s, buffer, metadata_size, 1, -1);
             av_freep(&buffer);
             if (ret < 0) {
                 av_log(s, AV_LOG_ERROR, "Error parsing attached picture.\n");
@@ -177,7 +177,7 @@  static int flac_read_header(AVFormatContext *s)
             if (metadata_type == FLAC_METADATA_TYPE_VORBIS_COMMENT) {
                 AVDictionaryEntry *chmask;
 
-                ret = ff_vorbis_comment(s, &s->metadata, buffer, metadata_size, 1);
+                ret = ff_vorbis_comment(s, &s->metadata, buffer, metadata_size, -1);
                 if (ret < 0) {
                     av_log(s, AV_LOG_WARNING, "error parsing VorbisComment metadata\n");
                 } else if (ret > 0) {
diff --git a/libavformat/flacenc.c b/libavformat/flacenc.c
index b947a3b067..db8972f56f 100644
--- a/libavformat/flacenc.c
+++ b/libavformat/flacenc.c
@@ -21,11 +21,11 @@ 
 
 #include "libavutil/channel_layout.h"
 #include "libavutil/opt.h"
-#include "libavutil/pixdesc.h"
 #include "libavcodec/flac.h"
 #include "avformat.h"
 #include "avio_internal.h"
 #include "flacenc.h"
+#include "flac_picture.h"
 #include "id3v2.h"
 #include "internal.h"
 #include "vorbiscomment.h"
@@ -70,100 +70,15 @@  static int flac_write_block_comment(AVIOContext *pb, AVDictionary **m,
 
     avio_w8(pb, last_block ? 0x84 : 0x04);
     avio_wb24(pb, len);
-    ff_vorbiscomment_write(pb, *m, vendor, NULL, 0);
+    ff_vorbiscomment_write(pb, *m, vendor, NULL, 0, 0);
 
     return 0;
 }
 
-static int flac_write_picture(struct AVFormatContext *s, AVPacket *pkt)
-{
-    FlacMuxerContext *c = s->priv_data;
-    AVIOContext *pb = s->pb;
-    const AVPixFmtDescriptor *pixdesc;
-    const CodecMime *mime = ff_id3v2_mime_tags;
-    AVDictionaryEntry *e;
-    const char *mimetype = NULL, *desc = "";
-    const AVStream *st = s->streams[pkt->stream_index];
-    int i, mimelen, desclen, type = 0, blocklen;
-
-    if (!pkt->data)
-        return 0;
-
-    while (mime->id != AV_CODEC_ID_NONE) {
-        if (mime->id == st->codecpar->codec_id) {
-            mimetype = mime->str;
-            break;
-        }
-        mime++;
-    }
-    if (!mimetype) {
-        av_log(s, AV_LOG_ERROR, "No mimetype is known for stream %d, cannot "
-               "write an attached picture.\n", st->index);
-        return AVERROR(EINVAL);
-    }
-    mimelen = strlen(mimetype);
-
-    /* get the picture type */
-    e = av_dict_get(st->metadata, "comment", NULL, 0);
-    for (i = 0; e && i < FF_ARRAY_ELEMS(ff_id3v2_picture_types); i++) {
-        if (!av_strcasecmp(e->value, ff_id3v2_picture_types[i])) {
-            type = i;
-            break;
-        }
-    }
-
-    if ((c->attached_types & (1 << type)) & 0x6) {
-        av_log(s, AV_LOG_ERROR, "Duplicate attachment for type '%s'\n", ff_id3v2_picture_types[type]);
-        return AVERROR(EINVAL);
-    }
-
-    if (type == 1 && (st->codecpar->codec_id != AV_CODEC_ID_PNG ||
-                      st->codecpar->width != 32 ||
-                      st->codecpar->height != 32)) {
-        av_log(s, AV_LOG_ERROR, "File icon attachment must be a 32x32 PNG");
-        return AVERROR(EINVAL);
-    }
-
-    c->attached_types |= (1 << type);
-
-    /* get the description */
-    if ((e = av_dict_get(st->metadata, "title", NULL, 0)))
-        desc = e->value;
-    desclen = strlen(desc);
-
-    blocklen = 4 + 4 + mimelen + 4 + desclen + 4 + 4 + 4 + 4 + 4 + pkt->size;
-    if (blocklen >= 1<<24) {
-        av_log(s, AV_LOG_ERROR, "Picture block too big %d >= %d\n", blocklen, 1<<24);
-        return AVERROR(EINVAL);
-    }
-
-    avio_w8(pb, 0x06);
-    avio_wb24(pb, blocklen);
-
-    avio_wb32(pb, type);
-
-    avio_wb32(pb, mimelen);
-    avio_write(pb, mimetype, mimelen);
-
-    avio_wb32(pb, desclen);
-    avio_write(pb, desc, desclen);
-
-    avio_wb32(pb, st->codecpar->width);
-    avio_wb32(pb, st->codecpar->height);
-    if ((pixdesc = av_pix_fmt_desc_get(st->codecpar->format)))
-        avio_wb32(pb, av_get_bits_per_pixel(pixdesc));
-    else
-        avio_wb32(pb, 0);
-    avio_wb32(pb, 0);
-
-    avio_wb32(pb, pkt->size);
-    avio_write(pb, pkt->data, pkt->size);
-    return 0;
-}
-
 static int flac_finish_header(struct AVFormatContext *s)
 {
     int i, ret, padding = s->metadata_header_padding;
+    FlacMuxerContext *pd = s->priv_data;
     if (padding < 0)
         padding = 8192;
     /* The FLAC specification states that 24 bits are used to represent the
@@ -175,7 +90,14 @@  static int flac_finish_header(struct AVFormatContext *s)
         AVPacket *pkt = st->priv_data;
         if (!pkt)
             continue;
-        ret = flac_write_picture(s, pkt);
+        int picture_len = ff_flac_picture_length(s, pkt);
+        if (picture_len >= 1<<24) {
+            av_log(s, AV_LOG_ERROR, "Picture block too big %d >= %d\n", picture_len, 1<<24);
+            return AVERROR(EINVAL);
+        }
+        avio_w8(s->pb, 0x06);
+        avio_wb24(s->pb, picture_len);
+        ret = ff_flac_write_picture(s, s->pb, pkt, &pd->attached_types);
         av_packet_unref(pkt);
         if (ret < 0 && (s->error_recognition & AV_EF_EXPLODE))
             return ret;
diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
index bb3a126c29..fcbf3e0497 100644
--- a/libavformat/matroskadec.c
+++ b/libavformat/matroskadec.c
@@ -2005,7 +2005,7 @@  static int matroska_parse_flac(AVFormatContext *s,
             AVDictionary *dict = NULL;
             AVDictionaryEntry *chmask;
 
-            ff_vorbis_comment(s, &dict, p, block_size, 0);
+            ff_vorbis_comment(s, &dict, p, block_size, -2);
             chmask = av_dict_get(dict, "WAVEFORMATEXTENSIBLE_CHANNEL_MASK", NULL, 0);
             if (chmask) {
                 uint64_t mask = strtol(chmask->value, NULL, 0);
diff --git a/libavformat/matroskaenc.c b/libavformat/matroskaenc.c
index 1c1ea71f59..00bd769962 100644
--- a/libavformat/matroskaenc.c
+++ b/libavformat/matroskaenc.c
@@ -666,7 +666,7 @@  static int put_flac_codecpriv(AVFormatContext *s, AVIOContext *pb,
         avio_w8(pb, 0x84);
         avio_wb24(pb, len);
 
-        ff_vorbiscomment_write(pb, dict, vendor, NULL, 0);
+        ff_vorbiscomment_write(pb, dict, vendor, NULL, 0, 0);
 
         av_dict_free(&dict);
     }
diff --git a/libavformat/oggdec.h b/libavformat/oggdec.h
index 629a1d6262..58d7c86a4e 100644
--- a/libavformat/oggdec.h
+++ b/libavformat/oggdec.h
@@ -131,7 +131,8 @@  extern const struct ogg_codec ff_vorbis_codec;
 extern const struct ogg_codec ff_vp8_codec;
 
 int ff_vorbis_comment(AVFormatContext *ms, AVDictionary **m,
-                      const uint8_t *buf, int size, int parse_picture);
+                      const uint8_t *buf, int size,
+                      int parse_picture_index);
 
 int ff_vorbis_stream_comment(AVFormatContext *as, AVStream *st,
                              const uint8_t *buf, int size);
diff --git a/libavformat/oggenc.c b/libavformat/oggenc.c
index f5032759a6..4223690956 100644
--- a/libavformat/oggenc.c
+++ b/libavformat/oggenc.c
@@ -21,6 +21,7 @@ 
 
 #include <stdint.h>
 
+#include "libavutil/base64.h"
 #include "libavutil/crc.h"
 #include "libavutil/mathematics.h"
 #include "libavutil/opt.h"
@@ -30,6 +31,7 @@ 
 #include "libavcodec/flac.h"
 #include "avformat.h"
 #include "avio_internal.h"
+#include "flac_picture.h"
 #include "internal.h"
 #include "vorbiscomment.h"
 
@@ -46,6 +48,11 @@  typedef struct OGGPage {
     uint16_t size;
 } OGGPage;
 
+typedef struct OGGPageList {
+    OGGPage page;
+    struct OGGPageList *next;
+} OGGPageList;
+
 typedef struct OGGStreamContext {
     unsigned page_counter;
     uint8_t *header[3];
@@ -61,12 +68,16 @@  typedef struct OGGStreamContext {
     OGGPage page; ///< current page
     unsigned serial_num; ///< serial number
     int64_t last_granule; ///< last packet granule
+    /* for embeded pictures */
+    OGGPageList *comments;
+    unsigned int attach_count;
+    unsigned attached_types;
 } OGGStreamContext;
 
-typedef struct OGGPageList {
-    OGGPage page;
-    struct OGGPageList *next;
-} OGGPageList;
+typedef struct OGGAttachContext {
+    int attach_to;
+    unsigned int count;
+} OGGAttachContext;
 
 typedef struct OGGContext {
     const AVClass *class;
@@ -74,6 +85,7 @@  typedef struct OGGContext {
     int pref_size; ///< preferred page size (0 => fill all segments)
     int64_t pref_duration;      ///< preferred page duration (0 => fill all segments)
     int serial_offset;
+    unsigned int attach_count;
 } OGGContext;
 
 #define OFFSET(x) offsetof(OGGContext, x)
@@ -169,7 +181,7 @@  static int ogg_reset_cur_page(OGGStreamContext *oggstream)
     return 0;
 }
 
-static int ogg_buffer_page(AVFormatContext *s, OGGStreamContext *oggstream)
+static int ogg_buffer_page(AVFormatContext *s, OGGStreamContext *oggstream, OGGPageList **page)
 {
     OGGContext *ogg = s->priv_data;
     OGGPageList **p = &ogg->page_list;
@@ -190,6 +202,7 @@  static int ogg_buffer_page(AVFormatContext *s, OGGStreamContext *oggstream)
     }
     l->next = *p;
     *p = l;
+    if (page) *page = l;
 
     return 0;
 }
@@ -214,14 +227,14 @@  static int ogg_buffer_data(AVFormatContext *s, AVStream *st,
          ogg_granule_to_timestamp(oggstream, oggstream->last_granule) + 1 ||
          ogg_key_granule(oggstream, granule))) {
         if (oggstream->page.granule != -1)
-            ogg_buffer_page(s, oggstream);
+            ogg_buffer_page(s, oggstream, NULL);
         flush = 1;
     }
 
     // avoid a continued page
     if (!header && oggstream->page.size > 0 &&
         MAX_PAGE_SIZE - oggstream->page.size < size) {
-        ogg_buffer_page(s, oggstream);
+        ogg_buffer_page(s, oggstream, NULL);
     }
 
     for (i = 0; i < total_segments; ) {
@@ -255,34 +268,87 @@  static int ogg_buffer_data(AVFormatContext *s, AVStream *st,
                                          st->time_base, AV_TIME_BASE_Q);
 
             if (page->segments_count == 255) {
-                ogg_buffer_page(s, oggstream);
+                ogg_buffer_page(s, oggstream, NULL);
             } else if (!header) {
                 if ((ogg->pref_size     > 0 && page->size   >= ogg->pref_size) ||
                     (ogg->pref_duration > 0 && next - start >= ogg->pref_duration)) {
-                    ogg_buffer_page(s, oggstream);
+                    ogg_buffer_page(s, oggstream, NULL);
                 }
             }
         }
     }
 
     if (flush && oggstream->page.granule != -1)
-        ogg_buffer_page(s, oggstream);
+        ogg_buffer_page(s, oggstream, NULL);
+
+    return 0;
+}
+
+static int ogg_append_page(AVFormatContext *s, OGGPageList **page_list,
+                           uint8_t *data, unsigned size, int new_packet)
+{
+    int seg_size = 255, started = 0;
+    OGGPage *p = &(*page_list)->page;
+    int64_t granule = p->granule;
+
+    if (!new_packet) {
+        p->granule = -1;
+        int append = FFMIN(255 - p->segments[p->segments_count - 1], size);
+        memcpy(p->data + p->size, data, append);
+        p->size += append;
+        seg_size = (p->segments[p->segments_count - 1] += append);
+        data += append;
+        size -= append;
+        started = 1;
+    }
 
+    while (size > 0 || seg_size == 255) {
+        if (p->segments_count == 255) {
+            OGGPageList *new_page = av_malloc(sizeof(OGGPageList));
+            if (!new_page)
+                return AVERROR(ENOMEM);
+            new_page->page = (OGGPage) {
+                .start_granule = p->start_granule,
+                .granule = -1,
+                .stream_index = p->stream_index,
+                .flags = started,
+                .segments_count = 0,
+                .size = 0,
+            };
+            new_page->next = (*page_list)->next;
+            (*page_list)->next = new_page;
+            (*page_list) = new_page;
+            p = &new_page->page;
+            ((OGGStreamContext *) s->streams[p->stream_index]->priv_data)->page_count++;
+        }
+
+        seg_size = FFMIN(255, size);
+        memcpy(p->data + p->size, data, seg_size);
+        p->size += seg_size;
+        p->segments[p->segments_count++] = seg_size;
+        data += seg_size;
+        size -= seg_size;
+        started = 1;
+    }
+
+    p->granule = granule;
     return 0;
 }
 
 static uint8_t *ogg_write_vorbiscomment(int64_t offset, int bitexact,
                                         int *header_len, AVDictionary **m, int framing_bit,
-                                        AVChapter **chapters, unsigned int nb_chapters)
+                                        AVChapter **chapters, unsigned int nb_chapters,
+                                        unsigned int nb_attach)
 {
     const char *vendor = bitexact ? "ffmpeg" : LIBAVFORMAT_IDENT;
     AVIOContext pb;
-    int64_t size;
+    int64_t comment_size, size;
     uint8_t *p;
 
     ff_metadata_conv(m, ff_vorbiscomment_metadata_conv, NULL);
 
-    size = offset + ff_vorbiscomment_length(*m, vendor, chapters, nb_chapters) + framing_bit;
+    comment_size = ff_vorbiscomment_length(*m, vendor, chapters, nb_chapters);
+    size = offset + comment_size + framing_bit;
     if (size > INT_MAX)
         return NULL;
     p = av_mallocz(size);
@@ -290,7 +356,7 @@  static uint8_t *ogg_write_vorbiscomment(int64_t offset, int bitexact,
         return NULL;
 
     ffio_init_context(&pb, p + offset, size - offset, 1, NULL, NULL, NULL, NULL);
-    ff_vorbiscomment_write(&pb, *m, vendor, chapters, nb_chapters);
+    ff_vorbiscomment_write(&pb, *m, vendor, chapters, nb_chapters, nb_attach);
     if (framing_bit)
         avio_w8(&pb, 1);
 
@@ -317,18 +383,21 @@  static int ogg_build_flac_headers(AVCodecParameters *par,
     bytestream_put_buffer(&p, "FLAC", 4);
     bytestream_put_byte(&p, 1); // major version
     bytestream_put_byte(&p, 0); // minor version
-    bytestream_put_be16(&p, 1); // headers packets without this one
+    bytestream_put_be16(&p, 1 + oggstream->attach_count); // headers packets without this one
     bytestream_put_buffer(&p, "fLaC", 4);
     bytestream_put_byte(&p, 0x00); // streaminfo
     bytestream_put_be24(&p, 34);
     bytestream_put_buffer(&p, par->extradata, FLAC_STREAMINFO_SIZE);
 
     // second packet: VorbisComment
-    p = ogg_write_vorbiscomment(4, bitexact, &oggstream->header_len[1], m, 0, NULL, 0);
+    p = ogg_write_vorbiscomment(4, bitexact, &oggstream->header_len[1], m, 0, NULL, 0, 0);
     if (!p)
         return AVERROR(ENOMEM);
     oggstream->header[1] = p;
-    bytestream_put_byte(&p, 0x84); // last metadata block and vorbis comment
+    if (!oggstream->attach_count)
+        bytestream_put_byte(&p, 0x84); // last metadata block and vorbis comment
+    else
+        bytestream_put_byte(&p, 0x04); // vorbis comment followed by attachments
     bytestream_put_be24(&p, oggstream->header_len[1] - 4);
 
     return 0;
@@ -355,7 +424,8 @@  static int ogg_build_speex_headers(AVCodecParameters *par,
     AV_WL32(&oggstream->header[0][68], 0);  // set extra_headers to 0
 
     // second packet: VorbisComment
-    p = ogg_write_vorbiscomment(0, bitexact, &oggstream->header_len[1], m, 0, NULL, 0);
+    p = ogg_write_vorbiscomment(0, bitexact, &oggstream->header_len[1], m, 0,
+                                NULL, 0, oggstream->attach_count);
     if (!p)
         return AVERROR(ENOMEM);
     oggstream->header[1] = p;
@@ -384,7 +454,8 @@  static int ogg_build_opus_headers(AVCodecParameters *par,
     bytestream_put_buffer(&p, par->extradata, par->extradata_size);
 
     /* second packet: VorbisComment */
-    p = ogg_write_vorbiscomment(8, bitexact, &oggstream->header_len[1], m, 0, chapters, nb_chapters);
+    p = ogg_write_vorbiscomment(8, bitexact, &oggstream->header_len[1], m, 0,
+                                chapters, nb_chapters, oggstream->attach_count);
     if (!p)
         return AVERROR(ENOMEM);
     oggstream->header[1] = p;
@@ -428,8 +499,9 @@  static int ogg_build_vp8_headers(AVFormatContext *s, AVStream *st,
     bytestream_put_be32(&p, st->time_base.num);
 
     /* optional second packet: VorbisComment */
-    if (av_dict_get(st->metadata, "", NULL, AV_DICT_IGNORE_SUFFIX)) {
-        p = ogg_write_vorbiscomment(7, bitexact, &oggstream->header_len[1], &st->metadata, 0, NULL, 0);
+    if (av_dict_get(st->metadata, "", NULL, AV_DICT_IGNORE_SUFFIX) || oggstream->attach_count) {
+        p = ogg_write_vorbiscomment(7, bitexact, &oggstream->header_len[1], &st->metadata, 0,
+                                    NULL, 0, oggstream->attach_count);
         if (!p)
             return AVERROR(ENOMEM);
         oggstream->header[1] = p;
@@ -470,22 +542,18 @@  static int ogg_init(AVFormatContext *s)
 {
     OGGContext *ogg = s->priv_data;
     OGGStreamContext *oggstream = NULL;
-    int i, j;
+    OGGAttachContext *attachstream = NULL;
+    AVStream *st;
+    int i, j, k = 0;
 
     if (ogg->pref_size)
         av_log(s, AV_LOG_WARNING, "The pagesize option is deprecated\n");
 
+    // init stream priv_data
     for (i = 0; i < s->nb_streams; i++) {
-        AVStream *st = s->streams[i];
-        unsigned serial_num = i + ogg->serial_offset;
-
-        if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
-            if (st->codecpar->codec_id == AV_CODEC_ID_OPUS)
-                /* Opus requires a fixed 48kHz clock */
-                avpriv_set_pts_info(st, 64, 1, 48000);
-            else
-                avpriv_set_pts_info(st, 64, 1, st->codecpar->sample_rate);
-        }
+        st = s->streams[i];
+        if (st->disposition & AV_DISPOSITION_ATTACHED_PIC)
+            continue;
 
         if (st->codecpar->codec_id != AV_CODEC_ID_VORBIS &&
             st->codecpar->codec_id != AV_CODEC_ID_THEORA &&
@@ -497,14 +565,82 @@  static int ogg_init(AVFormatContext *s)
             return AVERROR(EINVAL);
         }
 
+        oggstream = av_mallocz(sizeof(*oggstream));
+        if (!oggstream)
+            return AVERROR(ENOMEM);
+        st->priv_data = oggstream;
+    }
+
+    // setup attachment association
+    for (i = 0; i < s->nb_streams; i++) {
+        st = s->streams[i];
+        if (!(st->disposition & AV_DISPOSITION_ATTACHED_PIC))
+            continue;
+        ogg->attach_count++;
+
+        if (st->codecpar->codec_id != AV_CODEC_ID_GIF &&
+            st->codecpar->codec_id != AV_CODEC_ID_MJPEG &&
+            st->codecpar->codec_id != AV_CODEC_ID_PNG &&
+            st->codecpar->codec_id != AV_CODEC_ID_TIFF &&
+            st->codecpar->codec_id != AV_CODEC_ID_BMP) {
+            av_log(s, AV_LOG_ERROR, "Unsupported attachment codec id in stream %d\n", i);
+            return AVERROR(EINVAL);
+        }
+
+        attachstream = av_mallocz(sizeof(*attachstream));
+        if (!attachstream)
+            return AVERROR(ENOMEM);
+        st->priv_data = attachstream;
+
+        int attached = 0;
+        AVDictionaryEntry *e = av_dict_get(st->metadata, "ogg_attach", NULL, 0);
+        if (e) {
+            int ogg_attach = -1;
+            int res = sscanf(e->value, "%d", &ogg_attach);
+            if (res > 0 && ogg_attach >= 0 && ogg_attach < s->nb_streams &&
+                !(s->streams[ogg_attach]->disposition & AV_DISPOSITION_ATTACHED_PIC)) {
+                attachstream->attach_to = ogg_attach;
+                oggstream = s->streams[ogg_attach]->priv_data;
+                oggstream->attach_count++;
+                attached = 1;
+            }
+        }
+
+        for (j = 0; !attached && j < s->nb_streams; ++j) {
+            if (!(s->streams[j]->disposition & AV_DISPOSITION_ATTACHED_PIC)) {
+                attachstream->attach_to = j;
+                oggstream = s->streams[j]->priv_data;
+                oggstream->attach_count++;
+                attached = 1;
+            }
+        }
+        if (!attached) {
+            av_log(s, AV_LOG_ERROR, "Cannot add attachment %d without an audio/video stream\n", i);
+            return AVERROR(EINVAL);
+        }
+    }
+
+    for (i = 0; i < s->nb_streams; i++) {
+        st = s->streams[i];
+        if (st->disposition & AV_DISPOSITION_ATTACHED_PIC)
+            continue;
+
+        unsigned serial_num = k++ + ogg->serial_offset;
+
+        if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
+            if (st->codecpar->codec_id == AV_CODEC_ID_OPUS)
+                /* Opus requires a fixed 48kHz clock */
+                avpriv_set_pts_info(st, 64, 1, 48000);
+            else
+                avpriv_set_pts_info(st, 64, 1, st->codecpar->sample_rate);
+        }
+
         if ((!st->codecpar->extradata || !st->codecpar->extradata_size) &&
             st->codecpar->codec_id != AV_CODEC_ID_VP8) {
             av_log(s, AV_LOG_ERROR, "No extradata present\n");
             return AVERROR_INVALIDDATA;
         }
-        oggstream = av_mallocz(sizeof(*oggstream));
-        if (!oggstream)
-            return AVERROR(ENOMEM);
+        oggstream = st->priv_data;
 
         oggstream->page.stream_index = i;
 
@@ -521,7 +657,6 @@  static int ogg_init(AVFormatContext *s)
 
         av_dict_copy(&st->metadata, s->metadata, AV_DICT_DONT_OVERWRITE);
 
-        st->priv_data = oggstream;
         if (st->codecpar->codec_id == AV_CODEC_ID_FLAC) {
             int err = ogg_build_flac_headers(st->codecpar, oggstream,
                                              s->flags & AVFMT_FLAG_BITEXACT,
@@ -557,7 +692,8 @@  static int ogg_init(AVFormatContext *s)
             uint8_t *p;
             const char *cstr = st->codecpar->codec_id == AV_CODEC_ID_VORBIS ? "vorbis" : "theora";
             int header_type = st->codecpar->codec_id == AV_CODEC_ID_VORBIS ? 3 : 0x81;
-            int framing_bit = st->codecpar->codec_id == AV_CODEC_ID_VORBIS ? 1 : 0;
+            int framing_bit = oggstream->attach_count > 0 ? 0 : 
+                              st->codecpar->codec_id == AV_CODEC_ID_VORBIS ? 1 : 0;
 
             if (avpriv_split_xiph_headers(st->codecpar->extradata, st->codecpar->extradata_size,
                                       st->codecpar->codec_id == AV_CODEC_ID_VORBIS ? 30 : 42,
@@ -569,7 +705,7 @@  static int ogg_init(AVFormatContext *s)
 
             p = ogg_write_vorbiscomment(7, s->flags & AVFMT_FLAG_BITEXACT,
                                         &oggstream->header_len[1], &st->metadata,
-                                        framing_bit, NULL, 0);
+                                        framing_bit, NULL, 0, oggstream->attach_count);
             oggstream->header[1] = p;
             if (!p)
                 return AVERROR(ENOMEM);
@@ -601,36 +737,129 @@  static int ogg_init(AVFormatContext *s)
 
 static int ogg_write_header(AVFormatContext *s)
 {
+    OGGContext *ogg = s->priv_data;
     OGGStreamContext *oggstream = NULL;
-    int i, j;
+    int j;
 
     for (j = 0; j < s->nb_streams; j++) {
+        if (s->streams[j]->disposition & AV_DISPOSITION_ATTACHED_PIC)
+            continue;
         oggstream = s->streams[j]->priv_data;
         ogg_buffer_data(s, s->streams[j], oggstream->header[0],
                         oggstream->header_len[0], 0, 1);
         oggstream->page.flags |= 2; // bos
-        ogg_buffer_page(s, oggstream);
+        ogg_buffer_page(s, oggstream, NULL);
     }
+    ogg_write_pages(s, 2);
+
     for (j = 0; j < s->nb_streams; j++) {
+        if (s->streams[j]->disposition & AV_DISPOSITION_ATTACHED_PIC)
+            continue;
         AVStream *st = s->streams[j];
         oggstream = st->priv_data;
-        for (i = 1; i < 3; i++) {
-            if (oggstream->header_len[i])
-                ogg_buffer_data(s, st, oggstream->header[i],
-                                oggstream->header_len[i], 0, 1);
+        if (oggstream->header_len[1]) {
+            ogg_buffer_data(s, st, oggstream->header[1],
+                            oggstream->header_len[1], 0, 1);
+            ogg_buffer_page(s, oggstream, &oggstream->comments);
+        }
+        if (oggstream->header_len[2]) {
+            ogg_buffer_data(s, st, oggstream->header[2],
+                            oggstream->header_len[2], 0, 1);
+            ogg_buffer_page(s, oggstream, NULL);
         }
-        ogg_buffer_page(s, oggstream);
     }
 
     oggstream->page.start_granule = AV_NOPTS_VALUE;
 
-    ogg_write_pages(s, 2);
+    if (!ogg->attach_count)
+        ogg_write_pages(s, 2);
+
+    return 0;
+}
+
+static int ogg_write_attached_packet_internal(AVFormatContext *s, AVPacket *pkt)
+{
+    OGGContext *ogg = s->priv_data;
+    AVStream *st = s->streams[pkt->stream_index];
+    OGGAttachContext *attachstream = st->priv_data;
+    AVStream *base_st = s->streams[attachstream->attach_to];
+    OGGStreamContext *oggstream = base_st->priv_data;
+    uint8_t *data;
+    int data_len;
+    int ret;
+
+    attachstream->count++;
+    // Only use the first packet received for each attached stream
+    if (attachstream->count == 2)
+        av_log(s, AV_LOG_WARNING, "Attached stream %d has too many packets, ignoring\n",
+               pkt->stream_index);
+    if (attachstream->count > 1)
+        return 0;
+
+    if (!ogg->attach_count || !oggstream->attach_count) {
+        av_log(s, AV_LOG_ERROR, "More attached streams than expected\n");
+        return -1;
+    }
+
+    AVIOContext *pb;
+    avio_open_dyn_buf(&pb);
+    if (!pb)
+        return AVERROR(ENOMEM);
 
+    if (base_st->codecpar->codec_id == AV_CODEC_ID_FLAC) {
+        int picture_len = ff_flac_picture_length(s, pkt);
+        if (picture_len >= 1<<24) {
+            av_log(s, AV_LOG_ERROR, "Picture block too big %d >= %d\n", picture_len, 1<<24);
+            return AVERROR(EINVAL);
+        }
+        if (oggstream->attach_count == 1)
+            avio_w8(pb, 0x86); // last attachment
+        else
+            avio_w8(pb, 0x06);
+        avio_wb24(pb, picture_len);
+        ff_flac_write_picture(s, pb, pkt, &oggstream->attached_types);
+        data_len = avio_close_dyn_buf(pb, &data);
+        if (!data || data_len < 0)
+            return AVERROR(ENOMEM);
+    } else {
+        uint8_t *bin_data;
+        int bin_data_len;
+        ff_flac_write_picture(s, pb, pkt, &oggstream->attached_types);
+        bin_data_len = avio_close_dyn_buf(pb, &bin_data);
+        if (!bin_data || bin_data_len < 0)
+            return AVERROR(ENOMEM);
+
+        data_len = 4 + 23 + AV_BASE64_SIZE(bin_data_len) - 1;
+        data = av_malloc(data_len + 1);
+        if (!data)
+            return AVERROR(ENOMEM);
+        AV_WL32(data, data_len - 4);
+        strcpy(data + 4, "METADATA_BLOCK_PICTURE=");
+        av_base64_encode(data + 4 + 23, data_len - 4 - 23 + 1, bin_data, bin_data_len);
+        av_free(bin_data);
+
+        // add framing bit when required
+        if (oggstream->attach_count == 1 &&
+            base_st->codecpar->codec_id == AV_CODEC_ID_VORBIS)
+            data[data_len++] = 1;
+    }
+
+    ret = ogg_append_page(s, &oggstream->comments, data, data_len,
+                          base_st->codecpar->codec_id == AV_CODEC_ID_FLAC);
+    av_free(data);
+    if (ret < 0)
+        return ret;
+
+    oggstream->attach_count--;
+    ogg->attach_count--;
+    if (!ogg->attach_count)
+        ogg_write_pages(s, 0);
     return 0;
 }
 
 static int ogg_write_packet_internal(AVFormatContext *s, AVPacket *pkt)
 {
+    OGGContext *ogg = s->priv_data;
     AVStream *st = s->streams[pkt->stream_index];
     OGGStreamContext *oggstream = st->priv_data;
     int ret;
@@ -674,7 +903,9 @@  static int ogg_write_packet_internal(AVFormatContext *s, AVPacket *pkt)
     if (ret < 0)
         return ret;
 
-    ogg_write_pages(s, 0);
+    if (!ogg->attach_count) {
+        ogg_write_pages(s, 0);
+    }
 
     oggstream->last_granule = granule;
 
@@ -683,15 +914,26 @@  static int ogg_write_packet_internal(AVFormatContext *s, AVPacket *pkt)
 
 static int ogg_write_packet(AVFormatContext *s, AVPacket *pkt)
 {
+    OGGContext *ogg = s->priv_data;
     int i;
 
-    if (pkt)
-        return ogg_write_packet_internal(s, pkt);
+    if (pkt) {
+        AVStream *st = s->streams[pkt->stream_index];
+        if (st->disposition & AV_DISPOSITION_ATTACHED_PIC)
+            return ogg_write_attached_packet_internal(s, pkt);
+        else
+            return ogg_write_packet_internal(s, pkt);
+    }
+
+    if (ogg->attach_count)
+        return 0;
 
     for (i = 0; i < s->nb_streams; i++) {
+        if (s->streams[i]->disposition & AV_DISPOSITION_ATTACHED_PIC)
+            continue;
         OGGStreamContext *oggstream = s->streams[i]->priv_data;
         if (oggstream->page.segments_count)
-            ogg_buffer_page(s, oggstream);
+            ogg_buffer_page(s, oggstream, NULL);
     }
 
     ogg_write_pages(s, 2);
@@ -700,14 +942,20 @@  static int ogg_write_packet(AVFormatContext *s, AVPacket *pkt)
 
 static int ogg_write_trailer(AVFormatContext *s)
 {
+    OGGContext *ogg = s->priv_data;
     int i;
 
+    if (ogg->attach_count)
+        av_log(s, AV_LOG_WARNING, "Some attached images were never written\n");
+
     /* flush current page if needed */
     for (i = 0; i < s->nb_streams; i++) {
+        if (s->streams[i]->disposition & AV_DISPOSITION_ATTACHED_PIC)
+            continue;
         OGGStreamContext *oggstream = s->streams[i]->priv_data;
 
         if (oggstream->page.size > 0)
-            ogg_buffer_page(s, oggstream);
+            ogg_buffer_page(s, oggstream, NULL);
     }
 
     ogg_write_pages(s, 1);
@@ -723,16 +971,18 @@  static void ogg_free(AVFormatContext *s)
 
     for (i = 0; i < s->nb_streams; i++) {
         AVStream *st = s->streams[i];
-        OGGStreamContext *oggstream = st->priv_data;
-        if (!oggstream)
-            continue;
-        if (st->codecpar->codec_id == AV_CODEC_ID_FLAC ||
-            st->codecpar->codec_id == AV_CODEC_ID_SPEEX ||
-            st->codecpar->codec_id == AV_CODEC_ID_OPUS ||
-            st->codecpar->codec_id == AV_CODEC_ID_VP8) {
-            av_freep(&oggstream->header[0]);
-        }
-        av_freep(&oggstream->header[1]);
+        if (!(s->streams[i]->disposition & AV_DISPOSITION_ATTACHED_PIC)) {
+            OGGStreamContext *oggstream = st->priv_data;
+            if (!oggstream)
+                continue;
+            if (st->codecpar->codec_id == AV_CODEC_ID_FLAC ||
+                st->codecpar->codec_id == AV_CODEC_ID_SPEEX ||
+                st->codecpar->codec_id == AV_CODEC_ID_OPUS ||
+                st->codecpar->codec_id == AV_CODEC_ID_VP8) {
+                av_freep(&oggstream->header[0]);
+            }
+            av_freep(&oggstream->header[1]);
+         }
     }
 
     while (p) {
@@ -743,6 +993,28 @@  static void ogg_free(AVFormatContext *s)
     ogg->page_list = NULL;
 }
 
+static int ogg_query_codec(enum AVCodecID id, int std_compliance)
+{
+    switch (id) {
+        case AV_CODEC_ID_VORBIS:
+        case AV_CODEC_ID_THEORA:
+        case AV_CODEC_ID_SPEEX:
+        case AV_CODEC_ID_FLAC:
+        case AV_CODEC_ID_OPUS:
+        case AV_CODEC_ID_VP8:
+            return 1;
+        case AV_CODEC_ID_GIF:
+        case AV_CODEC_ID_MJPEG:
+        case AV_CODEC_ID_PNG:
+        case AV_CODEC_ID_TIFF:
+        case AV_CODEC_ID_BMP:
+            return MKTAG('A', 'P', 'I', 'C');
+        default:
+            return 0;
+    }
+    return 0;
+}
+
 #if CONFIG_OGG_MUXER
 OGG_CLASS(ogg, Ogg)
 AVOutputFormat ff_ogg_muxer = {
@@ -771,6 +1043,7 @@  AVOutputFormat ff_ogg_muxer = {
     .deinit            = ogg_free,
     .flags             = AVFMT_TS_NEGATIVE | AVFMT_TS_NONSTRICT | AVFMT_ALLOW_FLUSH,
     .priv_class        = &ogg_muxer_class,
+    .query_codec       = ogg_query_codec,
 };
 #endif
 
@@ -783,6 +1056,7 @@  AVOutputFormat ff_oga_muxer = {
     .extensions        = "oga",
     .priv_data_size    = sizeof(OGGContext),
     .audio_codec       = AV_CODEC_ID_FLAC,
+    .video_codec       = AV_CODEC_ID_MJPEG,
     .init              = ogg_init,
     .write_header      = ogg_write_header,
     .write_packet      = ogg_write_packet,
@@ -790,6 +1064,7 @@  AVOutputFormat ff_oga_muxer = {
     .deinit            = ogg_free,
     .flags             = AVFMT_TS_NEGATIVE | AVFMT_ALLOW_FLUSH,
     .priv_class        = &oga_muxer_class,
+    .query_codec       = ogg_query_codec,
 };
 #endif
 
@@ -812,6 +1087,7 @@  AVOutputFormat ff_ogv_muxer = {
     .deinit            = ogg_free,
     .flags             = AVFMT_TS_NEGATIVE | AVFMT_TS_NONSTRICT | AVFMT_ALLOW_FLUSH,
     .priv_class        = &ogv_muxer_class,
+    .query_codec       = ogg_query_codec,
 };
 #endif
 
@@ -824,6 +1100,7 @@  AVOutputFormat ff_spx_muxer = {
     .extensions        = "spx",
     .priv_data_size    = sizeof(OGGContext),
     .audio_codec       = AV_CODEC_ID_SPEEX,
+    .video_codec       = AV_CODEC_ID_MJPEG,
     .init              = ogg_init,
     .write_header      = ogg_write_header,
     .write_packet      = ogg_write_packet,
@@ -831,6 +1108,7 @@  AVOutputFormat ff_spx_muxer = {
     .deinit            = ogg_free,
     .flags             = AVFMT_TS_NEGATIVE | AVFMT_ALLOW_FLUSH,
     .priv_class        = &spx_muxer_class,
+    .query_codec       = ogg_query_codec,
 };
 #endif
 
@@ -843,6 +1121,7 @@  AVOutputFormat ff_opus_muxer = {
     .extensions        = "opus",
     .priv_data_size    = sizeof(OGGContext),
     .audio_codec       = AV_CODEC_ID_OPUS,
+    .video_codec       = AV_CODEC_ID_MJPEG,
     .init              = ogg_init,
     .write_header      = ogg_write_header,
     .write_packet      = ogg_write_packet,
@@ -850,5 +1129,6 @@  AVOutputFormat ff_opus_muxer = {
     .deinit            = ogg_free,
     .flags             = AVFMT_TS_NEGATIVE | AVFMT_ALLOW_FLUSH,
     .priv_class        = &opus_muxer_class,
+    .query_codec       = ogg_query_codec,
 };
 #endif
diff --git a/libavformat/oggparseflac.c b/libavformat/oggparseflac.c
index 4e85b05c67..840d1b73ef 100644
--- a/libavformat/oggparseflac.c
+++ b/libavformat/oggparseflac.c
@@ -22,6 +22,7 @@ 
 #include "libavcodec/get_bits.h"
 #include "libavcodec/flac.h"
 #include "avformat.h"
+#include "flac_picture.h"
 #include "internal.h"
 #include "oggdec.h"
 
@@ -72,6 +73,9 @@  flac_header (AVFormatContext * s, int idx)
         avpriv_set_pts_info(st, 64, 1, samplerate);
     } else if (mdt == FLAC_METADATA_TYPE_VORBIS_COMMENT) {
         ff_vorbis_stream_comment(s, st, os->buf + os->pstart + 4, os->psize - 4);
+    } else if (mdt == FLAC_METADATA_TYPE_PICTURE) {
+        int ret = ff_flac_parse_picture(s, os->buf + os->pstart + 4, os->psize - 4, idx, 1);
+        if (ret < 0) return ret;
     }
 
     return 1;
diff --git a/libavformat/oggparsevorbis.c b/libavformat/oggparsevorbis.c
index 0e8c25c030..c400fe8587 100644
--- a/libavformat/oggparsevorbis.c
+++ b/libavformat/oggparsevorbis.c
@@ -75,7 +75,7 @@  static int ogm_chapter(AVFormatContext *as, uint8_t *key, uint8_t *val)
 int ff_vorbis_stream_comment(AVFormatContext *as, AVStream *st,
                              const uint8_t *buf, int size)
 {
-    int updates = ff_vorbis_comment(as, &st->metadata, buf, size, 1);
+    int updates = ff_vorbis_comment(as, &st->metadata, buf, size, st->index);
 
     if (updates > 0) {
         st->event_flags |= AVSTREAM_EVENT_FLAG_METADATA_UPDATED;
@@ -86,7 +86,7 @@  int ff_vorbis_stream_comment(AVFormatContext *as, AVStream *st,
 
 int ff_vorbis_comment(AVFormatContext *as, AVDictionary **m,
                       const uint8_t *buf, int size,
-                      int parse_picture)
+                      int parse_picture_index)
 {
     const uint8_t *p   = buf;
     const uint8_t *end = buf + size;
@@ -151,7 +151,7 @@  int ff_vorbis_comment(AVFormatContext *as, AVDictionary **m,
              * 'METADATA_BLOCK_PICTURE'. This is the preferred and
              * recommended way of embedding cover art within VorbisComments."
              */
-            if (!av_strcasecmp(tt, "METADATA_BLOCK_PICTURE") && parse_picture) {
+            if (!av_strcasecmp(tt, "METADATA_BLOCK_PICTURE") && parse_picture_index > -2) {
                 int ret, len = AV_BASE64_DECODE_SIZE(vl);
                 char *pict = av_malloc(len);
 
@@ -165,7 +165,7 @@  int ff_vorbis_comment(AVFormatContext *as, AVDictionary **m,
                 av_freep(&tt);
                 av_freep(&ct);
                 if (ret > 0)
-                    ret = ff_flac_parse_picture(as, pict, ret, 0);
+                    ret = ff_flac_parse_picture(as, pict, ret, parse_picture_index, 0);
                 av_freep(&pict);
                 if (ret < 0) {
                     av_log(as, AV_LOG_WARNING, "Failed to parse cover art block.\n");
diff --git a/libavformat/vorbiscomment.c b/libavformat/vorbiscomment.c
index a929634cc0..89127236fa 100644
--- a/libavformat/vorbiscomment.c
+++ b/libavformat/vorbiscomment.c
@@ -21,6 +21,7 @@ 
 
 #include "avio.h"
 #include "avformat.h"
+#include "flac_picture.h"
 #include "metadata.h"
 #include "vorbiscomment.h"
 #include "libavutil/dict.h"
@@ -43,6 +44,12 @@  int64_t ff_vorbiscomment_length(const AVDictionary *m, const char *vendor_string
 {
     int64_t len = 8;
     len += strlen(vendor_string);
+    if (m) {
+        AVDictionaryEntry *tag = NULL;
+        while ((tag = av_dict_get(m, "", tag, AV_DICT_IGNORE_SUFFIX))) {
+            len += 4 +strlen(tag->key) + 1 + strlen(tag->value);
+        }
+    }
     if (chapters && nb_chapters) {
         for (int i = 0; i < nb_chapters; i++) {
             AVDictionaryEntry *tag = NULL;
@@ -53,18 +60,12 @@  int64_t ff_vorbiscomment_length(const AVDictionary *m, const char *vendor_string
             }
         }
     }
-    if (m) {
-        AVDictionaryEntry *tag = NULL;
-        while ((tag = av_dict_get(m, "", tag, AV_DICT_IGNORE_SUFFIX))) {
-            len += 4 +strlen(tag->key) + 1 + strlen(tag->value);
-        }
-    }
     return len;
 }
 
-int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m,
-                           const char *vendor_string,
-                           AVChapter **chapters, unsigned int nb_chapters)
+int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m, const char *vendor_string,
+                           AVChapter **chapters, unsigned int nb_chapters,
+                           int external_count)
 {
     int cm_count = 0;
     avio_wl32(pb, strlen(vendor_string));
@@ -74,10 +75,11 @@  int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m,
             cm_count += av_dict_count(chapters[i]->metadata) + 1;
         }
     }
+    int count = av_dict_count(m) + cm_count + external_count;
+    avio_wl32(pb, count);
+
     if (m) {
-        int count = av_dict_count(m) + cm_count;
         AVDictionaryEntry *tag = NULL;
-        avio_wl32(pb, count);
         while ((tag = av_dict_get(m, "", tag, AV_DICT_IGNORE_SUFFIX))) {
             int64_t len1 = strlen(tag->key);
             int64_t len2 = strlen(tag->value);
@@ -88,6 +90,8 @@  int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m,
             avio_w8(pb, '=');
             avio_write(pb, tag->value, len2);
         }
+    }
+    if (chapters && nb_chapters) {
         for (int i = 0; i < nb_chapters; i++) {
             AVChapter *chp = chapters[i];
             char chapter_time[13];
@@ -107,7 +111,7 @@  int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m,
             avio_w8(pb, '=');
             avio_write(pb, chapter_time, 12);
 
-            tag = NULL;
+            AVDictionaryEntry *tag = NULL;
             while ((tag = av_dict_get(chapters[i]->metadata, "", tag, AV_DICT_IGNORE_SUFFIX))) {
                 int64_t len1 = !strcmp(tag->key, "title") ? 4 : strlen(tag->key);
                 int64_t len2 = strlen(tag->value);
@@ -124,7 +128,7 @@  int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m,
                 avio_write(pb, tag->value, len2);
             }
         }
-    } else
-        avio_wl32(pb, 0);
+    }
+
     return 0;
 }
diff --git a/libavformat/vorbiscomment.h b/libavformat/vorbiscomment.h
index 7cacd0b2a0..1dc9ed1da0 100644
--- a/libavformat/vorbiscomment.h
+++ b/libavformat/vorbiscomment.h
@@ -48,9 +48,9 @@  int64_t ff_vorbiscomment_length(const AVDictionary *m, const char *vendor_string
  * @param chapters The chapters to write.
  * @param nb_chapters The number of chapters to write.
  */
-int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m,
-                           const char *vendor_string,
-                           AVChapter **chapters, unsigned int nb_chapters);
+int ff_vorbiscomment_write(AVIOContext *pb, const AVDictionary *m, const char *vendor_string,
+                           AVChapter **chapters, unsigned int nb_chapters,
+                           int external_count);
 
 extern const AVMetadataConv ff_vorbiscomment_metadata_conv[];