diff mbox series

[FFmpeg-devel] libavutil: add clean aperture (CLAP) side data.

Message ID 20200427042753.166716-1-neil.birkbeck@gmail.com
State New
Headers show
Series [FFmpeg-devel] libavutil: add clean aperture (CLAP) side data.
Related show

Checks

Context Check Description
andriy/default pending
andriy/make success Make finished
andriy/make_fate success Make fate finished

Commit Message

Neil Birkbeck April 27, 2020, 4:27 a.m. UTC
The clean aperature represents a cropping of the stored image data used to
relate the image data to a canonical video system and exists as container
metadata (see 'clap' section in https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap3/qtff3.html)

Addition of the side data is a first step towards demuxing CLAP atom metadata,
helping to resolve https://trac.ffmpeg.org/ticket/7437

This CleanAperture representation can also carry PixelCrop fields from MKV.
Side data was suggested as a way to carry such PixelCrop fields in:
https://ffmpeg.org/pipermail/ffmpeg-devel/2016-March/192302.html

Transmuxing the side data can then be added (MOV to/from MKV), and
auto-application could optionally be enabled like autorotate in ffmpeg_filter.c.

Signed-off-by: Neil Birkbeck <neil.birkbeck@gmail.com>
---
 libavcodec/avpacket.c      |  1 +
 libavcodec/packet.h        | 12 ++++++++
 libavformat/dump.c         | 15 +++++++++
 libavutil/Makefile         |  2 ++
 libavutil/clean_aperture.c | 30 ++++++++++++++++++
 libavutil/clean_aperture.h | 63 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 123 insertions(+)
 create mode 100644 libavutil/clean_aperture.c
 create mode 100644 libavutil/clean_aperture.h

Comments

Lynne April 27, 2020, 11:38 a.m. UTC | #1
Apr 27, 2020, 05:27 by neil.birkbeck@gmail.com:

> The clean aperature represents a cropping of the stored image data used to
> relate the image data to a canonical video system and exists as container
> metadata (see 'clap' section in https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap3/qtff3.html)
>
> Addition of the side data is a first step towards demuxing CLAP atom metadata,
> helping to resolve https://trac.ffmpeg.org/ticket/7437
>
> This CleanAperture representation can also carry PixelCrop fields from MKV.
> Side data was suggested as a way to carry such PixelCrop fields in:
> https://ffmpeg.org/pipermail/ffmpeg-devel/2016-March/192302.html
>
> Transmuxing the side data can then be added (MOV to/from MKV), and
> auto-application could optionally be enabled like autorotate in ffmpeg_filter.c.
>

We already have frame cropping metadata in the AVFrame structure.
Neil Birkbeck April 28, 2020, 9:05 a.m. UTC | #2
On Mon, Apr 27, 2020 at 4:38 AM Lynne <dev@lynne.ee> wrote:

> Apr 27, 2020, 05:27 by neil.birkbeck@gmail.com:
>
> > The clean aperature represents a cropping of the stored image data used
> to
> > relate the image data to a canonical video system and exists as container
> > metadata (see 'clap' section in
> https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap3/qtff3.html
> )
> >
> > Addition of the side data is a first step towards demuxing CLAP atom
> metadata,
> > helping to resolve https://trac.ffmpeg.org/ticket/7437
> >
> > This CleanAperture representation can also carry PixelCrop fields from
> MKV.
> > Side data was suggested as a way to carry such PixelCrop fields in:
> > https://ffmpeg.org/pipermail/ffmpeg-devel/2016-March/192302.html
> >
> > Transmuxing the side data can then be added (MOV to/from MKV), and
> > auto-application could optionally be enabled like autorotate in
> ffmpeg_filter.c.
> >
>
> We already have frame cropping metadata in the AVFrame structure.
>

Propagating the crop fields from the format/container to AVFrame was my
initial thought. As far as I can tell, currently the cropping fields on
AVFrame are typically set from the coded bitstream. And plumbing from the
container requires the crop fields to be added to AVCodecParameters and to
AVCodecContext, which could then be set on the AVFrame in decode.c.

In part, I based the SideData on some older threads on the topic, but those
threads predate the AVFrame-level cropping. I still see a few a couple of
other potential reasons:
-For the transmux use cases, I was concerned adding the extra fields within
AVCodecParameters/AVCodecContext may confuse the interfaces as to whether
crop/CLAP was destined for the container or the codec. As stream metadata,
it seemed more clear that it should belong in the container
-And a minor one: for the CLAP atom case, on remuxing, the existing integer
crop fields could cause some small sub-pixel loss.

Is it preferred to add the extra fields in AVCodecParameters/AVCodecContext
and plumb them into the frame instead? That should work for the case that
I'm trying to resolve (applying PixelCrop from MKV), but wanted to make
sure the other use cases (e.g., transmux, CLAP atom) worked too.
Kieran O Leary April 28, 2020, 10:05 a.m. UTC | #3
Hi Neil,

Thanks so much for working on this - what was the impetus?
I started the ticket you mentioned. I applied your patch and tested it with
the clap.mov file from that ticket. How do I know if behaviour has changed
with this patch, how should I be testing?
I don't see any reference to the clap atom values when transcoding the file
to MKV or to another mov:

./ffmpeg -i /mnt/c/Users/kieran.oleary/Downloads/clap.mov out.mkv
ffmpeg version N-97506-g2fae000994 Copyright (c) 2000-2020 the FFmpeg
developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration:
  libavutil      56. 43.100 / 56. 43.100
  libavcodec     58. 82.100 / 58. 82.100
  libavformat    58. 42.101 / 58. 42.101
  libavdevice    58.  9.103 / 58.  9.103
  libavfilter     7. 79.100 /  7. 79.100
  libswscale      5.  6.101 /  5.  6.101
  libswresample   3.  6.100 /  3.  6.100
Guessed Channel Layout for Input Stream #0.1 : stereo
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'/mnt/c/Users/kieran.oleary/Downloads/clap.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 537199360
    compatible_brands: qt
    creation_time   : 2018-09-13T08:48:41.000000Z
  Duration: 00:00:01.00, start: 0.000000, bitrate: 80686 kb/s
    Stream #0:0(eng): Video: v210 (v210 / 0x30313276),
yuv422p10le(smpte170m/bt470bg/bt709, top coded first (swapped)), 720x576,
79626 kb/s, SAR 59:54 DAR 295:216, 9 fps, 25 tbr, 25k tbn, 25k tbc (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Video Media Handler
      encoder         : Uncompressed 10-Bit YUV
      timecode        : 00:00:00:00
    Stream #0:1(eng): Audio: pcm_s24le (in24 / 0x34326E69), 48000 Hz,
stereo, s32 (24 bit), 2304 kb/s (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Sound Media Handler
      timecode        : 00:00:00:00
    Stream #0:2(eng): Data: none (tmcd / 0x64636D74), 0 kb/s (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : Time Code Media Handler
      reel_name       : 001
      timecode        : 00:00:00:00
File 'out.mkv' already exists. Overwrite? [y/N] y
Stream mapping:
  Stream #0:0 -> #0:0 (v210 (native) -> mpeg4 (native))
  Stream #0:1 -> #0:1 (pcm_s24le (native) -> ac3 (native))
Press [q] to stop, [?] for help
Output #0, matroska, to 'out.mkv':
  Metadata:
    major_brand     : qt
    minor_version   : 537199360
    compatible_brands: qt
    encoder         : Lavf58.42.101
    Stream #0:0(eng): Video: mpeg4 (FMP4 / 0x34504D46), yuv420p(top coded
first (swapped)), 720x576 [SAR 59:54 DAR 295:216], q=2-31, 200 kb/s, 25
fps, 1k tbn, 25 tbc (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Video Media Handler
      timecode        : 00:00:00:00
      encoder         : Lavc58.82.100 mpeg4
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: N/A
    Stream #0:1(eng): Audio: ac3 ([0] [0][0] / 0x2000), 48000 Hz, stereo,
fltp (24 bit), 192 kb/s (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Sound Media Handler
      timecode        : 00:00:00:00
      encoder         : Lavc58.82.100 ac3
frame=    9 fps=0.0 q=1.6 Lsize=      36kB time=00:00:00.41 bitrate=
717.1kbits/s speed=9.05x
video:25kB audio:10kB subtitle:0kB other streams:0kB global headers:0kB
muxing overhead: 3.596637%
Andreas Rheinhardt April 28, 2020, 10:08 a.m. UTC | #4
Kieran O Leary:
> Hi Neil,
> 
> Thanks so much for working on this - what was the impetus?
> I started the ticket you mentioned. I applied your patch and tested it with
> the clap.mov file from that ticket. How do I know if behaviour has changed
> with this patch, how should I be testing?
> I don't see any reference to the clap atom values when transcoding the file
> to MKV or to another mov:
> 
That's expected. The patch provided only provides the structure in which
the values are intended to be exported; it does not add any demuxing or
muxing capabilities for mov or mkv (as you can see from the fact that
none of these (de)muxers have been changed in the patch).

- Andreas
Nicolas George April 28, 2020, 10:18 a.m. UTC | #5
Andreas Rheinhardt (12020-04-28):
> That's expected. The patch provided only provides the structure in which
> the values are intended to be exported; it does not add any demuxing or
> muxing capabilities for mov or mkv (as you can see from the fact that
> none of these (de)muxers have been changed in the patch).

Which is something I intended to comment on: adding code without users
is rarely a good idea. I suggest we do not commit until at least one
demuxer use it, preferably at least two. Otherwise, we may realize that
“oh crap, it doesn't work” because of a tiny unforeseen detail.

Regards,
Neil Birkbeck April 28, 2020, 5:23 p.m. UTC | #6
On Tue, Apr 28, 2020 at 3:18 AM Nicolas George <george@nsup.org> wrote:

> Andreas Rheinhardt (12020-04-28):
> > That's expected. The patch provided only provides the structure in which
> > the values are intended to be exported; it does not add any demuxing or
> > muxing capabilities for mov or mkv (as you can see from the fact that
> > none of these (de)muxers have been changed in the patch).
>
> Which is something I intended to comment on: adding code without users
> is rarely a good idea. I suggest we do not commit until at least one
> demuxer use it, preferably at least two. Otherwise, we may realize that
> “oh crap, it doesn't work” because of a tiny unforeseen detail.


Thanks for the feedback. I also have patches for the demux (MOV/MKV) and
mux (MOV/MKV).

As there is still the alternative of using the fields in the
AVCodecParameters/AVCodecContext, my intention was to keep the first patch
small to resolve discussion on that point.

I've included the patches, if you'd like to try test it, Kieren. I see on
your test file that there may be some slight rounding error making output
crop 704 not 703 (MKV file ends up with pixel_crop_{left,right} = 8).

/ffprobe ../testdata/clap.mov 2>&1 | grep -A1 "Side"
    Side data:
      Clean aperture:[width 41472/59 height:576/1 h_offset:0/1 v_offset:0/1]
./ffmpeg -i ../testdata/clap.mov  -vcodec copy -acodec copy /tmp/clap.mkv
./ffprobe /tmp/clap.mkv 2>&1 | grep -A1 "Side"
    Side data:
      Clean aperture:[width 704/1 height:576/1 h_offset:0/1 v_offset:0/1]
Kieran O Leary May 5, 2020, 11:50 a.m. UTC | #7
Hi Neil,

I have to look deeper into the MKV side of things and most likely raise it
with the cellar mailing list so that better minds than mine can weigh in. I
do see that the rounding up to 704 could be an issue alright.
As for MOV, your patch appears to generate the same output clap values as
the input, so that's really great! command line and mediainfo trace below:
$ ./ffmpeg -i /mnt/c/Users/blaaa/Downloads/clap.mov -c:v v210 newv210.mov
&& mediainfo --Details=1 newv210.mov |grep -i clap -n10
ffmpeg version N-97506-g2fae000994 Copyright (c) 2000-2020 the FFmpeg
developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration:
  libavutil      56. 43.100 / 56. 43.100
  libavcodec     58. 82.100 / 58. 82.100
  libavformat    58. 42.101 / 58. 42.101
  libavdevice    58.  9.103 / 58.  9.103
  libavfilter     7. 79.100 /  7. 79.100
  libswscale      5.  6.101 /  5.  6.101
  libswresample   3.  6.100 /  3.  6.100
Guessed Channel Layout for Input Stream #0.1 : stereo
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'/mnt/c/Users/blaaa/Downloads/clap.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 537199360
    compatible_brands: qt
    creation_time   : 2018-09-13T08:48:41.000000Z
  Duration: 00:00:01.00, start: 0.000000, bitrate: 80686 kb/s
    Stream #0:0(eng): Video: v210 (v210 / 0x30313276),
yuv422p10le(smpte170m/bt470bg/bt709, top coded first (swapped)), 720x576,
79626 kb/s, SAR 59:54 DAR 295:216, 9 fps, 25 tbr, 25k tbn, 25k tbc (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Video Media Handler
      encoder         : Uncompressed 10-Bit YUV
      timecode        : 00:00:00:00
    Side data:
      Clean aperture:[width 41472/59 height:576/1 h_offset:0/1 v_offset:0/1]
    Stream #0:1(eng): Audio: pcm_s24le (in24 / 0x34326E69), 48000 Hz,
stereo, s32 (24 bit), 2304 kb/s (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Sound Media Handler
      timecode        : 00:00:00:00
    Stream #0:2(eng): Data: none (tmcd / 0x64636D74), 0 kb/s (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : Time Code Media Handler
      reel_name       : 001
      timecode        : 00:00:00:00
File 'newv210.mov' already exists. Overwrite? [y/N] y
Stream mapping:
  Stream #0:0 -> #0:0 (v210 (native) -> v210 (native))
  Stream #0:1 -> #0:1 (pcm_s24le (native) -> aac (native))
Press [q] to stop, [?] for help
Output #0, mov, to 'newv210.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 537199360
    compatible_brands: qt
    encoder         : Lavf58.42.101
    Stream #0:0(eng): Video: v210 (v210 / 0x30313276), yuv422p10le(top
coded first (swapped)), 720x576 [SAR 59:54 DAR 295:216], q=2-31, 221184
kb/s, 0.04 fps, 12800 tbn, 25 tbc (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Video Media Handler
      timecode        : 00:00:00:00
      encoder         : Lavc58.82.100 v210
    Side data:
      Clean aperture:[width 41472/59 height:576/1 h_offset:0/1 v_offset:0/1]
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz,
stereo, fltp (24 bit), 128 kb/s (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Sound Media Handler
      timecode        : 00:00:00:00
      encoder         : Lavc58.82.100 aac
frame=   25 fps=0.0 q=-0.0 Lsize=   27009kB time=00:00:00.96
bitrate=230455.2kbits/s dup=16 drop=0 speed=19.8x
video:27000kB audio:6kB subtitle:0kB other streams:0kB global headers:0kB
muxing overhead: 0.008794%
[aac @ 0x7fffe91cc680] Qavg: 171.537
226-1A5FBA4         detail:                         9 (0x09)
227-1A5FBA5        Pixel Aspect Ratio (16 bytes)
228-1A5FBA5         Header (8 bytes)
229-1A5FBA5          Size:                          16 (0x00000010)
230-1A5FBA9          Name:                          pasp
231-1A5FBAD         hSpacing:                       59 (0x0000003B)
232-1A5FBB1         vSpacing:                       54 (0x00000036)
233-1A5FBB5        Clean Aperture (40 bytes)
234-1A5FBB5         Header (8 bytes)
235-1A5FBB5          Size:                          40 (0x00000028)
236:1A5FBB9          Name:                          clap
237-1A5FBBD         apertureWidth_N:                41472 (0x0000A200)
238-1A5FBC1         apertureWidth_D:                59 (0x0000003B)
239-1A5FBC5         apertureHeight_N:               576 (0x00000240)
240-1A5FBC9         apertureHeight_D:               1 (0x00000001)
241-1A5FBCD         horizOff_N:                     0 (0x00000000)
242-1A5FBD1         horizOff_D:                     1 (0x00000001)
243-1A5FBD5         vertOff_N:                      0 (0x00000000)
244-1A5FBD9         vertOff_D:                      1 (0x00000001)
245-1A5FBDD      Time to Sample (24 bytes)
246-1A5FBDD       Header (8 bytes)


and here's similar info for mkv:

 ./ffmpeg -i /mnt/c/Users/blaaa/Downloads/clap.mov -c:v v210 newv210.mkv &&
m
ediainfo --Details=1 newv210.mkv |grep -i pixel -n10
ffmpeg version N-97506-g2fae000994 Copyright (c) 2000-2020 the FFmpeg
developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration:
  libavutil      56. 43.100 / 56. 43.100
  libavcodec     58. 82.100 / 58. 82.100
  libavformat    58. 42.101 / 58. 42.101
  libavdevice    58.  9.103 / 58.  9.103
  libavfilter     7. 79.100 /  7. 79.100
  libswscale      5.  6.101 /  5.  6.101
  libswresample   3.  6.100 /  3.  6.100
Guessed Channel Layout for Input Stream #0.1 : stereo
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'/mnt/c/Users/blaaa/Downloads/clap.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 537199360
    compatible_brands: qt
    creation_time   : 2018-09-13T08:48:41.000000Z
  Duration: 00:00:01.00, start: 0.000000, bitrate: 80686 kb/s
    Stream #0:0(eng): Video: v210 (v210 / 0x30313276),
yuv422p10le(smpte170m/bt470bg/bt709, top coded first (swapped)), 720x576,
79626 kb/s, SAR 59:54 DAR 295:216, 9 fps, 25 tbr, 25k tbn, 25k tbc (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Video Media Handler
      encoder         : Uncompressed 10-Bit YUV
      timecode        : 00:00:00:00
    Side data:
      Clean aperture:[width 41472/59 height:576/1 h_offset:0/1 v_offset:0/1]
    Stream #0:1(eng): Audio: pcm_s24le (in24 / 0x34326E69), 48000 Hz,
stereo, s32 (24 bit), 2304 kb/s (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Sound Media Handler
      timecode        : 00:00:00:00
    Stream #0:2(eng): Data: none (tmcd / 0x64636D74), 0 kb/s (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : Time Code Media Handler
      reel_name       : 001
      timecode        : 00:00:00:00
File 'newv210.mkv' already exists. Overwrite? [y/N] y
Stream mapping:
  Stream #0:0 -> #0:0 (v210 (native) -> v210 (native))
  Stream #0:1 -> #0:1 (pcm_s24le (native) -> ac3 (native))
Press [q] to stop, [?] for help
Output #0, matroska, to 'newv210.mkv':
  Metadata:
    major_brand     : qt
    minor_version   : 537199360
    compatible_brands: qt
    encoder         : Lavf58.42.101
    Stream #0:0(eng): Video: v210 (v210 / 0x30313276), yuv422p10le(top
coded first (swapped)), 720x576 [SAR 59:54 DAR 295:216], q=2-31, 221184
kb/s, 25 fps, 1k tbn, 25 tbc (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Video Media Handler
      timecode        : 00:00:00:00
      encoder         : Lavc58.82.100 v210
    Side data:
      Clean aperture:[width 41472/59 height:576/1 h_offset:0/1 v_offset:0/1]
    Stream #0:1(eng): Audio: ac3 ([0] [0][0] / 0x2000), 48000 Hz, stereo,
fltp (24 bit), 192 kb/s (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Sound Media Handler
      timecode        : 00:00:00:00
      encoder         : Lavc58.82.100 ac3
frame=    9 fps=0.0 q=-0.0 Lsize=    9731kB time=00:00:00.41
bitrate=193962.2kbits/s speed=10.6x
video:9720kB audio:10kB subtitle:0kB other streams:0kB global headers:0kB
muxing overhead: 0.015487%
193-000144     Data:                               40000000 (0x02625A00)
194-000148    CodecID - V_MS/VFW/FOURCC (17 bytes)
195-000148     Header (2 bytes)
196-000148      Name:                              6 (0x06)
197-000149      Size:                              15 (0x0F)
198-00014A     Data:                               V_MS/VFW/FOURCC
199-000159    Video (52 bytes)
200-000159     Header (9 bytes)
201-000159      Name:                              96 (0x60)
202-00015A      Size:                              43 (0x0000000000002B)
203:000162     PixelWidth - 720 (0x2D0) (4 bytes)
204-000162      Header (2 bytes)
205-000162       Name:                             48 (0x30)
206-000163       Size:                             2 (0x02)
207-000164      Data:                              720 (0x02D0)
208:000166     PixelHeight - 576 (0x240) (4 bytes)
209-000166      Header (2 bytes)
210-000166       Name:                             58 (0x3A)
211-000167       Size:                             2 (0x02)
212-000168      Data:                              576 (0x0240)
213-00016A     FlagInterlaced - 1 (0x1) (3 bytes)
214-00016A      Header (2 bytes)
215-00016A       Name:                             26 (0x1A)
216-00016B       Size:                             1 (0x01)
217-00016C      Data:                              1 (0x01)
218-00016D     FieldOrder - 9 (0x9) (3 bytes)
--
228-000175     DisplayHeight - 216 (0xD8) (4 bytes)
229-000175      Header (3 bytes)
230-000175       Name:                             5306 (0x14BA)
231-000177       Size:                             1 (0x01)
232-000178      Data:                              216 (0xD8)
233-000179     DisplayUnit - 3 (0x3) (4 bytes)
234-000179      Header (3 bytes)
235-000179       Name:                             5298 (0x14B2)
236-00017B       Size:                             1 (0x01)
237-00017C      Data:                              3 (0x03)
238:00017D     PixelCropLeft - 8 (0x8) (4 bytes)
239-00017D      Header (3 bytes)
240-00017D       Name:                             5324 (0x14CC)
241-00017F       Size:                             1 (0x01)
242-000180      Data:                              8 (0x08)
243:000181     PixelCropRight - 8 (0x8) (4 bytes)
244-000181      Header (3 bytes)
245-000181       Name:                             5341 (0x14DD)
246-000183       Size:                             1 (0x01)
247-000184      Data:                              8 (0x08)
248:000185     PixelCropBottom - 0 (0x0) (4 bytes)
249-000185      Header (3 bytes)
250-000185       Name:                             5290 (0x14AA)
251-000187       Size:                             1 (0x01)
252-000188      Data:                              0 (0x00)
253:000189     PixelCropTop - 0 (0x0) (4 bytes)
254-000189      Header (3 bytes)
255-000189       Name:                             5307 (0x14BB)
256-00018B       Size:                             1 (0x01)
257-00018C      Data:                              0 (0x00)
258-00018D    CodecPrivate - Copy of vids (43 bytes)
259-00018D     Header (3 bytes)
260-00018D      Name:                              9122 (0x23A2)
261-00018F      Size:                              40 (0x28)
262-000190     Size:                               40 (0x00000028)
263-000194     Width:                              720 (0x000002D0)

Best,

Kieran O'Leary
Irish Film Institute
Kieran O Leary May 5, 2020, 12:10 p.m. UTC | #8
Hi,

I broke the threading with my last reply, i apologise. Here goes another
attempt:

On Tue, Apr 28, 2020 at 6:23 PM Neil Birkbeck <neil.birkbeck@gmail.com>
wrote:

> On Tue, Apr 28, 2020 at 3:18 AM Nicolas George <george@nsup.org> wrote:
>
> > Andreas Rheinhardt (12020-04-28):
> > > That's expected. The patch provided only provides the structure in
> which
> > > the values are intended to be exported; it does not add any demuxing or
> > > muxing capabilities for mov or mkv (as you can see from the fact that
> > > none of these (de)muxers have been changed in the patch).
> >
> > Which is something I intended to comment on: adding code without users
> > is rarely a good idea. I suggest we do not commit until at least one
> > demuxer use it, preferably at least two. Otherwise, we may realize that
> > “oh crap, it doesn't work” because of a tiny unforeseen detail.
>
>
> Thanks for the feedback. I also have patches for the demux (MOV/MKV) and
> mux (MOV/MKV).
>
> As there is still the alternative of using the fields in the
> AVCodecParameters/AVCodecContext, my intention was to keep the first patch
> small to resolve discussion on that point.
>
> I've included the patches, if you'd like to try test it, Kieren. I see on
> your test file that there may be some slight rounding error making output
> crop 704 not 703 (MKV file ends up with pixel_crop_{left,right} = 8).
>
> /ffprobe ../testdata/clap.mov 2>&1 | grep -A1 "Side"
>     Side data:
>       Clean aperture:[width 41472/59 height:576/1 h_offset:0/1
> v_offset:0/1]
> ./ffmpeg -i ../testdata/clap.mov  -vcodec copy -acodec copy /tmp/clap.mkv
> ./ffprobe /tmp/clap.mkv 2>&1 | grep -A1 "Side"
>     Side data:
>       Clean aperture:[width 704/1 height:576/1 h_offset:0/1 v_offset:0/1]
>

I have to look deeper into the MKV side of things and most likely raise it
with the cellar mailing list so that better minds than mine can weigh in. I
do see that the rounding up to 704 could be an issue alright.
As for MOV, your patch appears to generate the same output clap values as
the input, so that's really great! command line and mediainfo trace below:
$ ./ffmpeg -i /mnt/c/Users/blaaa/Downloads/clap.mov -c:v v210 newv210.mov
&& mediainfo --Details=1 newv210.mov |grep -i clap -n10
ffmpeg version N-97506-g2fae000994 Copyright (c) 2000-2020 the FFmpeg
developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration:
  libavutil      56. 43.100 / 56. 43.100
  libavcodec     58. 82.100 / 58. 82.100
  libavformat    58. 42.101 / 58. 42.101
  libavdevice    58.  9.103 / 58.  9.103
  libavfilter     7. 79.100 /  7. 79.100
  libswscale      5.  6.101 /  5.  6.101
  libswresample   3.  6.100 /  3.  6.100
Guessed Channel Layout for Input Stream #0.1 : stereo
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'/mnt/c/Users/blaaa/Downloads/clap.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 537199360
    compatible_brands: qt
    creation_time   : 2018-09-13T08:48:41.000000Z
  Duration: 00:00:01.00, start: 0.000000, bitrate: 80686 kb/s
    Stream #0:0(eng): Video: v210 (v210 / 0x30313276),
yuv422p10le(smpte170m/bt470bg/bt709, top coded first (swapped)), 720x576,
79626 kb/s, SAR 59:54 DAR 295:216, 9 fps, 25 tbr, 25k tbn, 25k tbc (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Video Media Handler
      encoder         : Uncompressed 10-Bit YUV
      timecode        : 00:00:00:00
    Side data:
      Clean aperture:[width 41472/59 height:576/1 h_offset:0/1 v_offset:0/1]
    Stream #0:1(eng): Audio: pcm_s24le (in24 / 0x34326E69), 48000 Hz,
stereo, s32 (24 bit), 2304 kb/s (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Sound Media Handler
      timecode        : 00:00:00:00
    Stream #0:2(eng): Data: none (tmcd / 0x64636D74), 0 kb/s (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : Time Code Media Handler
      reel_name       : 001
      timecode        : 00:00:00:00
File 'newv210.mov' already exists. Overwrite? [y/N] y
Stream mapping:
  Stream #0:0 -> #0:0 (v210 (native) -> v210 (native))
  Stream #0:1 -> #0:1 (pcm_s24le (native) -> aac (native))
Press [q] to stop, [?] for help
Output #0, mov, to 'newv210.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 537199360
    compatible_brands: qt
    encoder         : Lavf58.42.101
    Stream #0:0(eng): Video: v210 (v210 / 0x30313276), yuv422p10le(top
coded first (swapped)), 720x576 [SAR 59:54 DAR 295:216], q=2-31, 221184
kb/s, 0.04 fps, 12800 tbn, 25 tbc (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Video Media Handler
      timecode        : 00:00:00:00
      encoder         : Lavc58.82.100 v210
    Side data:
      Clean aperture:[width 41472/59 height:576/1 h_offset:0/1 v_offset:0/1]
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz,
stereo, fltp (24 bit), 128 kb/s (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Sound Media Handler
      timecode        : 00:00:00:00
      encoder         : Lavc58.82.100 aac
frame=   25 fps=0.0 q=-0.0 Lsize=   27009kB time=00:00:00.96
bitrate=230455.2kbits/s dup=16 drop=0 speed=19.8x
video:27000kB audio:6kB subtitle:0kB other streams:0kB global headers:0kB
muxing overhead: 0.008794%
[aac @ 0x7fffe91cc680] Qavg: 171.537
226-1A5FBA4         detail:                         9 (0x09)
227-1A5FBA5        Pixel Aspect Ratio (16 bytes)
228-1A5FBA5         Header (8 bytes)
229-1A5FBA5          Size:                          16 (0x00000010)
230-1A5FBA9          Name:                          pasp
231-1A5FBAD         hSpacing:                       59 (0x0000003B)
232-1A5FBB1         vSpacing:                       54 (0x00000036)
233-1A5FBB5        Clean Aperture (40 bytes)
234-1A5FBB5         Header (8 bytes)
235-1A5FBB5          Size:                          40 (0x00000028)
236:1A5FBB9          Name:                          clap
237-1A5FBBD         apertureWidth_N:                41472 (0x0000A200)
238-1A5FBC1         apertureWidth_D:                59 (0x0000003B)
239-1A5FBC5         apertureHeight_N:               576 (0x00000240)
240-1A5FBC9         apertureHeight_D:               1 (0x00000001)
241-1A5FBCD         horizOff_N:                     0 (0x00000000)
242-1A5FBD1         horizOff_D:                     1 (0x00000001)
243-1A5FBD5         vertOff_N:                      0 (0x00000000)
244-1A5FBD9         vertOff_D:                      1 (0x00000001)
245-1A5FBDD      Time to Sample (24 bytes)
246-1A5FBDD       Header (8 bytes)


and here's similar info for mkv:

 ./ffmpeg -i /mnt/c/Users/blaaa/Downloads/clap.mov -c:v v210 newv210.mkv &&
m
ediainfo --Details=1 newv210.mkv |grep -i pixel -n10
ffmpeg version N-97506-g2fae000994 Copyright (c) 2000-2020 the FFmpeg
developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration:
  libavutil      56. 43.100 / 56. 43.100
  libavcodec     58. 82.100 / 58. 82.100
  libavformat    58. 42.101 / 58. 42.101
  libavdevice    58.  9.103 / 58.  9.103
  libavfilter     7. 79.100 /  7. 79.100
  libswscale      5.  6.101 /  5.  6.101
  libswresample   3.  6.100 /  3.  6.100
Guessed Channel Layout for Input Stream #0.1 : stereo
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'/mnt/c/Users/blaaa/Downloads/clap.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 537199360
    compatible_brands: qt
    creation_time   : 2018-09-13T08:48:41.000000Z
  Duration: 00:00:01.00, start: 0.000000, bitrate: 80686 kb/s
    Stream #0:0(eng): Video: v210 (v210 / 0x30313276),
yuv422p10le(smpte170m/bt470bg/bt709, top coded first (swapped)), 720x576,
79626 kb/s, SAR 59:54 DAR 295:216, 9 fps, 25 tbr, 25k tbn, 25k tbc (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Video Media Handler
      encoder         : Uncompressed 10-Bit YUV
      timecode        : 00:00:00:00
    Side data:
      Clean aperture:[width 41472/59 height:576/1 h_offset:0/1 v_offset:0/1]
    Stream #0:1(eng): Audio: pcm_s24le (in24 / 0x34326E69), 48000 Hz,
stereo, s32 (24 bit), 2304 kb/s (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Sound Media Handler
      timecode        : 00:00:00:00
    Stream #0:2(eng): Data: none (tmcd / 0x64636D74), 0 kb/s (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : Time Code Media Handler
      reel_name       : 001
      timecode        : 00:00:00:00
File 'newv210.mkv' already exists. Overwrite? [y/N] y
Stream mapping:
  Stream #0:0 -> #0:0 (v210 (native) -> v210 (native))
  Stream #0:1 -> #0:1 (pcm_s24le (native) -> ac3 (native))
Press [q] to stop, [?] for help
Output #0, matroska, to 'newv210.mkv':
  Metadata:
    major_brand     : qt
    minor_version   : 537199360
    compatible_brands: qt
    encoder         : Lavf58.42.101
    Stream #0:0(eng): Video: v210 (v210 / 0x30313276), yuv422p10le(top
coded first (swapped)), 720x576 [SAR 59:54 DAR 295:216], q=2-31, 221184
kb/s, 25 fps, 1k tbn, 25 tbc (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Video Media Handler
      timecode        : 00:00:00:00
      encoder         : Lavc58.82.100 v210
    Side data:
      Clean aperture:[width 41472/59 height:576/1 h_offset:0/1 v_offset:0/1]
    Stream #0:1(eng): Audio: ac3 ([0] [0][0] / 0x2000), 48000 Hz, stereo,
fltp (24 bit), 192 kb/s (default)
    Metadata:
      creation_time   : 2018-09-13T08:48:41.000000Z
      handler_name    : ?Apple Sound Media Handler
      timecode        : 00:00:00:00
      encoder         : Lavc58.82.100 ac3
frame=    9 fps=0.0 q=-0.0 Lsize=    9731kB time=00:00:00.41
bitrate=193962.2kbits/s speed=10.6x
video:9720kB audio:10kB subtitle:0kB other streams:0kB global headers:0kB
muxing overhead: 0.015487%
193-000144     Data:                               40000000 (0x02625A00)
194-000148    CodecID - V_MS/VFW/FOURCC (17 bytes)
195-000148     Header (2 bytes)
196-000148      Name:                              6 (0x06)
197-000149      Size:                              15 (0x0F)
198-00014A     Data:                               V_MS/VFW/FOURCC
199-000159    Video (52 bytes)
200-000159     Header (9 bytes)
201-000159      Name:                              96 (0x60)
202-00015A      Size:                              43 (0x0000000000002B)
203:000162     PixelWidth - 720 (0x2D0) (4 bytes)
204-000162      Header (2 bytes)
205-000162       Name:                             48 (0x30)
206-000163       Size:                             2 (0x02)
207-000164      Data:                              720 (0x02D0)
208:000166     PixelHeight - 576 (0x240) (4 bytes)
209-000166      Header (2 bytes)
210-000166       Name:                             58 (0x3A)
211-000167       Size:                             2 (0x02)
212-000168      Data:                              576 (0x0240)
213-00016A     FlagInterlaced - 1 (0x1) (3 bytes)
214-00016A      Header (2 bytes)
215-00016A       Name:                             26 (0x1A)
216-00016B       Size:                             1 (0x01)
217-00016C      Data:                              1 (0x01)
218-00016D     FieldOrder - 9 (0x9) (3 bytes)
--
228-000175     DisplayHeight - 216 (0xD8) (4 bytes)
229-000175      Header (3 bytes)
230-000175       Name:                             5306 (0x14BA)
231-000177       Size:                             1 (0x01)
232-000178      Data:                              216 (0xD8)
233-000179     DisplayUnit - 3 (0x3) (4 bytes)
234-000179      Header (3 bytes)
235-000179       Name:                             5298 (0x14B2)
236-00017B       Size:                             1 (0x01)
237-00017C      Data:                              3 (0x03)
238:00017D     PixelCropLeft - 8 (0x8) (4 bytes)
239-00017D      Header (3 bytes)
240-00017D       Name:                             5324 (0x14CC)
241-00017F       Size:                             1 (0x01)
242-000180      Data:                              8 (0x08)
243:000181     PixelCropRight - 8 (0x8) (4 bytes)
244-000181      Header (3 bytes)
245-000181       Name:                             5341 (0x14DD)
246-000183       Size:                             1 (0x01)
247-000184      Data:                              8 (0x08)
248:000185     PixelCropBottom - 0 (0x0) (4 bytes)
249-000185      Header (3 bytes)
250-000185       Name:                             5290 (0x14AA)
251-000187       Size:                             1 (0x01)
252-000188      Data:                              0 (0x00)
253:000189     PixelCropTop - 0 (0x0) (4 bytes)
254-000189      Header (3 bytes)
255-000189       Name:                             5307 (0x14BB)
256-00018B       Size:                             1 (0x01)
257-00018C      Data:                              0 (0x00)
258-00018D    CodecPrivate - Copy of vids (43 bytes)
259-00018D     Header (3 bytes)
260-00018D      Name:                              9122 (0x23A2)
261-00018F      Size:                              40 (0x28)
262-000190     Size:                               40 (0x00000028)
263-000194     Width:                              720 (0x000002D0)

Best,

Kieran O'Leary
Irish Film Institute
Neil Birkbeck May 6, 2020, 3:22 p.m. UTC | #9
On Tue, May 5, 2020 at 5:11 AM Kieran O Leary <kieran.o.leary@gmail.com>
wrote:

> Hi,
>
> I broke the threading with my last reply, i apologise. Here goes another
> attempt:
>
> On Tue, Apr 28, 2020 at 6:23 PM Neil Birkbeck <neil.birkbeck@gmail.com>
> wrote:
>
> > On Tue, Apr 28, 2020 at 3:18 AM Nicolas George <george@nsup.org> wrote:
> >
> > > Andreas Rheinhardt (12020-04-28):
> > > > That's expected. The patch provided only provides the structure in
> > which
> > > > the values are intended to be exported; it does not add any demuxing
> or
> > > > muxing capabilities for mov or mkv (as you can see from the fact that
> > > > none of these (de)muxers have been changed in the patch).
> > >
> > > Which is something I intended to comment on: adding code without users
> > > is rarely a good idea. I suggest we do not commit until at least one
> > > demuxer use it, preferably at least two. Otherwise, we may realize that
> > > “oh crap, it doesn't work” because of a tiny unforeseen detail.
> >
> >
> > Thanks for the feedback. I also have patches for the demux (MOV/MKV) and
> > mux (MOV/MKV).
> >
> > As there is still the alternative of using the fields in the
> > AVCodecParameters/AVCodecContext, my intention was to keep the first
> patch
> > small to resolve discussion on that point.
> >
> > I've included the patches, if you'd like to try test it, Kieren. I see on
> > your test file that there may be some slight rounding error making output
> > crop 704 not 703 (MKV file ends up with pixel_crop_{left,right} = 8).
> >
> > /ffprobe ../testdata/clap.mov 2>&1 | grep -A1 "Side"
> >     Side data:
> >       Clean aperture:[width 41472/59 height:576/1 h_offset:0/1
> > v_offset:0/1]
> > ./ffmpeg -i ../testdata/clap.mov  -vcodec copy -acodec copy /tmp/clap.mkv
> > ./ffprobe /tmp/clap.mkv 2>&1 | grep -A1 "Side"
> >     Side data:
> >       Clean aperture:[width 704/1 height:576/1 h_offset:0/1 v_offset:0/1]
> >
>
> I have to look deeper into the MKV side of things and most likely raise it
> with the cellar mailing list so that better minds than mine can weigh in. I
> do see that the rounding up to 704 could be an issue alright.
> As for MOV, your patch appears to generate the same output clap values as
> the input, so that's really great! command line and mediainfo trace below:
>

Thanks for testing, Kieran and for linking the discussion on the cellar
list.

Any additional thoughts from ffmpeg devs on container-level SideData vs
adding the extra fields into AVCodecParameters/AVCodecContext and plumbing
into the frame instead? I anticipate some situations where there can be
interaction between cropping in bitstream and container-level cropping.
Maybe the best way forward is for me to share some sample patches for the
alternative to validate whether it supports the various use cases
(transmux, decode+crop, any other application level handling), and to
confirm the interface changes for the structs.
James Almer May 6, 2020, 3:45 p.m. UTC | #10
On 5/6/2020 12:22 PM, Neil Birkbeck wrote:
> On Tue, May 5, 2020 at 5:11 AM Kieran O Leary <kieran.o.leary@gmail.com>
> wrote:
> 
>> Hi,
>>
>> I broke the threading with my last reply, i apologise. Here goes another
>> attempt:
>>
>> On Tue, Apr 28, 2020 at 6:23 PM Neil Birkbeck <neil.birkbeck@gmail.com>
>> wrote:
>>
>>> On Tue, Apr 28, 2020 at 3:18 AM Nicolas George <george@nsup.org> wrote:
>>>
>>>> Andreas Rheinhardt (12020-04-28):
>>>>> That's expected. The patch provided only provides the structure in
>>> which
>>>>> the values are intended to be exported; it does not add any demuxing
>> or
>>>>> muxing capabilities for mov or mkv (as you can see from the fact that
>>>>> none of these (de)muxers have been changed in the patch).
>>>>
>>>> Which is something I intended to comment on: adding code without users
>>>> is rarely a good idea. I suggest we do not commit until at least one
>>>> demuxer use it, preferably at least two. Otherwise, we may realize that
>>>> “oh crap, it doesn't work” because of a tiny unforeseen detail.
>>>
>>>
>>> Thanks for the feedback. I also have patches for the demux (MOV/MKV) and
>>> mux (MOV/MKV).
>>>
>>> As there is still the alternative of using the fields in the
>>> AVCodecParameters/AVCodecContext, my intention was to keep the first
>> patch
>>> small to resolve discussion on that point.
>>>
>>> I've included the patches, if you'd like to try test it, Kieren. I see on
>>> your test file that there may be some slight rounding error making output
>>> crop 704 not 703 (MKV file ends up with pixel_crop_{left,right} = 8).
>>>
>>> /ffprobe ../testdata/clap.mov 2>&1 | grep -A1 "Side"
>>>     Side data:
>>>       Clean aperture:[width 41472/59 height:576/1 h_offset:0/1
>>> v_offset:0/1]
>>> ./ffmpeg -i ../testdata/clap.mov  -vcodec copy -acodec copy /tmp/clap.mkv
>>> ./ffprobe /tmp/clap.mkv 2>&1 | grep -A1 "Side"
>>>     Side data:
>>>       Clean aperture:[width 704/1 height:576/1 h_offset:0/1 v_offset:0/1]
>>>
>>
>> I have to look deeper into the MKV side of things and most likely raise it
>> with the cellar mailing list so that better minds than mine can weigh in. I
>> do see that the rounding up to 704 could be an issue alright.
>> As for MOV, your patch appears to generate the same output clap values as
>> the input, so that's really great! command line and mediainfo trace below:
>>
> 
> Thanks for testing, Kieran and for linking the discussion on the cellar
> list.
> 
> Any additional thoughts from ffmpeg devs on container-level SideData vs
> adding the extra fields into AVCodecParameters/AVCodecContext and plumbing
> into the frame instead? I anticipate some situations where there can be
> interaction between cropping in bitstream and container-level cropping.
> Maybe the best way forward is for me to share some sample patches for the
> alternative to validate whether it supports the various use cases
> (transmux, decode+crop, any other application level handling), and to
> confirm the interface changes for the structs.

One option could be to also introduce a frame side data for this new
struct and have it replace the AVFrame fields, which would then be
deprecated (But of course keep working until removed).
This would allow to either inject stream side data from clap atoms and
Matroska crop fields into packets (Which would then be propagated to
frames to be applied during decoding), or use and export the bitstream
cropping information as it's the case right now, all while preventing
the addition of new fields to AVCodecParameters.

I would like a developer that makes use of this feature to also comment,
especially seeing how the AVFrame fields and this clap side data are
pretty different.
Andreas Rheinhardt May 7, 2020, 5:20 p.m. UTC | #11
Neil Birkbeck:
> On Tue, Apr 28, 2020 at 3:18 AM Nicolas George <george@nsup.org> wrote:
> 
>> Andreas Rheinhardt (12020-04-28):
>>> That's expected. The patch provided only provides the structure in which
>>> the values are intended to be exported; it does not add any demuxing or
>>> muxing capabilities for mov or mkv (as you can see from the fact that
>>> none of these (de)muxers have been changed in the patch).
>>
>> Which is something I intended to comment on: adding code without users
>> is rarely a good idea. I suggest we do not commit until at least one
>> demuxer use it, preferably at least two. Otherwise, we may realize that
>> “oh crap, it doesn't work” because of a tiny unforeseen detail.
> 
> 
> Thanks for the feedback. I also have patches for the demux (MOV/MKV) and
> mux (MOV/MKV).
> 
> As there is still the alternative of using the fields in the
> AVCodecParameters/AVCodecContext, my intention was to keep the first patch
> small to resolve discussion on that point.
> 
> I've included the patches, if you'd like to try test it, Kieren. I see on
> your test file that there may be some slight rounding error making output
> crop 704 not 703 (MKV file ends up with pixel_crop_{left,right} = 8).
> 
> /ffprobe ../testdata/clap.mov 2>&1 | grep -A1 "Side"
>     Side data:
>       Clean aperture:[width 41472/59 height:576/1 h_offset:0/1 v_offset:0/1]
> ./ffmpeg -i ../testdata/clap.mov  -vcodec copy -acodec copy /tmp/clap.mkv
> ./ffprobe /tmp/clap.mkv 2>&1 | grep -A1 "Side"
>     Side data:
>       Clean aperture:[width 704/1 height:576/1 h_offset:0/1 v_offset:0/1]
> 
> 

> 
> Write out stream-level AVCleanAperture side data in the muxer.

You should separate the mov and Matroska patches.

> 
> Signed-off-by: Neil Birkbeck <neil.birkbeck@gmail.com>
> ---
>  libavformat/matroskaenc.c | 37 +++++++++++++++++++++++++++++++++++++
>  libavformat/movenc.c      | 28 ++++++++++++++++++++--------
>  2 files changed, 57 insertions(+), 8 deletions(-)
> 
> diff --git a/libavformat/matroskaenc.c b/libavformat/matroskaenc.c
> index 784973a951..19ff29853e 100644
> --- a/libavformat/matroskaenc.c
> +++ b/libavformat/matroskaenc.c
> @@ -901,6 +901,38 @@ static int mkv_write_video_color(AVIOContext *pb, const AVCodecParameters *par,
>      return 0;
>  }
>  
> +static int mkv_write_video_crop(AVIOContext *pb, AVCodecParameters *par, AVStream *st)
> +{
> +    int cropb = 0, cropt = 0, cropl = 0, cropr = 0;
> +    double width, height, h_offset, v_offset;
> +    int side_data_size = 0;
> +    const AVCleanAperture *clap =
> +        (const AVCleanAperture *)av_stream_get_side_data(
> +            st, AV_PKT_DATA_CLEAN_APERTURE, &side_data_size);
> +    if (!clap)
> +        return 0;
> +
> +    width = av_q2d(clap->width);
> +    height = av_q2d(clap->height);
> +    h_offset = av_q2d(clap->horizontal_offset);
> +    v_offset = av_q2d(clap->vertical_offset);
> +    cropb = (int)(par->height - (par->height / 2. + v_offset + height / 2));
> +    cropt = (int)(par->height / 2. + v_offset - height / 2);
> +    cropr = (int)(par->width - (par->width / 2. + h_offset + width / 2));
> +    cropl = (int)(par->width / 2. + h_offset - width / 2);
> +    cropb = FFMAX(cropb, 0);
> +    cropt = FFMAX(cropt, 0);
> +    cropr = FFMAX(cropr, 0);
> +    cropl = FFMAX(cropl, 0);
> +    if (!cropr && !cropl && !cropt && !cropb)
> +        return 0;
> +    put_ebml_uint(pb, MATROSKA_ID_VIDEOPIXELCROPL, cropl);
> +    put_ebml_uint(pb, MATROSKA_ID_VIDEOPIXELCROPR, cropr);
> +    put_ebml_uint(pb, MATROSKA_ID_VIDEOPIXELCROPB, cropb);
> +    put_ebml_uint(pb, MATROSKA_ID_VIDEOPIXELCROPT, cropt);

There is no reason to write all four elements when only some of them are
different from zero.

> +    return 0;
> +}
> +
>  static int mkv_write_video_projection(AVFormatContext *s, AVIOContext *pb,
>                                        const AVStream *st)
>  {
> @@ -1287,6 +1319,11 @@ static int mkv_write_track(AVFormatContext *s, MatroskaMuxContext *mkv,
>          } else if (mkv->mode != MODE_WEBM)
>              put_ebml_uint(pb, MATROSKA_ID_VIDEODISPLAYUNIT, MATROSKA_VIDEO_DISPLAYUNIT_UNKNOWN);
>  
> +        // Write out pixel crop
> +        ret = mkv_write_video_crop(pb, par, st);
> +        if (ret < 0)
> +            return ret;

This is the wrong place for this: It needs to be performed before we
write the display dimensions and you need to return the cropped width
and height via pointer arguments, so that the code doing display
dimensions (which needs to be updated) knows about the cropping. This is
necessary, because cropping is supposed to happen before scaling: If the
code for writing DisplayWidth/Height didn't know about the real
dimensions to scale, it would write the DisplayWidth/Height values that
are appropriate for a video with the given pixel aspect ratio and the
uncropped pixel dimensions.

We should btw error out if someone sends stereo 3d data that implies
that a dimension is doubled if cropping is intended to be applied in the
same dimension, as this scenario is not really well defined.

And let me also add other points about the sorry state of the PixelCrop*
elements:
a) The display aspect ratio usually changes when using cropping; the
pixel aspect ratio doesn't change, yet unfortunately (and inexplicably)
Matroska does not support that. This means that if one sets the display
aspect ratio explicitly when using cropping, a player that ignores the
cropping will not only display the video with the area that ought to be
cropped away; it will also display it with a wrong pixel aspect ratio.

Given that almost all players don't support the cropping flags simply
saying that these players are broken and need to be fixed is not an option.

But there is a method to create videos that will be played with the
correct pixel aspect ratio by both types of players. It makes use of the
fact that (when using pixels as DisplayUnit (it's the default value
anyway)) the default value of DisplayWidth and DisplayHeight are given
by the dimensions of the cropped frame.
It works especially well with nonanamorphic videos. Consider the case of
a 1280x720 nonanamorphic video with 40 pixels on top and bottom cropping
on top and bottom (each) and 240 pixels on the left and right (each). If
one leaves the DisplayWidth/Height away, a player that ignores the crop
values will infer DisplayWidth to be 1280 and DisplayHeight to be 720,
whereas a player that supports the crop values will infer DisplayHeight
to be 640 and DisplayWidth to be 800. Both players will use quadratic
pixels.
It does not work so well with anamorphic videos. If one only wants to
crop in one dimension (say, the vertical one), then one can still
produce videos that work well with both players: Leave out the
DisplayHeight and set DisplayWidth to the common value suitable for
both. (This already has a downside: An absolutely precise display aspect
ratio is no longer attainable with this method in general.)
If one has anamorphic video and wants to crop in both dimensions, then
there is no way of making both types of players display the video with
the same pixel aspect ratio (unless the cropping happens to be so that
the ratio of the uncropped frame and the ratio of the cropped frame
coincide).

We already don't write the display dimensions when nonanamorphic and
your (revised) patch needs to preserve that (should be trivial). Given
that the method outlined above in case cropping is restricted to one
dimension might have side-effects I don't think it should be implemented
(would also be too niche); but one (needn't be you) should add a warning
for users in case players not supporting the crop elements might play
the output with the wrong pixel aspect ratio. (Said warning should
preferably be added in a separate commit.)

b) MKVToolNix behaviour wrt cropping is unfortunately buggy: mkvmerge
reads and propagates the cropping parameters upon remuxing, but it does
not take them into account when inferring the display dimensions (which
it explicitly sets); therefore it changes the pixel aspect ratio upon
remuxing (with the usual exception of the ratios of cropped and
uncropped frames agreeing).
And the GUIs help text does not mention that one should also modify the
display dimensions when editing the cropping parameters. E.g. both
videos from ticket 4489 (and also the original in VLC ticket 13982) are
anamorphic when the specs are followed. This was certainly not intended.
Certainly a large part of the files in existence that have crop values
set have them set in the wrong way. Adding a workaround would
unfortunately incur the possibility of misparsing valid files. My
current approach is to wait and see how many users complain before
deciding on whether to add a workaround.

> +
>          if (par->codec_id == AV_CODEC_ID_RAWVIDEO) {
>              uint32_t color_space = av_le2ne32(par->codec_tag);
>              put_ebml_binary(pb, MATROSKA_ID_VIDEOCOLORSPACE, &color_space, sizeof(color_space));

> 
> diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
> index 8e1326abf6..de520be247 100644
> --- a/libavformat/matroskadec.c
> +++ b/libavformat/matroskadec.c
> @@ -190,6 +190,10 @@ typedef struct MatroskaTrackVideo {
>      uint64_t display_height;
>      uint64_t pixel_width;
>      uint64_t pixel_height;
> +    uint64_t pixel_cropb;
> +    uint64_t pixel_cropt;
> +    uint64_t pixel_cropl;
> +    uint64_t pixel_cropr;
>      EbmlBin  color_space;
>      uint64_t display_unit;
>      uint64_t interlaced;
> @@ -480,10 +484,10 @@ static EbmlSyntax matroska_track_video[] = {
>      { MATROSKA_ID_VIDEOALPHAMODE,      EBML_UINT,  0, offsetof(MatroskaTrackVideo, alpha_mode) },
>      { MATROSKA_ID_VIDEOCOLOR,          EBML_NEST,  sizeof(MatroskaTrackVideoColor), offsetof(MatroskaTrackVideo, color), { .n = matroska_track_video_color } },
>      { MATROSKA_ID_VIDEOPROJECTION,     EBML_NEST,  0, offsetof(MatroskaTrackVideo, projection), { .n = matroska_track_video_projection } },
> -    { MATROSKA_ID_VIDEOPIXELCROPB,     EBML_NONE },
> -    { MATROSKA_ID_VIDEOPIXELCROPT,     EBML_NONE },
> -    { MATROSKA_ID_VIDEOPIXELCROPL,     EBML_NONE },
> -    { MATROSKA_ID_VIDEOPIXELCROPR,     EBML_NONE },
> +    { MATROSKA_ID_VIDEOPIXELCROPB,     EBML_UINT,  0, offsetof(MatroskaTrackVideo, pixel_cropb) },
> +    { MATROSKA_ID_VIDEOPIXELCROPT,     EBML_UINT,  0, offsetof(MatroskaTrackVideo, pixel_cropt) },
> +    { MATROSKA_ID_VIDEOPIXELCROPL,     EBML_UINT,  0, offsetof(MatroskaTrackVideo, pixel_cropl) },
> +    { MATROSKA_ID_VIDEOPIXELCROPR,     EBML_UINT,  0, offsetof(MatroskaTrackVideo, pixel_cropr) },
>      { MATROSKA_ID_VIDEODISPLAYUNIT,    EBML_UINT,  0, offsetof(MatroskaTrackVideo, display_unit), { .u= MATROSKA_VIDEO_DISPLAYUNIT_PIXELS } },
>      { MATROSKA_ID_VIDEOFLAGINTERLACED, EBML_UINT,  0, offsetof(MatroskaTrackVideo, interlaced),  { .u = MATROSKA_VIDEO_INTERLACE_FLAG_UNDETERMINED } },
>      { MATROSKA_ID_VIDEOFIELDORDER,     EBML_UINT,  0, offsetof(MatroskaTrackVideo, field_order), { .u = MATROSKA_VIDEO_FIELDORDER_UNDETERMINED } },
> @@ -2695,7 +2699,8 @@ static int matroska_parse_tracks(AVFormatContext *s)
>              MatroskaTrackPlane *planes = track->operation.combine_planes.elem;
>              int display_width_mul  = 1;
>              int display_height_mul = 1;
> -
> +            const int cropped_width = track->video.pixel_width - track->video.pixel_cropl - track->video.pixel_cropr;
> +            const int cropped_height = track->video.pixel_height - track->video.pixel_cropt - track->video.pixel_cropb;

Missing check for bogus values in which the total crop is more than is
available in that dimension. (Yes, I know, there are lots of missing
checks already in this function. But let's not add another one.)

>              st->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;
>              st->codecpar->codec_tag  = fourcc;
>              if (bit_depth >= 0)
> @@ -2703,6 +2708,29 @@ static int matroska_parse_tracks(AVFormatContext *s)
>              st->codecpar->width      = track->video.pixel_width;
>              st->codecpar->height     = track->video.pixel_height;
>  
> +            if (track->video.pixel_cropt || track->video.pixel_cropb ||
> +                track->video.pixel_cropl || track->video.pixel_cropr) {
> +                AVCleanAperture *metadata = av_clean_aperture_alloc();
> +                if (!metadata)
> +                    return AVERROR(ENOMEM);
> +                metadata->width.num = cropped_width;
> +                metadata->height.num = cropped_height;
> +                metadata->width.den = 1;
> +                metadata->height.den = 1;

Can be aligned on '='.

> +                av_reduce(&metadata->horizontal_offset.num,
> +                          &metadata->horizontal_offset.den,
> +                          (int)track->video.pixel_cropl - (int)track->video.pixel_cropr, 2, INT_MAX);
> +                av_reduce(&metadata->vertical_offset.num,
> +                          &metadata->vertical_offset.den,
> +                          (int)track->video.pixel_cropt - (int)track->video.pixel_cropb, 2, INT_MAX);
> +                ret = av_stream_add_side_data(st, AV_PKT_DATA_CLEAN_APERTURE,
> +                                              (uint8_t *)metadata, sizeof(*metadata));
> +                if (ret < 0) {
> +                    av_freep(&metadata);
> +                    return ret;
> +                }
> +            }
> +
>              if (track->video.interlaced == MATROSKA_VIDEO_INTERLACE_FLAG_INTERLACED)
>                  st->codecpar->field_order = mkv_field_order(matroska, track->video.field_order);
>              else if (track->video.interlaced == MATROSKA_VIDEO_INTERLACE_FLAG_PROGRESSIVE)
> @@ -2714,8 +2742,8 @@ static int matroska_parse_tracks(AVFormatContext *s)
>              if (track->video.display_unit < MATROSKA_VIDEO_DISPLAYUNIT_UNKNOWN) {
>                  av_reduce(&st->sample_aspect_ratio.num,
>                            &st->sample_aspect_ratio.den,
> -                          st->codecpar->height * track->video.display_width  * display_width_mul,
> -                          st->codecpar->width  * track->video.display_height * display_height_mul,
> +                          cropped_height * track->video.display_width  * display_width_mul,
> +                          cropped_width  * track->video.display_height * display_height_mul,

The display_width/height values used here might not be directly coded in
the file, but inferred. And the default value depends upon the crop
values, so this needs to be updated, too. Actually, if one has a
non-anamorphic Matroska video with DisplayWidth/Height values explicitly
coded and crop values set, then with your patches applied the demuxer
would correctly demux it and the muxer would leave DisplayWidth/Height
values out (so that it would still be non-anamorphic), but the demuxer
would then infer that the new file is anamorphic (unless the ratio of
the cropped and uncropped frames coincide, as usual).

Thank you for tackling this. I always wanted to add support for this and
would have done so if there had been a way to export the cropping
information.

- Andreas
Andreas Rheinhardt May 7, 2020, 10:27 p.m. UTC | #12
Neil Birkbeck:
> On Tue, Apr 28, 2020 at 3:18 AM Nicolas George <george@nsup.org> wrote:
> 
>> Andreas Rheinhardt (12020-04-28):
>>> That's expected. The patch provided only provides the structure in which
>>> the values are intended to be exported; it does not add any demuxing or
>>> muxing capabilities for mov or mkv (as you can see from the fact that
>>> none of these (de)muxers have been changed in the patch).
>>
>> Which is something I intended to comment on: adding code without users
>> is rarely a good idea. I suggest we do not commit until at least one
>> demuxer use it, preferably at least two. Otherwise, we may realize that
>> “oh crap, it doesn't work” because of a tiny unforeseen detail.
> 
> 
> Thanks for the feedback. I also have patches for the demux (MOV/MKV) and
> mux (MOV/MKV).
> 
> As there is still the alternative of using the fields in the
> AVCodecParameters/AVCodecContext, my intention was to keep the first patch
> small to resolve discussion on that point.
> 
> I've included the patches, if you'd like to try test it, Kieren. I see on
> your test file that there may be some slight rounding error making output
> crop 704 not 703 (MKV file ends up with pixel_crop_{left,right} = 8).
> 
> /ffprobe ../testdata/clap.mov 2>&1 | grep -A1 "Side"
>     Side data:
>       Clean aperture:[width 41472/59 height:576/1 h_offset:0/1 v_offset:0/1]
> ./ffmpeg -i ../testdata/clap.mov  -vcodec copy -acodec copy /tmp/clap.mkv
> ./ffprobe /tmp/clap.mkv 2>&1 | grep -A1 "Side"
>     Side data:
>       Clean aperture:[width 704/1 height:576/1 h_offset:0/1 v_offset:0/1]
> 
> 

> 
> Write out stream-level AVCleanAperture side data in the muxer.

You should separate the mov and Matroska patches.

> 
> Signed-off-by: Neil Birkbeck <neil.birkbeck@gmail.com>
> ---
>  libavformat/matroskaenc.c | 37 +++++++++++++++++++++++++++++++++++++
>  libavformat/movenc.c      | 28 ++++++++++++++++++++--------
>  2 files changed, 57 insertions(+), 8 deletions(-)
> 
> diff --git a/libavformat/matroskaenc.c b/libavformat/matroskaenc.c
> index 784973a951..19ff29853e 100644
> --- a/libavformat/matroskaenc.c
> +++ b/libavformat/matroskaenc.c
> @@ -901,6 +901,38 @@ static int mkv_write_video_color(AVIOContext *pb, const AVCodecParameters *par,
>      return 0;
>  }
>  
> +static int mkv_write_video_crop(AVIOContext *pb, AVCodecParameters *par, AVStream *st)
> +{
> +    int cropb = 0, cropt = 0, cropl = 0, cropr = 0;
> +    double width, height, h_offset, v_offset;
> +    int side_data_size = 0;
> +    const AVCleanAperture *clap =
> +        (const AVCleanAperture *)av_stream_get_side_data(
> +            st, AV_PKT_DATA_CLEAN_APERTURE, &side_data_size);
> +    if (!clap)
> +        return 0;
> +
> +    width = av_q2d(clap->width);
> +    height = av_q2d(clap->height);
> +    h_offset = av_q2d(clap->horizontal_offset);
> +    v_offset = av_q2d(clap->vertical_offset);
> +    cropb = (int)(par->height - (par->height / 2. + v_offset + height / 2));
> +    cropt = (int)(par->height / 2. + v_offset - height / 2);
> +    cropr = (int)(par->width - (par->width / 2. + h_offset + width / 2));
> +    cropl = (int)(par->width / 2. + h_offset - width / 2);
> +    cropb = FFMAX(cropb, 0);
> +    cropt = FFMAX(cropt, 0);
> +    cropr = FFMAX(cropr, 0);
> +    cropl = FFMAX(cropl, 0);
> +    if (!cropr && !cropl && !cropt && !cropb)
> +        return 0;
> +    put_ebml_uint(pb, MATROSKA_ID_VIDEOPIXELCROPL, cropl);
> +    put_ebml_uint(pb, MATROSKA_ID_VIDEOPIXELCROPR, cropr);
> +    put_ebml_uint(pb, MATROSKA_ID_VIDEOPIXELCROPB, cropb);
> +    put_ebml_uint(pb, MATROSKA_ID_VIDEOPIXELCROPT, cropt);

There is no reason to write all four elements when only some of them are
different from zero.

> +    return 0;
> +}
> +
>  static int mkv_write_video_projection(AVFormatContext *s, AVIOContext *pb,
>                                        const AVStream *st)
>  {
> @@ -1287,6 +1319,11 @@ static int mkv_write_track(AVFormatContext *s, MatroskaMuxContext *mkv,
>          } else if (mkv->mode != MODE_WEBM)
>              put_ebml_uint(pb, MATROSKA_ID_VIDEODISPLAYUNIT, MATROSKA_VIDEO_DISPLAYUNIT_UNKNOWN);
>  
> +        // Write out pixel crop
> +        ret = mkv_write_video_crop(pb, par, st);
> +        if (ret < 0)
> +            return ret;

This is the wrong place for this: It needs to be performed before we
write the display dimensions and you need to return the cropped width
and height via pointer arguments, so that the code doing display
dimensions (which needs to be updated) knows about the cropping. This is
necessary, because cropping is supposed to happen before scaling: If the
code for writing DisplayWidth/Height didn't know about the real
dimensions to scale, it would write the DisplayWidth/Height values that
are appropriate for a video with the given pixel aspect ratio and the
uncropped pixel dimensions.

We should btw error out if someone sends stereo 3d data that implies
that a dimension is doubled if cropping is intended to be applied in the
same dimension, as this scenario is not really well defined.

And let me also add other points about the sorry state of the PixelCrop*
elements:
a) The display aspect ratio usually changes when using cropping; the
pixel aspect ratio doesn't change, yet unfortunately (and inexplicably)
Matroska does not support that. This means that if one sets the display
aspect ratio explicitly when using cropping, a player that ignores the
cropping will not only display the video with the area that ought to be
cropped away; it will also display it with a wrong pixel aspect ratio.

Given that almost all players don't support the cropping flags simply
saying that these players are broken and need to be fixed is not an option.

But there is a method to create videos that will be played with the
correct pixel aspect ratio by both types of players. It makes use of the
fact that (when using pixels as DisplayUnit (it's the default value
anyway)) the default value of DisplayWidth and DisplayHeight are given
by the dimensions of the cropped frame.
It works especially well with nonanamorphic videos. Consider the case of
a 1280x720 nonanamorphic video with 40 pixels on top and bottom cropping
on top and bottom (each) and 240 pixels on the left and right (each). If
one leaves the DisplayWidth/Height away, a player that ignores the crop
values will infer DisplayWidth to be 1280 and DisplayHeight to be 720,
whereas a player that supports the crop values will infer DisplayHeight
to be 640 and DisplayWidth to be 800. Both players will use quadratic
pixels.
It does not work so well with anamorphic videos. If one only wants to
crop in one dimension (say, the vertical one), then one can still
produce videos that work well with both players: Leave out the
DisplayHeight and set DisplayWidth to the common value suitable for
both. (This already has a downside: An absolutely precise display aspect
ratio is no longer attainable with this method in general.)
If one has anamorphic video and wants to crop in both dimensions, then
there is no way of making both types of players display the video with
the same pixel aspect ratio (unless the cropping happens to be so that
the ratio of the uncropped frame and the ratio of the cropped frame
coincide).

We already don't write the display dimensions when nonanamorphic and
your (revised) patch needs to preserve that (should be trivial). Given
that the method outlined above in case cropping is restricted to one
dimension might have side-effects I don't think it should be implemented
(would also be too niche); but one (needn't be you) should add a warning
for users in case players not supporting the crop elements might play
the output with the wrong pixel aspect ratio. (Said warning should
preferably be added in a separate commit.)

b) MKVToolNix behaviour wrt cropping is unfortunately buggy: mkvmerge
reads and propagates the cropping parameters upon remuxing, but it does
not take them into account when inferring the display dimensions (which
it explicitly sets); therefore it changes the pixel aspect ratio upon
remuxing (with the usual exception of the ratios of cropped and
uncropped frames agreeing).
And the GUIs help text does not mention that one should also modify the
display dimensions when editing the cropping parameters. E.g. both
videos from ticket 4489 (and also the original in VLC ticket 13982) are
anamorphic when the specs are followed. This was certainly not intended.
Certainly a large part of the files in existence that have crop values
set have them set in the wrong way. Adding a workaround would
unfortunately incur the possibility of misparsing valid files. My
current approach is to wait and see how many users complain before
deciding on whether to add a workaround.

> +
>          if (par->codec_id == AV_CODEC_ID_RAWVIDEO) {
>              uint32_t color_space = av_le2ne32(par->codec_tag);
>              put_ebml_binary(pb, MATROSKA_ID_VIDEOCOLORSPACE, &color_space, sizeof(color_space));

> 
> diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
> index 8e1326abf6..de520be247 100644
> --- a/libavformat/matroskadec.c
> +++ b/libavformat/matroskadec.c
> @@ -190,6 +190,10 @@ typedef struct MatroskaTrackVideo {
>      uint64_t display_height;
>      uint64_t pixel_width;
>      uint64_t pixel_height;
> +    uint64_t pixel_cropb;
> +    uint64_t pixel_cropt;
> +    uint64_t pixel_cropl;
> +    uint64_t pixel_cropr;
>      EbmlBin  color_space;
>      uint64_t display_unit;
>      uint64_t interlaced;
> @@ -480,10 +484,10 @@ static EbmlSyntax matroska_track_video[] = {
>      { MATROSKA_ID_VIDEOALPHAMODE,      EBML_UINT,  0, offsetof(MatroskaTrackVideo, alpha_mode) },
>      { MATROSKA_ID_VIDEOCOLOR,          EBML_NEST,  sizeof(MatroskaTrackVideoColor), offsetof(MatroskaTrackVideo, color), { .n = matroska_track_video_color } },
>      { MATROSKA_ID_VIDEOPROJECTION,     EBML_NEST,  0, offsetof(MatroskaTrackVideo, projection), { .n = matroska_track_video_projection } },
> -    { MATROSKA_ID_VIDEOPIXELCROPB,     EBML_NONE },
> -    { MATROSKA_ID_VIDEOPIXELCROPT,     EBML_NONE },
> -    { MATROSKA_ID_VIDEOPIXELCROPL,     EBML_NONE },
> -    { MATROSKA_ID_VIDEOPIXELCROPR,     EBML_NONE },
> +    { MATROSKA_ID_VIDEOPIXELCROPB,     EBML_UINT,  0, offsetof(MatroskaTrackVideo, pixel_cropb) },
> +    { MATROSKA_ID_VIDEOPIXELCROPT,     EBML_UINT,  0, offsetof(MatroskaTrackVideo, pixel_cropt) },
> +    { MATROSKA_ID_VIDEOPIXELCROPL,     EBML_UINT,  0, offsetof(MatroskaTrackVideo, pixel_cropl) },
> +    { MATROSKA_ID_VIDEOPIXELCROPR,     EBML_UINT,  0, offsetof(MatroskaTrackVideo, pixel_cropr) },
>      { MATROSKA_ID_VIDEODISPLAYUNIT,    EBML_UINT,  0, offsetof(MatroskaTrackVideo, display_unit), { .u= MATROSKA_VIDEO_DISPLAYUNIT_PIXELS } },
>      { MATROSKA_ID_VIDEOFLAGINTERLACED, EBML_UINT,  0, offsetof(MatroskaTrackVideo, interlaced),  { .u = MATROSKA_VIDEO_INTERLACE_FLAG_UNDETERMINED } },
>      { MATROSKA_ID_VIDEOFIELDORDER,     EBML_UINT,  0, offsetof(MatroskaTrackVideo, field_order), { .u = MATROSKA_VIDEO_FIELDORDER_UNDETERMINED } },
> @@ -2695,7 +2699,8 @@ static int matroska_parse_tracks(AVFormatContext *s)
>              MatroskaTrackPlane *planes = track->operation.combine_planes.elem;
>              int display_width_mul  = 1;
>              int display_height_mul = 1;
> -
> +            const int cropped_width = track->video.pixel_width - track->video.pixel_cropl - track->video.pixel_cropr;
> +            const int cropped_height = track->video.pixel_height - track->video.pixel_cropt - track->video.pixel_cropb;

Missing check for bogus values in which the total crop is more than is
available in that dimension. (Yes, I know, there are lots of missing
checks already in this function. But let's not add another one.)

>              st->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;
>              st->codecpar->codec_tag  = fourcc;
>              if (bit_depth >= 0)
> @@ -2703,6 +2708,29 @@ static int matroska_parse_tracks(AVFormatContext *s)
>              st->codecpar->width      = track->video.pixel_width;
>              st->codecpar->height     = track->video.pixel_height;
>  
> +            if (track->video.pixel_cropt || track->video.pixel_cropb ||
> +                track->video.pixel_cropl || track->video.pixel_cropr) {
> +                AVCleanAperture *metadata = av_clean_aperture_alloc();
> +                if (!metadata)
> +                    return AVERROR(ENOMEM);
> +                metadata->width.num = cropped_width;
> +                metadata->height.num = cropped_height;
> +                metadata->width.den = 1;
> +                metadata->height.den = 1;

Can be aligned on '='.

> +                av_reduce(&metadata->horizontal_offset.num,
> +                          &metadata->horizontal_offset.den,
> +                          (int)track->video.pixel_cropl - (int)track->video.pixel_cropr, 2, INT_MAX);
> +                av_reduce(&metadata->vertical_offset.num,
> +                          &metadata->vertical_offset.den,
> +                          (int)track->video.pixel_cropt - (int)track->video.pixel_cropb, 2, INT_MAX);
> +                ret = av_stream_add_side_data(st, AV_PKT_DATA_CLEAN_APERTURE,
> +                                              (uint8_t *)metadata, sizeof(*metadata));
> +                if (ret < 0) {
> +                    av_freep(&metadata);
> +                    return ret;
> +                }
> +            }
> +
>              if (track->video.interlaced == MATROSKA_VIDEO_INTERLACE_FLAG_INTERLACED)
>                  st->codecpar->field_order = mkv_field_order(matroska, track->video.field_order);
>              else if (track->video.interlaced == MATROSKA_VIDEO_INTERLACE_FLAG_PROGRESSIVE)
> @@ -2714,8 +2742,8 @@ static int matroska_parse_tracks(AVFormatContext *s)
>              if (track->video.display_unit < MATROSKA_VIDEO_DISPLAYUNIT_UNKNOWN) {
>                  av_reduce(&st->sample_aspect_ratio.num,
>                            &st->sample_aspect_ratio.den,
> -                          st->codecpar->height * track->video.display_width  * display_width_mul,
> -                          st->codecpar->width  * track->video.display_height * display_height_mul,
> +                          cropped_height * track->video.display_width  * display_width_mul,
> +                          cropped_width  * track->video.display_height * display_height_mul,

The display_width/height values used here might not be directly coded in
the file, but inferred. And the default value depends upon the crop
values, so this needs to be updated, too. Actually, if one has a
non-anamorphic Matroska video with DisplayWidth/Height values explicitly
coded and crop values set, then with your patches applied the demuxer
would correctly demux it and the muxer would leave DisplayWidth/Height
values out (so that it would still be non-anamorphic), but the demuxer
would then infer that the new file is anamorphic (unless the ratio of
the cropped and uncropped frames coincide, as usual).

Thank you for tackling this. I always wanted to add support for this and
would have done so if there had been a way to export the cropping
information.

- Andreas
Neil Birkbeck May 12, 2020, 4:37 a.m. UTC | #13
On Wed, May 6, 2020 at 8:45 AM James Almer <jamrial@gmail.com> wrote:

> On 5/6/2020 12:22 PM, Neil Birkbeck wrote:
> > On Tue, May 5, 2020 at 5:11 AM Kieran O Leary <kieran.o.leary@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I broke the threading with my last reply, i apologise. Here goes another
> >> attempt:
> >>
> >> On Tue, Apr 28, 2020 at 6:23 PM Neil Birkbeck <neil.birkbeck@gmail.com>
> >> wrote:
> >>
> >>> On Tue, Apr 28, 2020 at 3:18 AM Nicolas George <george@nsup.org>
> wrote:
> >>>
> >>>> Andreas Rheinhardt (12020-04-28):
> >>>>> That's expected. The patch provided only provides the structure in
> >>> which
> >>>>> the values are intended to be exported; it does not add any demuxing
> >> or
> >>>>> muxing capabilities for mov or mkv (as you can see from the fact that
> >>>>> none of these (de)muxers have been changed in the patch).
> >>>>
> >>>> Which is something I intended to comment on: adding code without users
> >>>> is rarely a good idea. I suggest we do not commit until at least one
> >>>> demuxer use it, preferably at least two. Otherwise, we may realize
> that
> >>>> “oh crap, it doesn't work” because of a tiny unforeseen detail.
> >>>
> >>>
> >>> Thanks for the feedback. I also have patches for the demux (MOV/MKV)
> and
> >>> mux (MOV/MKV).
> >>>
> >>> As there is still the alternative of using the fields in the
> >>> AVCodecParameters/AVCodecContext, my intention was to keep the first
> >> patch
> >>> small to resolve discussion on that point.
> >>>
> >>> I've included the patches, if you'd like to try test it, Kieren. I see
> on
> >>> your test file that there may be some slight rounding error making
> output
> >>> crop 704 not 703 (MKV file ends up with pixel_crop_{left,right} = 8).
> >>>
> >>> /ffprobe ../testdata/clap.mov 2>&1 | grep -A1 "Side"
> >>>     Side data:
> >>>       Clean aperture:[width 41472/59 height:576/1 h_offset:0/1
> >>> v_offset:0/1]
> >>> ./ffmpeg -i ../testdata/clap.mov  -vcodec copy -acodec copy
> /tmp/clap.mkv
> >>> ./ffprobe /tmp/clap.mkv 2>&1 | grep -A1 "Side"
> >>>     Side data:
> >>>       Clean aperture:[width 704/1 height:576/1 h_offset:0/1
> v_offset:0/1]
> >>>
> >>
> >> I have to look deeper into the MKV side of things and most likely raise
> it
> >> with the cellar mailing list so that better minds than mine can weigh
> in. I
> >> do see that the rounding up to 704 could be an issue alright.
> >> As for MOV, your patch appears to generate the same output clap values
> as
> >> the input, so that's really great! command line and mediainfo trace
> below:
> >>
> >
> > Thanks for testing, Kieran and for linking the discussion on the cellar
> > list.
> >
> > Any additional thoughts from ffmpeg devs on container-level SideData vs
> > adding the extra fields into AVCodecParameters/AVCodecContext and
> plumbing
> > into the frame instead? I anticipate some situations where there can be
> > interaction between cropping in bitstream and container-level cropping.
> > Maybe the best way forward is for me to share some sample patches for the
> > alternative to validate whether it supports the various use cases
> > (transmux, decode+crop, any other application level handling), and to
> > confirm the interface changes for the structs.
>
> One option could be to also introduce a frame side data for this new
> struct and have it replace the AVFrame fields, which would then be
> deprecated (But of course keep working until removed).
> This would allow to either inject stream side data from clap atoms and
> Matroska crop fields into packets (Which would then be propagated to
> frames to be applied during decoding), or use and export the bitstream
> cropping information as it's the case right now, all while preventing
> the addition of new fields to AVCodecParameters.
>
>
I agree that sharing the SideData with the frame could be nice for the
above reasons. I guess any interaction or conflict between bitstream &
container cropping could then get resolved at the decoder (e.g., assuming
that say SPS from h264 and container-level cropping should be composed).


> I would like a developer that makes use of this feature to also comment,
> especially seeing how the AVFrame fields and this clap side data are
> pretty different.
>

The current CleanAperture representation was in part motivated to 1) keep
the representation capable of representing the CLAP atom (same
representation and rationals), and 2) to make it unambiguous that this was
container-level stream metadata. The representation is a tiny bit more
tedious when trying to actually perform a crop (e.g., to extract the
top-left corner offset). If sharing as AVFrame side data, a representation
closer to AVFrame's crop_{top,bottom,left,right} may be more natural.

Within ffmpeg, I see that the existing codecs using AVFrame cropping
include:
-libavcodec/h264_slice.c: propagating the crop_fields from the SPS in h264
in
-libavcodec/hevc_refs.c: (sly to h264)
-libavcodec/agm.c: crop_{left, top} inferred from
avctx->coded_{width,height} - avctx->{width,height}
-libavcodec/videotoolbox.c: crop fields explicitly set to zero.
-libvacodec/vp3.c: crop_{left,top} set from coded bitstream’s offset_{x,y}
(right,bottom from coded_{width,height})

Upon review, I see there is one other format that appears to export similar
cropping information into key/value pair metadata that could potentially
also use the stream-level side data: (libavformat/cinedec.c:
set_metadata_int(&st->metadata, "crop_left", avio_rl32(pb), 1))
Neil Birkbeck May 12, 2020, 5:06 a.m. UTC | #14
On Thu, May 7, 2020 at 10:21 AM Andreas Rheinhardt <
andreas.rheinhardt@gmail.com> wrote:

> Neil Birkbeck:
> > On Tue, Apr 28, 2020 at 3:18 AM Nicolas George <george@nsup.org> wrote:
> >
> >> Andreas Rheinhardt (12020-04-28):
> >>> That's expected. The patch provided only provides the structure in
> which
> >>> the values are intended to be exported; it does not add any demuxing or
> >>> muxing capabilities for mov or mkv (as you can see from the fact that
> >>> none of these (de)muxers have been changed in the patch).
> >>
> >> Which is something I intended to comment on: adding code without users
> >> is rarely a good idea. I suggest we do not commit until at least one
> >> demuxer use it, preferably at least two. Otherwise, we may realize that
> >> “oh crap, it doesn't work” because of a tiny unforeseen detail.
> >
> >
> > Thanks for the feedback. I also have patches for the demux (MOV/MKV) and
> > mux (MOV/MKV).
> >
> > As there is still the alternative of using the fields in the
> > AVCodecParameters/AVCodecContext, my intention was to keep the first
> patch
> > small to resolve discussion on that point.
> >
> > I've included the patches, if you'd like to try test it, Kieren. I see on
> > your test file that there may be some slight rounding error making output
> > crop 704 not 703 (MKV file ends up with pixel_crop_{left,right} = 8).
> >
> > /ffprobe ../testdata/clap.mov 2>&1 | grep -A1 "Side"
> >     Side data:
> >       Clean aperture:[width 41472/59 height:576/1 h_offset:0/1
> v_offset:0/1]
> > ./ffmpeg -i ../testdata/clap.mov  -vcodec copy -acodec copy /tmp/clap.mkv
> > ./ffprobe /tmp/clap.mkv 2>&1 | grep -A1 "Side"
> >     Side data:
> >       Clean aperture:[width 704/1 height:576/1 h_offset:0/1 v_offset:0/1]
> >
> >
>
> >
> > Write out stream-level AVCleanAperture side data in the muxer.
>
> You should separate the mov and Matroska patches.


I will do this as I incorporate the rest of your suggestions.



>
> > Signed-off-by: Neil Birkbeck <neil.birkbeck@gmail.com>
> > ---
> >  libavformat/matroskaenc.c | 37 +++++++++++++++++++++++++++++++++++++
> >  libavformat/movenc.c      | 28 ++++++++++++++++++++--------
> >  2 files changed, 57 insertions(+), 8 deletions(-)
> >
> > diff --git a/libavformat/matroskaenc.c b/libavformat/matroskaenc.c
> > index 784973a951..19ff29853e 100644
> > --- a/libavformat/matroskaenc.c
> > +++ b/libavformat/matroskaenc.c
> > @@ -901,6 +901,38 @@ static int mkv_write_video_color(AVIOContext *pb,
> const AVCodecParameters *par,
> >      return 0;
> >  }
> >
> > +static int mkv_write_video_crop(AVIOContext *pb, AVCodecParameters
> *par, AVStream *st)
> > +{
> > +    int cropb = 0, cropt = 0, cropl = 0, cropr = 0;
> > +    double width, height, h_offset, v_offset;
> > +    int side_data_size = 0;
> > +    const AVCleanAperture *clap =
> > +        (const AVCleanAperture *)av_stream_get_side_data(
> > +            st, AV_PKT_DATA_CLEAN_APERTURE, &side_data_size);
> > +    if (!clap)
> > +        return 0;
> > +
> > +    width = av_q2d(clap->width);
> > +    height = av_q2d(clap->height);
> > +    h_offset = av_q2d(clap->horizontal_offset);
> > +    v_offset = av_q2d(clap->vertical_offset);
> > +    cropb = (int)(par->height - (par->height / 2. + v_offset + height /
> 2));
> > +    cropt = (int)(par->height / 2. + v_offset - height / 2);
> > +    cropr = (int)(par->width - (par->width / 2. + h_offset + width /
> 2));
> > +    cropl = (int)(par->width / 2. + h_offset - width / 2);
> > +    cropb = FFMAX(cropb, 0);
> > +    cropt = FFMAX(cropt, 0);
> > +    cropr = FFMAX(cropr, 0);
> > +    cropl = FFMAX(cropl, 0);
> > +    if (!cropr && !cropl && !cropt && !cropb)
> > +        return 0;
> > +    put_ebml_uint(pb, MATROSKA_ID_VIDEOPIXELCROPL, cropl);
> > +    put_ebml_uint(pb, MATROSKA_ID_VIDEOPIXELCROPR, cropr);
> > +    put_ebml_uint(pb, MATROSKA_ID_VIDEOPIXELCROPB, cropb);
> > +    put_ebml_uint(pb, MATROSKA_ID_VIDEOPIXELCROPT, cropt);
>
> There is no reason to write all four elements when only some of them are
> different from zero.
>
>
Good point. I will avoid doing this in the revised set of patches.


> > +    return 0;
> > +}
> > +
> >  static int mkv_write_video_projection(AVFormatContext *s, AVIOContext
> *pb,
> >                                        const AVStream *st)
> >  {
> > @@ -1287,6 +1319,11 @@ static int mkv_write_track(AVFormatContext *s,
> MatroskaMuxContext *mkv,
> >          } else if (mkv->mode != MODE_WEBM)
> >              put_ebml_uint(pb, MATROSKA_ID_VIDEODISPLAYUNIT,
> MATROSKA_VIDEO_DISPLAYUNIT_UNKNOWN);
> >
> > +        // Write out pixel crop
> > +        ret = mkv_write_video_crop(pb, par, st);
> > +        if (ret < 0)
> > +            return ret;
>
> This is the wrong place for this: It needs to be performed before we
> write the display dimensions and you need to return the cropped width
> and height via pointer arguments, so that the code doing display
> dimensions (which needs to be updated) knows about the cropping. This is
> necessary, because cropping is supposed to happen before scaling: If the
> code for writing DisplayWidth/Height didn't know about the real
> dimensions to scale, it would write the DisplayWidth/Height values that
> are appropriate for a video with the given pixel aspect ratio and the
> uncropped pixel dimensions.
>

Good point. I was a bit careless in the initial patch.


>
> We should btw error out if someone sends stereo 3d data that implies
> that a dimension is doubled if cropping is intended to be applied in the
> same dimension, as this scenario is not really well defined.
>
> And let me also add other points about the sorry state of the PixelCrop*
> elements:
> a) The display aspect ratio usually changes when using cropping; the
> pixel aspect ratio doesn't change, yet unfortunately (and inexplicably)
> Matroska does not support that. This means that if one sets the display
> aspect ratio explicitly when using cropping, a player that ignores the
> cropping will not only display the video with the area that ought to be
> cropped away; it will also display it with a wrong pixel aspect ratio.
>
> Given that almost all players don't support the cropping flags simply
> saying that these players are broken and need to be fixed is not an option.
>
> But there is a method to create videos that will be played with the
> correct pixel aspect ratio by both types of players. It makes use of the
> fact that (when using pixels as DisplayUnit (it's the default value
> anyway)) the default value of DisplayWidth and DisplayHeight are given
> by the dimensions of the cropped frame.
> It works especially well with nonanamorphic videos. Consider the case of
> a 1280x720 nonanamorphic video with 40 pixels on top and bottom cropping
> on top and bottom (each) and 240 pixels on the left and right (each). If
> one leaves the DisplayWidth/Height away, a player that ignores the crop
> values will infer DisplayWidth to be 1280 and DisplayHeight to be 720,
> whereas a player that supports the crop values will infer DisplayHeight
> to be 640 and DisplayWidth to be 800. Both players will use quadratic
> pixels.
> It does not work so well with anamorphic videos. If one only wants to
> crop in one dimension (say, the vertical one), then one can still
> produce videos that work well with both players: Leave out the
> DisplayHeight and set DisplayWidth to the common value suitable for
> both. (This already has a downside: An absolutely precise display aspect
> ratio is no longer attainable with this method in general.)
> If one has anamorphic video and wants to crop in both dimensions, then
> there is no way of making both types of players display the video with
> the same pixel aspect ratio (unless the cropping happens to be so that
> the ratio of the uncropped frame and the ratio of the cropped frame
> coincide).
>

Great point and example.


>
> We already don't write the display dimensions when nonanamorphic and
> your (revised) patch needs to preserve that (should be trivial). Given
> that the method outlined above in case cropping is restricted to one
> dimension might have side-effects I don't think it should be implemented
> (would also be too niche); but one (needn't be you) should add a warning
> for users in case players not supporting the crop elements might play
> the output with the wrong pixel aspect ratio. (Said warning should
> preferably be added in a separate commit.)
>
> b) MKVToolNix behaviour wrt cropping is unfortunately buggy: mkvmerge
> reads and propagates the cropping parameters upon remuxing, but it does
> not take them into account when inferring the display dimensions (which
> it explicitly sets); therefore it changes the pixel aspect ratio upon
> remuxing (with the usual exception of the ratios of cropped and
> uncropped frames agreeing).
> And the GUIs help text does not mention that one should also modify the
> display dimensions when editing the cropping parameters. E.g. both
> videos from ticket 4489 (and also the original in VLC ticket 13982) are
> anamorphic when the specs are followed. This was certainly not intended.
> Certainly a large part of the files in existence that have crop values
> set have them set in the wrong way. Adding a workaround would
> unfortunately incur the possibility of misparsing valid files. My
> current approach is to wait and see how many users complain before
> deciding on whether to add a workaround.
>
>
Oh, I see. Thanks for the additional context. This is good to know.

This reminds me, In Ffmpeg, in cases where the crop has been applied in the
decoder, we are going to want to avoid leaking the stream-level crop
metadata into the output file (to avoid even more broken files).

> +
> >          if (par->codec_id == AV_CODEC_ID_RAWVIDEO) {
> >              uint32_t color_space = av_le2ne32(par->codec_tag);
> >              put_ebml_binary(pb, MATROSKA_ID_VIDEOCOLORSPACE,
> &color_space, sizeof(color_space));
>
> >
> > diff --git a/libavformat/matroskadec.c b/libavformat/matroskadec.c
> > index 8e1326abf6..de520be247 100644
> > --- a/libavformat/matroskadec.c
> > +++ b/libavformat/matroskadec.c
> > @@ -190,6 +190,10 @@ typedef struct MatroskaTrackVideo {
> >      uint64_t display_height;
> >      uint64_t pixel_width;
> >      uint64_t pixel_height;
> > +    uint64_t pixel_cropb;
> > +    uint64_t pixel_cropt;
> > +    uint64_t pixel_cropl;
> > +    uint64_t pixel_cropr;
> >      EbmlBin  color_space;
> >      uint64_t display_unit;
> >      uint64_t interlaced;
> > @@ -480,10 +484,10 @@ static EbmlSyntax matroska_track_video[] = {
> >      { MATROSKA_ID_VIDEOALPHAMODE,      EBML_UINT,  0,
> offsetof(MatroskaTrackVideo, alpha_mode) },
> >      { MATROSKA_ID_VIDEOCOLOR,          EBML_NEST,
> sizeof(MatroskaTrackVideoColor), offsetof(MatroskaTrackVideo, color), { .n
> = matroska_track_video_color } },
> >      { MATROSKA_ID_VIDEOPROJECTION,     EBML_NEST,  0,
> offsetof(MatroskaTrackVideo, projection), { .n =
> matroska_track_video_projection } },
> > -    { MATROSKA_ID_VIDEOPIXELCROPB,     EBML_NONE },
> > -    { MATROSKA_ID_VIDEOPIXELCROPT,     EBML_NONE },
> > -    { MATROSKA_ID_VIDEOPIXELCROPL,     EBML_NONE },
> > -    { MATROSKA_ID_VIDEOPIXELCROPR,     EBML_NONE },
> > +    { MATROSKA_ID_VIDEOPIXELCROPB,     EBML_UINT,  0,
> offsetof(MatroskaTrackVideo, pixel_cropb) },
> > +    { MATROSKA_ID_VIDEOPIXELCROPT,     EBML_UINT,  0,
> offsetof(MatroskaTrackVideo, pixel_cropt) },
> > +    { MATROSKA_ID_VIDEOPIXELCROPL,     EBML_UINT,  0,
> offsetof(MatroskaTrackVideo, pixel_cropl) },
> > +    { MATROSKA_ID_VIDEOPIXELCROPR,     EBML_UINT,  0,
> offsetof(MatroskaTrackVideo, pixel_cropr) },
> >      { MATROSKA_ID_VIDEODISPLAYUNIT,    EBML_UINT,  0,
> offsetof(MatroskaTrackVideo, display_unit), { .u=
> MATROSKA_VIDEO_DISPLAYUNIT_PIXELS } },
> >      { MATROSKA_ID_VIDEOFLAGINTERLACED, EBML_UINT,  0,
> offsetof(MatroskaTrackVideo, interlaced),  { .u =
> MATROSKA_VIDEO_INTERLACE_FLAG_UNDETERMINED } },
> >      { MATROSKA_ID_VIDEOFIELDORDER,     EBML_UINT,  0,
> offsetof(MatroskaTrackVideo, field_order), { .u =
> MATROSKA_VIDEO_FIELDORDER_UNDETERMINED } },
> > @@ -2695,7 +2699,8 @@ static int matroska_parse_tracks(AVFormatContext
> *s)
> >              MatroskaTrackPlane *planes =
> track->operation.combine_planes.elem;
> >              int display_width_mul  = 1;
> >              int display_height_mul = 1;
> > -
> > +            const int cropped_width = track->video.pixel_width -
> track->video.pixel_cropl - track->video.pixel_cropr;
> > +            const int cropped_height = track->video.pixel_height -
> track->video.pixel_cropt - track->video.pixel_cropb;
>
> Missing check for bogus values in which the total crop is more than is
> available in that dimension. (Yes, I know, there are lots of missing
> checks already in this function. But let's not add another one.)
>
> >              st->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;
> >              st->codecpar->codec_tag  = fourcc;
> >              if (bit_depth >= 0)
> > @@ -2703,6 +2708,29 @@ static int matroska_parse_tracks(AVFormatContext
> *s)
> >              st->codecpar->width      = track->video.pixel_width;
> >              st->codecpar->height     = track->video.pixel_height;
> >
> > +            if (track->video.pixel_cropt || track->video.pixel_cropb ||
> > +                track->video.pixel_cropl || track->video.pixel_cropr) {
> > +                AVCleanAperture *metadata = av_clean_aperture_alloc();
> > +                if (!metadata)
> > +                    return AVERROR(ENOMEM);
> > +                metadata->width.num = cropped_width;
> > +                metadata->height.num = cropped_height;
> > +                metadata->width.den = 1;
> > +                metadata->height.den = 1;
>
> Can be aligned on '='.
>
> > +                av_reduce(&metadata->horizontal_offset.num,
> > +                          &metadata->horizontal_offset.den,
> > +                          (int)track->video.pixel_cropl -
> (int)track->video.pixel_cropr, 2, INT_MAX);
> > +                av_reduce(&metadata->vertical_offset.num,
> > +                          &metadata->vertical_offset.den,
> > +                          (int)track->video.pixel_cropt -
> (int)track->video.pixel_cropb, 2, INT_MAX);
> > +                ret = av_stream_add_side_data(st,
> AV_PKT_DATA_CLEAN_APERTURE,
> > +                                              (uint8_t *)metadata,
> sizeof(*metadata));
> > +                if (ret < 0) {
> > +                    av_freep(&metadata);
> > +                    return ret;
> > +                }
> > +            }
> > +
> >              if (track->video.interlaced ==
> MATROSKA_VIDEO_INTERLACE_FLAG_INTERLACED)
> >                  st->codecpar->field_order = mkv_field_order(matroska,
> track->video.field_order);
> >              else if (track->video.interlaced ==
> MATROSKA_VIDEO_INTERLACE_FLAG_PROGRESSIVE)
> > @@ -2714,8 +2742,8 @@ static int matroska_parse_tracks(AVFormatContext
> *s)
> >              if (track->video.display_unit <
> MATROSKA_VIDEO_DISPLAYUNIT_UNKNOWN) {
> >                  av_reduce(&st->sample_aspect_ratio.num,
> >                            &st->sample_aspect_ratio.den,
> > -                          st->codecpar->height *
> track->video.display_width  * display_width_mul,
> > -                          st->codecpar->width  *
> track->video.display_height * display_height_mul,
> > +                          cropped_height * track->video.display_width
> * display_width_mul,
> > +                          cropped_width  * track->video.display_height
> * display_height_mul,
>
> The display_width/height values used here might not be directly coded in
> the file, but inferred. And the default value depends upon the crop
> values, so this needs to be updated, too. Actually, if one has a
> non-anamorphic Matroska video with DisplayWidth/Height values explicitly
> coded and crop values set, then with your patches applied the demuxer
> would correctly demux it and the muxer would leave DisplayWidth/Height
> values out (so that it would still be non-anamorphic), but the demuxer
> would then infer that the new file is anamorphic (unless the ratio of
> the cropped and uncropped frames coincide, as usual).
>
> Thank you for tackling this. I always wanted to add support for this and
> would have done so if there had been a way to export the cropping
> information.
>

Thank you for your detailed and thorough review, Andreas. I'll be sure to
incorporate all your suggestions in the next revision.


>
Neil Birkbeck May 29, 2020, 9:41 p.m. UTC | #15
On Mon, May 11, 2020 at 9:37 PM Neil Birkbeck <neil.birkbeck@gmail.com>
wrote:

>
>
> On Wed, May 6, 2020 at 8:45 AM James Almer <jamrial@gmail.com> wrote:
>
>> On 5/6/2020 12:22 PM, Neil Birkbeck wrote:
>> > On Tue, May 5, 2020 at 5:11 AM Kieran O Leary <kieran.o.leary@gmail.com
>> >
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> I broke the threading with my last reply, i apologise. Here goes
>> another
>> >> attempt:
>> >>
>> >> On Tue, Apr 28, 2020 at 6:23 PM Neil Birkbeck <neil.birkbeck@gmail.com
>> >
>> >> wrote:
>> >>
>> >>> On Tue, Apr 28, 2020 at 3:18 AM Nicolas George <george@nsup.org>
>> wrote:
>> >>>
>> >>>> Andreas Rheinhardt (12020-04-28):
>> >>>>> That's expected. The patch provided only provides the structure in
>> >>> which
>> >>>>> the values are intended to be exported; it does not add any demuxing
>> >> or
>> >>>>> muxing capabilities for mov or mkv (as you can see from the fact
>> that
>> >>>>> none of these (de)muxers have been changed in the patch).
>> >>>>
>> >>>> Which is something I intended to comment on: adding code without
>> users
>> >>>> is rarely a good idea. I suggest we do not commit until at least one
>> >>>> demuxer use it, preferably at least two. Otherwise, we may realize
>> that
>> >>>> “oh crap, it doesn't work” because of a tiny unforeseen detail.
>> >>>
>> >>>
>> >>> Thanks for the feedback. I also have patches for the demux (MOV/MKV)
>> and
>> >>> mux (MOV/MKV).
>> >>>
>> >>> As there is still the alternative of using the fields in the
>> >>> AVCodecParameters/AVCodecContext, my intention was to keep the first
>> >> patch
>> >>> small to resolve discussion on that point.
>> >>>
>> >>> I've included the patches, if you'd like to try test it, Kieren. I
>> see on
>> >>> your test file that there may be some slight rounding error making
>> output
>> >>> crop 704 not 703 (MKV file ends up with pixel_crop_{left,right} = 8).
>> >>>
>> >>> /ffprobe ../testdata/clap.mov 2>&1 | grep -A1 "Side"
>> >>>     Side data:
>> >>>       Clean aperture:[width 41472/59 height:576/1 h_offset:0/1
>> >>> v_offset:0/1]
>> >>> ./ffmpeg -i ../testdata/clap.mov  -vcodec copy -acodec copy
>> /tmp/clap.mkv
>> >>> ./ffprobe /tmp/clap.mkv 2>&1 | grep -A1 "Side"
>> >>>     Side data:
>> >>>       Clean aperture:[width 704/1 height:576/1 h_offset:0/1
>> v_offset:0/1]
>> >>>
>> >>
>> >> I have to look deeper into the MKV side of things and most likely
>> raise it
>> >> with the cellar mailing list so that better minds than mine can weigh
>> in. I
>> >> do see that the rounding up to 704 could be an issue alright.
>> >> As for MOV, your patch appears to generate the same output clap values
>> as
>> >> the input, so that's really great! command line and mediainfo trace
>> below:
>> >>
>> >
>> > Thanks for testing, Kieran and for linking the discussion on the cellar
>> > list.
>> >
>> > Any additional thoughts from ffmpeg devs on container-level SideData vs
>> > adding the extra fields into AVCodecParameters/AVCodecContext and
>> plumbing
>> > into the frame instead? I anticipate some situations where there can be
>> > interaction between cropping in bitstream and container-level cropping.
>> > Maybe the best way forward is for me to share some sample patches for
>> the
>> > alternative to validate whether it supports the various use cases
>> > (transmux, decode+crop, any other application level handling), and to
>> > confirm the interface changes for the structs.
>>
>> One option could be to also introduce a frame side data for this new
>> struct and have it replace the AVFrame fields, which would then be
>> deprecated (But of course keep working until removed).
>> This would allow to either inject stream side data from clap atoms and
>> Matroska crop fields into packets (Which would then be propagated to
>> frames to be applied during decoding), or use and export the bitstream
>> cropping information as it's the case right now, all while preventing
>> the addition of new fields to AVCodecParameters.
>>
>>
> I agree that sharing the SideData with the frame could be nice for the
> above reasons. I guess any interaction or conflict between bitstream &
> container cropping could then get resolved at the decoder (e.g., assuming
> that say SPS from h264 and container-level cropping should be composed).
>
>
>> I would like a developer that makes use of this feature to also comment,
>> especially seeing how the AVFrame fields and this clap side data are
>> pretty different.
>>
>
> The current CleanAperture representation was in part motivated to 1) keep
> the representation capable of representing the CLAP atom (same
> representation and rationals), and 2) to make it unambiguous that this was
> container-level stream metadata. The representation is a tiny bit more
> tedious when trying to actually perform a crop (e.g., to extract the
> top-left corner offset). If sharing as AVFrame side data, a representation
> closer to AVFrame's crop_{top,bottom,left,right} may be more natural.
>
> Within ffmpeg, I see that the existing codecs using AVFrame cropping
> include:
> -libavcodec/h264_slice.c: propagating the crop_fields from the SPS in h264
> in
> -libavcodec/hevc_refs.c: (sly to h264)
> -libavcodec/agm.c: crop_{left, top} inferred from
> avctx->coded_{width,height} - avctx->{width,height}
> -libavcodec/videotoolbox.c: crop fields explicitly set to zero.
> -libvacodec/vp3.c: crop_{left,top} set from coded bitstream’s offset_{x,y}
> (right,bottom from coded_{width,height})
>
> Upon review, I see there is one other format that appears to export
> similar cropping information into key/value pair metadata that could
> potentially also use the stream-level side data: (libavformat/cinedec.c:
> set_metadata_int(&st->metadata, "crop_left", avio_rl32(pb), 1))
>

I've changed the side data to be PixelCrop (instead of CleanAperture) given
the intent to reuse as cropping elsewhere.
-For now, I've kept the rational representation--although CLAP seems to be
the only required case of it. Maybe Kieran could comment on the requirement
of having maintaining the rationals for CLAP (only works on mov to mov
transmuxing).
Any other feedback on the representation?

To test crop on decode behavior, I've passed the metadata through to frame
data. Two minor observations:
1) With the current implementation of inject_global_side_data, the metadata
only makes it onto the first frame. Will need a special case to make sure
the metadata makes it onto every frame in libavformat/utils.c.
2) Cropping in libavutil/frame.c is adjusting frame pointers and by default
needs to respect alignment (unlike say the autorotate approach of adding a
vf_crop filter in ./fftools/ffmpeg_filter.c)

I also have updates to the demuxer/muxer patches, including incorporating
Andreas' MKV feedback, but will wait sending patches until we've resolved
the representation.
Kieran O Leary May 30, 2020, 10:41 a.m. UTC | #16
Hi,

On Fri 29 May 2020, 22:47 Neil Birkbeck, <neil.birkbeck@gmail.com> wrote:

> On Mon, May 11, 2020 at 9:37 PM Neil Birkbeck <neil.birkbeck@gmail.com>
> wrote:
>
> >
> >
> > On Wed, May 6, 2020 at 8:45 AM James Almer <jamrial@gmail.com> wrote:
> >
> >> On 5/6/2020 12:22 PM, Neil Birkbeck wrote:
> >> > On Tue, May 5, 2020 at 5:11 AM Kieran O Leary <
> kieran.o.leary@gmail.com
> >> >
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> I broke the threading with my last reply, i apologise. Here goes
> >> another
> >> >> attempt:
> >> >>
> >> >> On Tue, Apr 28, 2020 at 6:23 PM Neil Birkbeck <
> neil.birkbeck@gmail.com
> >> >
> >> >> wrote:
> >> >>
> >> >>> On Tue, Apr 28, 2020 at 3:18 AM Nicolas George <george@nsup.org>
> >> wrote:
> >> >>>
> >> >>>> Andreas Rheinhardt (12020-04-28):
> >> >>>>> That's expected. The patch provided only provides the structure in
> >> >>> which
> >> >>>>> the values are intended to be exported; it does not add any
> demuxing
> >> >> or
> >> >>>>> muxing capabilities for mov or mkv (as you can see from the fact
> >> that
> >> >>>>> none of these (de)muxers have been changed in the patch).
> >> >>>>
> >> >>>> Which is something I intended to comment on: adding code without
> >> users
> >> >>>> is rarely a good idea. I suggest we do not commit until at least
> one
> >> >>>> demuxer use it, preferably at least two. Otherwise, we may realize
> >> that
> >> >>>> “oh crap, it doesn't work” because of a tiny unforeseen detail.
> >> >>>
> >> >>>
> >> >>> Thanks for the feedback. I also have patches for the demux (MOV/MKV)
> >> and
> >> >>> mux (MOV/MKV).
> >> >>>
> >> >>> As there is still the alternative of using the fields in the
> >> >>> AVCodecParameters/AVCodecContext, my intention was to keep the first
> >> >> patch
> >> >>> small to resolve discussion on that point.
> >> >>>
> >> >>> I've included the patches, if you'd like to try test it, Kieren. I
> >> see on
> >> >>> your test file that there may be some slight rounding error making
> >> output
> >> >>> crop 704 not 703 (MKV file ends up with pixel_crop_{left,right} =
> 8).
> >> >>>
> >> >>> /ffprobe ../testdata/clap.mov 2>&1 | grep -A1 "Side"
> >> >>>     Side data:
> >> >>>       Clean aperture:[width 41472/59 height:576/1 h_offset:0/1
> >> >>> v_offset:0/1]
> >> >>> ./ffmpeg -i ../testdata/clap.mov  -vcodec copy -acodec copy
> >> /tmp/clap.mkv
> >> >>> ./ffprobe /tmp/clap.mkv 2>&1 | grep -A1 "Side"
> >> >>>     Side data:
> >> >>>       Clean aperture:[width 704/1 height:576/1 h_offset:0/1
> >> v_offset:0/1]
> >> >>>
> >> >>
> >> >> I have to look deeper into the MKV side of things and most likely
> >> raise it
> >> >> with the cellar mailing list so that better minds than mine can weigh
> >> in. I
> >> >> do see that the rounding up to 704 could be an issue alright.
> >> >> As for MOV, your patch appears to generate the same output clap
> values
> >> as
> >> >> the input, so that's really great! command line and mediainfo trace
> >> below:
> >> >>
> >> >
> >> > Thanks for testing, Kieran and for linking the discussion on the
> cellar
> >> > list.
> >> >
> >> > Any additional thoughts from ffmpeg devs on container-level SideData
> vs
> >> > adding the extra fields into AVCodecParameters/AVCodecContext and
> >> plumbing
> >> > into the frame instead? I anticipate some situations where there can
> be
> >> > interaction between cropping in bitstream and container-level
> cropping.
> >> > Maybe the best way forward is for me to share some sample patches for
> >> the
> >> > alternative to validate whether it supports the various use cases
> >> > (transmux, decode+crop, any other application level handling), and to
> >> > confirm the interface changes for the structs.
> >>
> >> One option could be to also introduce a frame side data for this new
> >> struct and have it replace the AVFrame fields, which would then be
> >> deprecated (But of course keep working until removed).
> >> This would allow to either inject stream side data from clap atoms and
> >> Matroska crop fields into packets (Which would then be propagated to
> >> frames to be applied during decoding), or use and export the bitstream
> >> cropping information as it's the case right now, all while preventing
> >> the addition of new fields to AVCodecParameters.
> >>
> >>
> > I agree that sharing the SideData with the frame could be nice for the
> > above reasons. I guess any interaction or conflict between bitstream &
> > container cropping could then get resolved at the decoder (e.g., assuming
> > that say SPS from h264 and container-level cropping should be composed).
> >
> >
> >> I would like a developer that makes use of this feature to also comment,
> >> especially seeing how the AVFrame fields and this clap side data are
> >> pretty different.
> >>
> >
> > The current CleanAperture representation was in part motivated to 1) keep
> > the representation capable of representing the CLAP atom (same
> > representation and rationals), and 2) to make it unambiguous that this
> was
> > container-level stream metadata. The representation is a tiny bit more
> > tedious when trying to actually perform a crop (e.g., to extract the
> > top-left corner offset). If sharing as AVFrame side data, a
> representation
> > closer to AVFrame's crop_{top,bottom,left,right} may be more natural.
> >
> > Within ffmpeg, I see that the existing codecs using AVFrame cropping
> > include:
> > -libavcodec/h264_slice.c: propagating the crop_fields from the SPS in
> h264
> > in
> > -libavcodec/hevc_refs.c: (sly to h264)
> > -libavcodec/agm.c: crop_{left, top} inferred from
> > avctx->coded_{width,height} - avctx->{width,height}
> > -libavcodec/videotoolbox.c: crop fields explicitly set to zero.
> > -libvacodec/vp3.c: crop_{left,top} set from coded bitstream’s
> offset_{x,y}
> > (right,bottom from coded_{width,height})
> >
> > Upon review, I see there is one other format that appears to export
> > similar cropping information into key/value pair metadata that could
> > potentially also use the stream-level side data: (libavformat/cinedec.c:
> > set_metadata_int(&st->metadata, "crop_left", avio_rl32(pb), 1))
> >
>
> I've changed the side data to be PixelCrop (instead of CleanAperture) given
> the intent to reuse as cropping elsewhere.
> -For now, I've kept the rational representation--although CLAP seems to be
> the only required case of it. Maybe Kieran could comment on the requirement
> of having maintaining the rationals for CLAP (only works on mov to mov
> transmuxing).
>
I'm no expert, but I think a lot of this just comes from video standards
that stipulate those rational numbers? I've cc'd tobias Rapp and Christoph
Gerstbauer of NOA just to bring this to their attention, as I'm pretty sure
that they use cropping values in AVI, so perhaps all of this could be
useful to them in some way.

K
Tobias Rapp June 8, 2020, 2:18 p.m. UTC | #17
On 30.05.2020 12:41, Kieran O Leary wrote:
> Hi,
> 
> On Fri 29 May 2020, 22:47 Neil Birkbeck, <neil.birkbeck@gmail.com> wrote:
> 
>> [...]
>> I've changed the side data to be PixelCrop (instead of CleanAperture) given
>> the intent to reuse as cropping elsewhere.
>> -For now, I've kept the rational representation--although CLAP seems to be
>> the only required case of it. Maybe Kieran could comment on the requirement
>> of having maintaining the rationals for CLAP (only works on mov to mov
>> transmuxing).
>>
> I'm no expert, but I think a lot of this just comes from video standards
> that stipulate those rational numbers? I've cc'd tobias Rapp and Christoph
> Gerstbauer of NOA just to bring this to their attention, as I'm pretty sure
> that they use cropping values in AVI, so perhaps all of this could be
> useful to them in some way.
> 

Hi Kieran,

when digitizing SD video carriers we indeed store some offset 
information in case VBI is preverved (i.e. PAL 720x608). But these 
offset values are currently not stored in the AVI container itself. The 
OpenDML "vprp" chunk defines some offset values but for the purpose of 
compressed image data where the codec implies some multiple-of-N 
height/width dimension on the data. So it did not seem to match our 
use-case.

Besides AVI and the mentioned MKV and MOV formats I remember some 
display offset/cropping information being stored in MXF with the Display 
X/Y-Offset values.

Regardless whether the frame crop/offset values are stored as frame 
fields or side data: how would this information be affected by filters 
like "crop" or "scale"?

Regards,
Tobias
Neil Birkbeck June 9, 2020, 3:44 p.m. UTC | #18
On Mon, Jun 8, 2020 at 7:19 AM Tobias Rapp <t.rapp@noa-archive.com> wrote:

> On 30.05.2020 12:41, Kieran O Leary wrote:
> > Hi,
> >
> > On Fri 29 May 2020, 22:47 Neil Birkbeck, <neil.birkbeck@gmail.com>
> wrote:
> >
> >> [...]
> >> I've changed the side data to be PixelCrop (instead of CleanAperture)
> given
> >> the intent to reuse as cropping elsewhere.
> >> -For now, I've kept the rational representation--although CLAP seems to
> be
> >> the only required case of it. Maybe Kieran could comment on the
> requirement
> >> of having maintaining the rationals for CLAP (only works on mov to mov
> >> transmuxing).
> >>
> > I'm no expert, but I think a lot of this just comes from video standards
> > that stipulate those rational numbers? I've cc'd tobias Rapp and
> Christoph
> > Gerstbauer of NOA just to bring this to their attention, as I'm pretty
> sure
> > that they use cropping values in AVI, so perhaps all of this could be
> > useful to them in some way.
> >
>
> Hi Kieran,
>
> when digitizing SD video carriers we indeed store some offset
> information in case VBI is preverved (i.e. PAL 720x608). But these
> offset values are currently not stored in the AVI container itself. The
> OpenDML "vprp" chunk defines some offset values but for the purpose of
> compressed image data where the codec implies some multiple-of-N
> height/width dimension on the data. So it did not seem to match our
> use-case.
>
> Besides AVI and the mentioned MKV and MOV formats I remember some
> display offset/cropping information being stored in MXF with the Display
> X/Y-Offset values.
>
> Regardless whether the frame crop/offset values are stored as frame
> fields or side data: how would this information be affected by filters
> like "crop" or "scale"?
>

Thanks for the additional info, Kieran and Tobias.

AVFrame already has cropping fields (AVFrame::crop_{top, bottom, left,
right}), and James Almer's suggestion was that this representation could
replace those. At the moment, when AVCodecContext::apply_cropping is true,
these offsets get applied (and zerod) in libavcodec/decode.c. Currently,
vf_crop adjusts these offsets for HWACCEL pix fmts, and vf_scale_vulkan
also respects (vf_scale doesn't).

Any additional reviewer thoughts on the SideData representation (e.g., crop
fields, currently rational to fully support the mov mux case)?
diff mbox series

Patch

diff --git a/libavcodec/avpacket.c b/libavcodec/avpacket.c
index 55b509108e..2ff779720b 100644
--- a/libavcodec/avpacket.c
+++ b/libavcodec/avpacket.c
@@ -397,6 +397,7 @@  const char *av_packet_side_data_name(enum AVPacketSideDataType type)
     case AV_PKT_DATA_AFD:                        return "Active Format Description data";
     case AV_PKT_DATA_ICC_PROFILE:                return "ICC Profile";
     case AV_PKT_DATA_DOVI_CONF:                  return "DOVI configuration record";
+    case AV_PKT_DATA_CLEAN_APERTURE:             return "Clean Aperture";
     }
     return NULL;
 }
diff --git a/libavcodec/packet.h b/libavcodec/packet.h
index 41485f4527..a0f8c29a33 100644
--- a/libavcodec/packet.h
+++ b/libavcodec/packet.h
@@ -282,6 +282,18 @@  enum AVPacketSideDataType {
      */
     AV_PKT_DATA_DOVI_CONF,
 
+    /**
+     * This side data contains a crop region (clean aperture) that defines the
+     * region of the stored image that should be cropped.
+     *
+     * It is intended to be used to store crop-related metadata from the
+     * container. For example, the CLAP atom in MOV files, and PixelCrop fields
+     * in MKV.
+     *
+     * The data is of type AVCleanAperture (see libavutil/clean_aperture.h)
+     */
+    AV_PKT_DATA_CLEAN_APERTURE,
+
     /**
      * The number of side data types.
      * This is not part of the public API/ABI in the sense that it may
diff --git a/libavformat/dump.c b/libavformat/dump.c
index 5e9a03185f..060e7b2d10 100644
--- a/libavformat/dump.c
+++ b/libavformat/dump.c
@@ -26,6 +26,7 @@ 
 #include "libavutil/display.h"
 #include "libavutil/intreadwrite.h"
 #include "libavutil/log.h"
+#include "libavutil/clean_aperture.h"
 #include "libavutil/mastering_display_metadata.h"
 #include "libavutil/dovi_meta.h"
 #include "libavutil/mathematics.h"
@@ -402,6 +403,16 @@  static void dump_dovi_conf(void *ctx, AVPacketSideData* sd)
            dovi->dv_bl_signal_compatibility_id);
 }
 
+static void dump_clean_aperture(void *ctx, AVPacketSideData *sd)
+{
+    AVCleanAperture *clap = (AVCleanAperture *)sd->data;
+    av_log(ctx, AV_LOG_INFO, "[width %d/%d height:%d/%d h_offset:%d/%d v_offset:%d/%d]",
+           clap->width.num, clap->width.den,
+           clap->height.num, clap->height.den,
+           clap->horizontal_offset.num, clap->horizontal_offset.den,
+           clap->vertical_offset.num, clap->vertical_offset.den);
+}
+
 static void dump_sidedata(void *ctx, AVStream *st, const char *indent)
 {
     int i;
@@ -468,6 +479,10 @@  static void dump_sidedata(void *ctx, AVStream *st, const char *indent)
             av_log(ctx, AV_LOG_INFO, "DOVI configuration record: ");
             dump_dovi_conf(ctx, &sd);
             break;
+        case AV_PKT_DATA_CLEAN_APERTURE:
+            av_log(ctx, AV_LOG_INFO, "Clean aperture:");
+            dump_clean_aperture(ctx, &sd);
+            break;
         default:
             av_log(ctx, AV_LOG_INFO,
                    "unknown side data type %d (%d bytes)", sd.type, sd.size);
diff --git a/libavutil/Makefile b/libavutil/Makefile
index 966eec41aa..e9c9b2bff1 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -17,6 +17,7 @@  HEADERS = adler32.h                                                     \
           cast5.h                                                       \
           camellia.h                                                    \
           channel_layout.h                                              \
+          clean_aperture.h                                              \
           common.h                                                      \
           cpu.h                                                         \
           crc.h                                                         \
@@ -107,6 +108,7 @@  OBJS = adler32.o                                                        \
        cast5.o                                                          \
        camellia.o                                                       \
        channel_layout.o                                                 \
+       clean_aperture.o                                                 \
        color_utils.o                                                    \
        cpu.o                                                            \
        crc.o                                                            \
diff --git a/libavutil/clean_aperture.c b/libavutil/clean_aperture.c
new file mode 100644
index 0000000000..0be51725f3
--- /dev/null
+++ b/libavutil/clean_aperture.c
@@ -0,0 +1,30 @@ 
+/**
+ * Copyright (c) 2020 Neil Birkbeck <neil.birkbeck@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <stdint.h>
+#include <string.h>
+
+#include "clean_aperture.h"
+#include "mem.h"
+
+AVCleanAperture *av_clean_aperture_alloc(void)
+{
+    return av_mallocz(sizeof(AVCleanAperture));
+}
diff --git a/libavutil/clean_aperture.h b/libavutil/clean_aperture.h
new file mode 100644
index 0000000000..116cf36976
--- /dev/null
+++ b/libavutil/clean_aperture.h
@@ -0,0 +1,63 @@ 
+/*
+ * Copyright (c) 2020 Neil Birkbeck <neil.birkbeck@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVUTIL_CLEAN_APERTURE_H
+#define AVUTIL_CLEAN_APERTURE_H
+
+#include "frame.h"
+#include "rational.h"
+
+
+/**
+ * Clean aperture represents the subset of stored pixels (as width, height
+ * and a center offset) that relates the pixels of the stored image to a
+ * canonical display area.
+ *
+ * The clean aperture structure is meant to be allocated as stream side data.
+ * It represents the Clean Aperture Atom (CLAP) and container-level cropping
+ * information (e.g., PixelCrop fields in MKV).
+ *
+ * @note The struct should be allocated with av_clean_aperture_alloc()
+ *       and its size is not a part of the public ABI.
+ */
+typedef struct AVCleanAperture {
+    // The width of the aperture window (<= stored image width)
+    AVRational width;
+
+    // The height of the aperture window (<= stored image height)
+    AVRational height;
+
+    // The horizontal offset of the center (relative to stored image center)
+    AVRational horizontal_offset;
+
+    // The vertical offset of the center (relative to stored image center)
+    AVRational vertical_offset;
+} AVCleanAperture;
+
+/**
+ * Allocate an AVCleanAperture structure and set its fields to
+ * default values. The resulting struct can be freed using av_freep().
+ *
+ * @return An AVCleanAperture filled with default values or NULL
+ *         on failure.
+ */
+AVCleanAperture *av_clean_aperture_alloc(void);
+
+#endif /* AVUTIL_CLEAN_APERTURE_H */