diff mbox series

[FFmpeg-devel,v2,2/4] ffmpeg: Add display_matrix option

Message ID 3476fe9f-0433-99fa-0f0f-451f4062a7c8@mail.de
State New
Headers show
Series [FFmpeg-devel,v2,1/4] fftools: Add support for dictionary options | expand

Checks

Context Check Description
andriy/configure_x86 warning Failed to apply patch
yinshiyou/configure_loongarch64 warning Failed to apply patch

Commit Message

Thilo Borgmann Aug. 15, 2022, 8:02 p.m. UTC
$subject

-Thilo
From fe2ff114cb004f897c7774753d9cf28298eba82d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jan=20Ekstr=C3=B6m?= <jeebjp@gmail.com>
Date: Mon, 15 Aug 2022 21:09:27 +0200
Subject: [PATCH v2 2/4] ffmpeg: Add display_matrix option

This enables overriding the rotation as well as horizontal/vertical
flip state of a specific video stream on the input side.

Additionally, switch the singular test that was utilizing the rotation
metadata to instead override the input display rotation, thus leading
to the same result.
---
 doc/ffmpeg.texi             |  13 +++++
 fftools/cmdutils.h          |   1 +
 fftools/ffmpeg.h            |   2 +
 fftools/ffmpeg_opt.c        | 107 ++++++++++++++++++++++++++++++++++++
 tests/fate/filter-video.mak |   2 +-
 5 files changed, 124 insertions(+), 1 deletion(-)

Comments

Gyan Doshi Aug. 16, 2022, 4:03 a.m. UTC | #1
On 2022-08-16 01:32 am, Thilo Borgmann wrote:
>
>
> +    struct display_matrix_s {
> +        const AVClass *class;
> +        double  rotation;
> +        int     hflip;
> +        int     vflip;
> +    };

There should be a scale option too since the matrix encodes that 
transform as well. Ref. ISO/IEC 14496-12:2015 6.2.2

Regards,
Gyan
Anton Khirnov Aug. 16, 2022, 2:10 p.m. UTC | #2
Quoting Thilo Borgmann (2022-08-15 22:02:09)
> $subject
> 
> -Thilo
> From fe2ff114cb004f897c7774753d9cf28298eba82d Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Jan=20Ekstr=C3=B6m?= <jeebjp@gmail.com>
> Date: Mon, 15 Aug 2022 21:09:27 +0200
> Subject: [PATCH v2 2/4] ffmpeg: Add display_matrix option
> 
> This enables overriding the rotation as well as horizontal/vertical
> flip state of a specific video stream on the input side.
> 
> Additionally, switch the singular test that was utilizing the rotation
> metadata to instead override the input display rotation, thus leading
> to the same result.
> ---

I still don't see how it's better to squash multiple options into a
single option.

It requires all this extra infrastructure and in the end it's less
user-friendly, because user-understandable things like rotation or flips
are now hidden under "display matrix". How many users would know what a
display matrix is?
Thilo Borgmann Aug. 16, 2022, 6:48 p.m. UTC | #3
Am 16.08.22 um 16:10 schrieb Anton Khirnov:
> Quoting Thilo Borgmann (2022-08-15 22:02:09)
>> $subject
>>
>> -Thilo
>>  From fe2ff114cb004f897c7774753d9cf28298eba82d Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?Jan=20Ekstr=C3=B6m?= <jeebjp@gmail.com>
>> Date: Mon, 15 Aug 2022 21:09:27 +0200
>> Subject: [PATCH v2 2/4] ffmpeg: Add display_matrix option
>>
>> This enables overriding the rotation as well as horizontal/vertical
>> flip state of a specific video stream on the input side.
>>
>> Additionally, switch the singular test that was utilizing the rotation
>> metadata to instead override the input display rotation, thus leading
>> to the same result.
>> ---
> 
> I still don't see how it's better to squash multiple options into a
> single option.
> 
> It requires all this extra infrastructure and in the end it's less
> user-friendly, because user-understandable things like rotation or flips
> are now hidden under "display matrix". How many users would know what a
> display matrix is?

FWIW I think Gyan's request to do this all in one option that effect one thing (the display matrix) is valid.

For the inexperienced user the use of individual filters would be the natural choice.

Though i don't care much about how it's done, I can adopt to what you guys finally agree on. Having a patch for AVDict options is worth it anyways (even just for future use).

-Thilo
Marton Balint Aug. 17, 2022, 6:26 a.m. UTC | #4
On Mon, 15 Aug 2022, Thilo Borgmann wrote:

> diff --git a/fftools/cmdutils.c b/fftools/cmdutils.c
> index 18e768b386..22ba654bb0 100644
> --- a/fftools/cmdutils.c
> +++ b/fftools/cmdutils.c
> @@ -131,6 +131,22 @@ int64_t parse_time_or_die(const char *context, const char *timestr,
>      return us;
>  }
> 
> +static AVDictionary *parse_dict_or_die(const char *context,
> +                                       const char *dict_str)
> +{
> +    AVDictionary *dict = NULL;
> +    int ret = av_dict_parse_string(&dict, dict_str, "=", ",", 0);

Please use the av_opt syntax for the dictionary for better consistency:
av_dict_parse_string(&options, val, "=", ":", 0)

> +    if (ret < 0) {
> +        av_log(NULL, AV_LOG_FATAL,
> +               "Failed to create a dictionary from '%s': %s!\n",
> +               dict_str, av_err2str(ret));
> +        exit_program(1);
> +    }
> +
> +
> +    return dict;
> +}
> +
>  void show_help_options(const OptionDef *options, const char *msg, int req_flags,
>                         int rej_flags, int alt_flags)
>  {
> @@ -288,6 +304,8 @@ static int write_option(void *optctx, const OptionDef *po, const char *opt,
>          *(float *)dst = parse_number_or_die(opt, arg, OPT_FLOAT, -INFINITY, INFINITY);
>      } else if (po->flags & OPT_DOUBLE) {
>          *(double *)dst = parse_number_or_die(opt, arg, OPT_DOUBLE, -INFINITY, INFINITY);
> +    } else if (po->flags & OPT_DICT) {
> +        *(AVDictionary **)dst = parse_dict_or_die(opt, arg);
>      } else if (po->u.func_arg) {
>          int ret = po->u.func_arg(optctx, opt, arg);
>          if (ret < 0) {
> diff --git a/fftools/cmdutils.h b/fftools/cmdutils.h
> index d87e162ccd..6a519c6546 100644
> --- a/fftools/cmdutils.h
> +++ b/fftools/cmdutils.h
> @@ -129,6 +129,7 @@ typedef struct SpecifierOpt {
>          uint64_t ui64;
>          float      f;
>          double   dbl;
> +        AVDictionary *dict;
>      } u;
>  } SpecifierOpt;
> 
> @@ -157,6 +158,7 @@ typedef struct OptionDef {
>  #define OPT_DOUBLE 0x20000
>  #define OPT_INPUT  0x40000
>  #define OPT_OUTPUT 0x80000
> +#define OPT_DICT  0x100000
>       union {
>          void *dst_ptr;
>          int (*func_arg)(void *, const char *, const char *);
> diff --git a/fftools/ffmpeg_opt.c b/fftools/ffmpeg_opt.c
> index 97f14b2a5b..cc038aae6b 100644
> --- a/fftools/ffmpeg_opt.c
> +++ b/fftools/ffmpeg_opt.c
> @@ -62,6 +62,7 @@
>  #define SPECIFIER_OPT_FMT_ui64 "%"PRIu64
>  #define SPECIFIER_OPT_FMT_f    "%f"
>  #define SPECIFIER_OPT_FMT_dbl  "%lf"
> +#define SPECIFIER_OPT_FMT_dict "%p"

The error message which uses this will make no sense. You should modify 
WARN_MULTIPLE_OPT_USAGE to make something sensible out of a dict.

E.g. make SPECIFIER_OPT_FMT_dict "%s" and create 
#define SPECIFIER_OPT_FUNC_dict as a dumper function. (the dumper function 
of other types can be #defined as identities)

Regards,
Marton
Anton Khirnov Aug. 17, 2022, 8:18 a.m. UTC | #5
Quoting Thilo Borgmann (2022-08-16 20:48:57)
> Am 16.08.22 um 16:10 schrieb Anton Khirnov:
> > Quoting Thilo Borgmann (2022-08-15 22:02:09)
> >> $subject
> >>
> >> -Thilo
> >>  From fe2ff114cb004f897c7774753d9cf28298eba82d Mon Sep 17 00:00:00 2001
> >> From: =?UTF-8?q?Jan=20Ekstr=C3=B6m?= <jeebjp@gmail.com>
> >> Date: Mon, 15 Aug 2022 21:09:27 +0200
> >> Subject: [PATCH v2 2/4] ffmpeg: Add display_matrix option
> >>
> >> This enables overriding the rotation as well as horizontal/vertical
> >> flip state of a specific video stream on the input side.
> >>
> >> Additionally, switch the singular test that was utilizing the rotation
> >> metadata to instead override the input display rotation, thus leading
> >> to the same result.
> >> ---
> > 
> > I still don't see how it's better to squash multiple options into a
> > single option.
> > 
> > It requires all this extra infrastructure and in the end it's less
> > user-friendly, because user-understandable things like rotation or flips
> > are now hidden under "display matrix". How many users would know what a
> > display matrix is?
> 
> FWIW I think Gyan's request to do this all in one option that effect one thing (the display matrix) is valid.

I don't.

It may be one thing internally, but modeling user interfaces based on
internal representation is a sinful malpractice. More importantly, I see
no advantage from doing it - it only makes the option parsing more
complicated.
Gyan Doshi Aug. 17, 2022, 8:50 a.m. UTC | #6
On 2022-08-17 01:48 pm, Anton Khirnov wrote:
> Quoting Thilo Borgmann (2022-08-16 20:48:57)
>> Am 16.08.22 um 16:10 schrieb Anton Khirnov:
>>> Quoting Thilo Borgmann (2022-08-15 22:02:09)
>>>> $subject
>>>>
>>>> -Thilo
>>>>   From fe2ff114cb004f897c7774753d9cf28298eba82d Mon Sep 17 00:00:00 2001
>>>> From: =?UTF-8?q?Jan=20Ekstr=C3=B6m?= <jeebjp@gmail.com>
>>>> Date: Mon, 15 Aug 2022 21:09:27 +0200
>>>> Subject: [PATCH v2 2/4] ffmpeg: Add display_matrix option
>>>>
>>>> This enables overriding the rotation as well as horizontal/vertical
>>>> flip state of a specific video stream on the input side.
>>>>
>>>> Additionally, switch the singular test that was utilizing the rotation
>>>> metadata to instead override the input display rotation, thus leading
>>>> to the same result.
>>>> ---
>>> I still don't see how it's better to squash multiple options into a
>>> single option.
>>>
>>> It requires all this extra infrastructure and in the end it's less
>>> user-friendly, because user-understandable things like rotation or flips
>>> are now hidden under "display matrix". How many users would know what a
>>> display matrix is?
>> FWIW I think Gyan's request to do this all in one option that effect one thing (the display matrix) is valid.
> I don't.
>
> It may be one thing internally, but modeling user interfaces based on
> internal representation is a sinful malpractice. More importantly, I see
> no advantage from doing it - it only makes the option parsing more
> complicated.

It's not based on ffmpeg's 'internal representation'. All transform 
attributes are stored as a composite in one mathematical object.
Evaluating the matrix values will need to look at all sources of 
contribution. So gathering and presenting all these attributes in a single
option (+ docs) makes it clearer to the user at the cost of an initial 
learning curve.

Regards,
Gyan
Nicolas George Aug. 17, 2022, 8:59 a.m. UTC | #7
Gyan Doshi (12022-08-17):
> It's not based on ffmpeg's 'internal representation'. All transform
> attributes are stored as a composite in one mathematical object.
> Evaluating the matrix values will need to look at all sources of
> contribution. So gathering and presenting all these attributes in a single
> option (+ docs) makes it clearer to the user at the cost of an initial
> learning curve.

I concur a single option might be more convenient. Especially since our
options system does not take into account the order of options: the
interactions between multiple options would be rather hard to explain.

OTOH, I do not like a dictionary-based approach, for the same reason:
you have to explain the order of precedence of options, and how
contradicting ones interact.

Might I suggest to adopt the syntax of the transform attribute of SVG?
Or a subset of it with a stricter syntax.

https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/transform

It requires writing a specific parser, but one that can be done with
sscanf():

if (sscanf(cur, "translate(%d %d)%n", &dx, &dy, &off) >= 2 && off) {
    translate_current_matrix(&mat, dx, dy);
    cur += off;
}

Regards,
Anton Khirnov Aug. 17, 2022, 9:05 a.m. UTC | #8
Quoting Gyan Doshi (2022-08-17 10:50:43)
> 
> 
> On 2022-08-17 01:48 pm, Anton Khirnov wrote:
> > Quoting Thilo Borgmann (2022-08-16 20:48:57)
> >> Am 16.08.22 um 16:10 schrieb Anton Khirnov:
> >>> Quoting Thilo Borgmann (2022-08-15 22:02:09)
> >>>> $subject
> >>>>
> >>>> -Thilo
> >>>>   From fe2ff114cb004f897c7774753d9cf28298eba82d Mon Sep 17 00:00:00 2001
> >>>> From: =?UTF-8?q?Jan=20Ekstr=C3=B6m?= <jeebjp@gmail.com>
> >>>> Date: Mon, 15 Aug 2022 21:09:27 +0200
> >>>> Subject: [PATCH v2 2/4] ffmpeg: Add display_matrix option
> >>>>
> >>>> This enables overriding the rotation as well as horizontal/vertical
> >>>> flip state of a specific video stream on the input side.
> >>>>
> >>>> Additionally, switch the singular test that was utilizing the rotation
> >>>> metadata to instead override the input display rotation, thus leading
> >>>> to the same result.
> >>>> ---
> >>> I still don't see how it's better to squash multiple options into a
> >>> single option.
> >>>
> >>> It requires all this extra infrastructure and in the end it's less
> >>> user-friendly, because user-understandable things like rotation or flips
> >>> are now hidden under "display matrix". How many users would know what a
> >>> display matrix is?
> >> FWIW I think Gyan's request to do this all in one option that effect one thing (the display matrix) is valid.
> > I don't.
> >
> > It may be one thing internally, but modeling user interfaces based on
> > internal representation is a sinful malpractice. More importantly, I see
> > no advantage from doing it - it only makes the option parsing more
> > complicated.
> 
> It's not based on ffmpeg's 'internal representation'. All transform 
> attributes are stored as a composite in one mathematical object.

Keyword "stored". It is internal representation. Users should not care
how it is stored, the entire point point of our project is to shield
them from that as much as possible.

> Evaluating the matrix values will need to look at all sources of 
> contribution. So gathering and presenting all these attributes in a single
> option (+ docs) makes it clearer to the user at the cost of an initial 
> learning curve.

Are you seriously expecting all users who want to mark a video as
rotated or flipped to learn about display matrices?
Nothing whatsoever about this is clearer. Just document in what order
these options take effect (which has to be done anyway), and you're
done. No need to inflict pointless complexity on people.
Gyan Doshi Aug. 17, 2022, 10:53 a.m. UTC | #9
On 2022-08-17 02:35 pm, Anton Khirnov wrote:
> Quoting Gyan Doshi (2022-08-17 10:50:43)
>>
>> On 2022-08-17 01:48 pm, Anton Khirnov wrote:
>>> Quoting Thilo Borgmann (2022-08-16 20:48:57)
>>>> Am 16.08.22 um 16:10 schrieb Anton Khirnov:
>>>>> Quoting Thilo Borgmann (2022-08-15 22:02:09)
>>>>>> $subject
>>>>>>
>>>>>> -Thilo
>>>>>>    From fe2ff114cb004f897c7774753d9cf28298eba82d Mon Sep 17 00:00:00 2001
>>>>>> From: =?UTF-8?q?Jan=20Ekstr=C3=B6m?= <jeebjp@gmail.com>
>>>>>> Date: Mon, 15 Aug 2022 21:09:27 +0200
>>>>>> Subject: [PATCH v2 2/4] ffmpeg: Add display_matrix option
>>>>>>
>>>>>> This enables overriding the rotation as well as horizontal/vertical
>>>>>> flip state of a specific video stream on the input side.
>>>>>>
>>>>>> Additionally, switch the singular test that was utilizing the rotation
>>>>>> metadata to instead override the input display rotation, thus leading
>>>>>> to the same result.
>>>>>> ---
>>>>> I still don't see how it's better to squash multiple options into a
>>>>> single option.
>>>>>
>>>>> It requires all this extra infrastructure and in the end it's less
>>>>> user-friendly, because user-understandable things like rotation or flips
>>>>> are now hidden under "display matrix". How many users would know what a
>>>>> display matrix is?
>>>> FWIW I think Gyan's request to do this all in one option that effect one thing (the display matrix) is valid.
>>> I don't.
>>>
>>> It may be one thing internally, but modeling user interfaces based on
>>> internal representation is a sinful malpractice. More importantly, I see
>>> no advantage from doing it - it only makes the option parsing more
>>> complicated.
>> It's not based on ffmpeg's 'internal representation'. All transform
>> attributes are stored as a composite in one mathematical object.
> Keyword "stored". It is internal representation. Users should not care
> how it is stored, the entire point point of our project is to shield
> them from that as much as possible.
>
>> Evaluating the matrix values will need to look at all sources of
>> contribution. So gathering and presenting all these attributes in a single
>> option (+ docs) makes it clearer to the user at the cost of an initial
>> learning curve.
> Are you seriously expecting all users who want to mark a video as
> rotated or flipped to learn about display matrices?

They don't need to know how to encode or decode the matrix if they don't 
want to. Only that it is the container.

The difference is between

  -rotate:v:0 90 -hflip:v:0 1 -scale:v:0 2

and

  -display_matrix:v:0 rotate=90:hflip=1:scale=2

The latter syntax is all too familiar to users from AVFrame filters and 
BSFs. There's hardly any overhead here.
They google or do Ctrl-F on ffmpeg.html for rotate, and they get to the 
display matrix option entry.

Regards,
Gyan
Anton Khirnov Aug. 17, 2022, 12:25 p.m. UTC | #10
Quoting Gyan Doshi (2022-08-17 12:53:11)
> 
> 
> On 2022-08-17 02:35 pm, Anton Khirnov wrote:
> > Quoting Gyan Doshi (2022-08-17 10:50:43)
> >>
> >> On 2022-08-17 01:48 pm, Anton Khirnov wrote:
> >>> Quoting Thilo Borgmann (2022-08-16 20:48:57)
> >>>> Am 16.08.22 um 16:10 schrieb Anton Khirnov:
> >>>>> Quoting Thilo Borgmann (2022-08-15 22:02:09)
> >>>>>> $subject
> >>>>>>
> >>>>>> -Thilo
> >>>>>>    From fe2ff114cb004f897c7774753d9cf28298eba82d Mon Sep 17 00:00:00 2001
> >>>>>> From: =?UTF-8?q?Jan=20Ekstr=C3=B6m?= <jeebjp@gmail.com>
> >>>>>> Date: Mon, 15 Aug 2022 21:09:27 +0200
> >>>>>> Subject: [PATCH v2 2/4] ffmpeg: Add display_matrix option
> >>>>>>
> >>>>>> This enables overriding the rotation as well as horizontal/vertical
> >>>>>> flip state of a specific video stream on the input side.
> >>>>>>
> >>>>>> Additionally, switch the singular test that was utilizing the rotation
> >>>>>> metadata to instead override the input display rotation, thus leading
> >>>>>> to the same result.
> >>>>>> ---
> >>>>> I still don't see how it's better to squash multiple options into a
> >>>>> single option.
> >>>>>
> >>>>> It requires all this extra infrastructure and in the end it's less
> >>>>> user-friendly, because user-understandable things like rotation or flips
> >>>>> are now hidden under "display matrix". How many users would know what a
> >>>>> display matrix is?
> >>>> FWIW I think Gyan's request to do this all in one option that effect one thing (the display matrix) is valid.
> >>> I don't.
> >>>
> >>> It may be one thing internally, but modeling user interfaces based on
> >>> internal representation is a sinful malpractice. More importantly, I see
> >>> no advantage from doing it - it only makes the option parsing more
> >>> complicated.
> >> It's not based on ffmpeg's 'internal representation'. All transform
> >> attributes are stored as a composite in one mathematical object.
> > Keyword "stored". It is internal representation. Users should not care
> > how it is stored, the entire point point of our project is to shield
> > them from that as much as possible.
> >
> >> Evaluating the matrix values will need to look at all sources of
> >> contribution. So gathering and presenting all these attributes in a single
> >> option (+ docs) makes it clearer to the user at the cost of an initial
> >> learning curve.
> > Are you seriously expecting all users who want to mark a video as
> > rotated or flipped to learn about display matrices?
> 
> They don't need to know how to encode or decode the matrix if they don't 
> want to. Only that it is the container.
> 
> The difference is between
> 
>   -rotate:v:0 90 -hflip:v:0 1 -scale:v:0 2
> 
> and
> 
>   -display_matrix:v:0 rotate=90:hflip=1:scale=2
> 
> The latter syntax is all too familiar to users from AVFrame filters and 
> BSFs.

The syntax similarity is misleading - filters are applied in the order
you list them, while these options are always applied in fixed order.
The analogous filters are also called rotate, [vf]flip, and scale -
there is no display_matrix filter.
Anton Khirnov Aug. 18, 2022, 7:11 a.m. UTC | #11
Quoting Gyan Doshi (2022-08-17 10:50:43)
> 
> 
> On 2022-08-17 01:48 pm, Anton Khirnov wrote:
> > Quoting Thilo Borgmann (2022-08-16 20:48:57)
> >> Am 16.08.22 um 16:10 schrieb Anton Khirnov:
> >>> Quoting Thilo Borgmann (2022-08-15 22:02:09)
> >>>> $subject
> >>>>
> >>>> -Thilo
> >>>>   From fe2ff114cb004f897c7774753d9cf28298eba82d Mon Sep 17 00:00:00 2001
> >>>> From: =?UTF-8?q?Jan=20Ekstr=C3=B6m?= <jeebjp@gmail.com>
> >>>> Date: Mon, 15 Aug 2022 21:09:27 +0200
> >>>> Subject: [PATCH v2 2/4] ffmpeg: Add display_matrix option
> >>>>
> >>>> This enables overriding the rotation as well as horizontal/vertical
> >>>> flip state of a specific video stream on the input side.
> >>>>
> >>>> Additionally, switch the singular test that was utilizing the rotation
> >>>> metadata to instead override the input display rotation, thus leading
> >>>> to the same result.
> >>>> ---
> >>> I still don't see how it's better to squash multiple options into a
> >>> single option.
> >>>
> >>> It requires all this extra infrastructure and in the end it's less
> >>> user-friendly, because user-understandable things like rotation or flips
> >>> are now hidden under "display matrix". How many users would know what a
> >>> display matrix is?
> >> FWIW I think Gyan's request to do this all in one option that effect one thing (the display matrix) is valid.
> > I don't.
> >
> > It may be one thing internally, but modeling user interfaces based on
> > internal representation is a sinful malpractice. More importantly, I see
> > no advantage from doing it - it only makes the option parsing more
> > complicated.
> 
> It's not based on ffmpeg's 'internal representation'. All transform 
> attributes are stored as a composite in one mathematical object.

Also forgot to mention yesterday - the way this will be stored in the
output depends on the codec/container and will not always necessarily be
an MP4-style display matrix. E.g. in some cases it might make most sense
to write a display orientation SEI, which stores [hv]flip flags and a
rotation angle.
Gyan Doshi Aug. 18, 2022, 10:58 a.m. UTC | #12
On 2022-08-17 05:55 pm, Anton Khirnov wrote:
> Quoting Gyan Doshi (2022-08-17 12:53:11)
>>
>> On 2022-08-17 02:35 pm, Anton Khirnov wrote:
>>> Quoting Gyan Doshi (2022-08-17 10:50:43)
>>>> On 2022-08-17 01:48 pm, Anton Khirnov wrote:
>>>>> Quoting Thilo Borgmann (2022-08-16 20:48:57)
>>>>>> Am 16.08.22 um 16:10 schrieb Anton Khirnov:
>>>>>>> Quoting Thilo Borgmann (2022-08-15 22:02:09)
>>>>>>>> $subject
>>>>>>>>
>>>>>>>> -Thilo
>>>>>>>>     From fe2ff114cb004f897c7774753d9cf28298eba82d Mon Sep 17 00:00:00 2001
>>>>>>>> From: =?UTF-8?q?Jan=20Ekstr=C3=B6m?= <jeebjp@gmail.com>
>>>>>>>> Date: Mon, 15 Aug 2022 21:09:27 +0200
>>>>>>>> Subject: [PATCH v2 2/4] ffmpeg: Add display_matrix option
>>>>>>>>
>>>>>>>> This enables overriding the rotation as well as horizontal/vertical
>>>>>>>> flip state of a specific video stream on the input side.
>>>>>>>>
>>>>>>>> Additionally, switch the singular test that was utilizing the rotation
>>>>>>>> metadata to instead override the input display rotation, thus leading
>>>>>>>> to the same result.
>>>>>>>> ---
>>>>>>> I still don't see how it's better to squash multiple options into a
>>>>>>> single option.
>>>>>>>
>>>>>>> It requires all this extra infrastructure and in the end it's less
>>>>>>> user-friendly, because user-understandable things like rotation or flips
>>>>>>> are now hidden under "display matrix". How many users would know what a
>>>>>>> display matrix is?
>>>>>> FWIW I think Gyan's request to do this all in one option that effect one thing (the display matrix) is valid.
>>>>> I don't.
>>>>>
>>>>> It may be one thing internally, but modeling user interfaces based on
>>>>> internal representation is a sinful malpractice. More importantly, I see
>>>>> no advantage from doing it - it only makes the option parsing more
>>>>> complicated.
>>>> It's not based on ffmpeg's 'internal representation'. All transform
>>>> attributes are stored as a composite in one mathematical object.
>>> Keyword "stored". It is internal representation. Users should not care
>>> how it is stored, the entire point point of our project is to shield
>>> them from that as much as possible.
>>>
>>>> Evaluating the matrix values will need to look at all sources of
>>>> contribution. So gathering and presenting all these attributes in a single
>>>> option (+ docs) makes it clearer to the user at the cost of an initial
>>>> learning curve.
>>> Are you seriously expecting all users who want to mark a video as
>>> rotated or flipped to learn about display matrices?
>> They don't need to know how to encode or decode the matrix if they don't
>> want to. Only that it is the container.
>>
>> The difference is between
>>
>>    -rotate:v:0 90 -hflip:v:0 1 -scale:v:0 2
>>
>> and
>>
>>    -display_matrix:v:0 rotate=90:hflip=1:scale=2
>>
>> The latter syntax is all too familiar to users from AVFrame filters and
>> BSFs.
> The syntax similarity is misleading - filters are applied in the order
> you list them, while these options are always applied in fixed order.
> The analogous filters are also called rotate, [vf]flip, and scale -
> there is no display_matrix filter.

The display matrix is effected as a single matrix multiplication to 
obtain output pixel co-ordinates which incorporates all the
encoded transforms so it is analogous to multiple options within a 
filter like eq or hue, not multiple filters.

About SEI messaging,  the h264 metadata BSF still obtains (and extracts) 
those attributes as a display matrix as that is the internal messaging 
format regardless of ultimate storage form.

Regards,
Gyan
Thilo Borgmann Aug. 20, 2022, 1:32 p.m. UTC | #13
On 18 Aug 2022, at 12:58, Gyan Doshi wrote:

> On 2022-08-17 05:55 pm, Anton Khirnov wrote:
>> Quoting Gyan Doshi (2022-08-17 12:53:11)
>>>
>>> On 2022-08-17 02:35 pm, Anton Khirnov wrote:
>>>> Quoting Gyan Doshi (2022-08-17 10:50:43)
>>>>> On 2022-08-17 01:48 pm, Anton Khirnov wrote:
>>>>>> Quoting Thilo Borgmann (2022-08-16 20:48:57)
>>>>>>> Am 16.08.22 um 16:10 schrieb Anton Khirnov:
>>>>>>>> Quoting Thilo Borgmann (2022-08-15 22:02:09)
>>>>>>>>> $subject
>>>>>>>>>
>>>>>>>>> -Thilo
>>>>>>>>>     From fe2ff114cb004f897c7774753d9cf28298eba82d Mon Sep 17 
>>>>>>>>> 00:00:00 2001
>>>>>>>>> From: =?UTF-8?q?Jan=20Ekstr=C3=B6m?= <jeebjp@gmail.com>
>>>>>>>>> Date: Mon, 15 Aug 2022 21:09:27 +0200
>>>>>>>>> Subject: [PATCH v2 2/4] ffmpeg: Add display_matrix option
>>>>>>>>>
>>>>>>>>> This enables overriding the rotation as well as 
>>>>>>>>> horizontal/vertical
>>>>>>>>> flip state of a specific video stream on the input side.
>>>>>>>>>
>>>>>>>>> Additionally, switch the singular test that was utilizing the 
>>>>>>>>> rotation
>>>>>>>>> metadata to instead override the input display rotation, thus 
>>>>>>>>> leading
>>>>>>>>> to the same result.
>>>>>>>>> ---
>>>>>>>> I still don't see how it's better to squash multiple options 
>>>>>>>> into a
>>>>>>>> single option.
>>>>>>>>
>>>>>>>> It requires all this extra infrastructure and in the end it's 
>>>>>>>> less
>>>>>>>> user-friendly, because user-understandable things like rotation 
>>>>>>>> or flips
>>>>>>>> are now hidden under "display matrix". How many users would 
>>>>>>>> know what a
>>>>>>>> display matrix is?
>>>>>>> FWIW I think Gyan's request to do this all in one option that 
>>>>>>> effect one thing (the display matrix) is valid.
>>>>>> I don't.
>>>>>>
>>>>>> It may be one thing internally, but modeling user interfaces 
>>>>>> based on
>>>>>> internal representation is a sinful malpractice. More 
>>>>>> importantly, I see
>>>>>> no advantage from doing it - it only makes the option parsing 
>>>>>> more
>>>>>> complicated.
>>>>> It's not based on ffmpeg's 'internal representation'. All 
>>>>> transform
>>>>> attributes are stored as a composite in one mathematical object.
>>>> Keyword "stored". It is internal representation. Users should not 
>>>> care
>>>> how it is stored, the entire point point of our project is to 
>>>> shield
>>>> them from that as much as possible.
>>>>
>>>>> Evaluating the matrix values will need to look at all sources of
>>>>> contribution. So gathering and presenting all these attributes in 
>>>>> a single
>>>>> option (+ docs) makes it clearer to the user at the cost of an 
>>>>> initial
>>>>> learning curve.
>>>> Are you seriously expecting all users who want to mark a video as
>>>> rotated or flipped to learn about display matrices?
>>> They don't need to know how to encode or decode the matrix if they 
>>> don't
>>> want to. Only that it is the container.
>>>
>>> The difference is between
>>>
>>>    -rotate:v:0 90 -hflip:v:0 1 -scale:v:0 2
>>>
>>> and
>>>
>>>    -display_matrix:v:0 rotate=90:hflip=1:scale=2
>>>
>>> The latter syntax is all too familiar to users from AVFrame filters 
>>> and
>>> BSFs.
>> The syntax similarity is misleading - filters are applied in the 
>> order
>> you list them, while these options are always applied in fixed order.
>> The analogous filters are also called rotate, [vf]flip, and scale -
>> there is no display_matrix filter.
>
> The display matrix is effected as a single matrix multiplication to 
> obtain output pixel co-ordinates which incorporates all the
> encoded transforms so it is analogous to multiple options within a 
> filter like eq or hue, not multiple filters.
>
> About SEI messaging,  the h264 metadata BSF still obtains (and 
> extracts) those attributes as a display matrix as that is the internal 
> messaging format regardless of ultimate storage form.

Thanks for all your comments! Unfortunately, I don’t see real 
consensus emerging here.

I see a single option finds more acceptance (3:1),
I see using to use AVDict is frowned upon by 1, though the alternative 
suggestion with a parser for SVG-style (new syntax)  is not backup up by 
s.o. else.

Therefore my interpretation would be to go with majority and stick with 
one option and stick with AVDict.
However, I don’t want to shortcut the discussion or override s.o. 
opinion. Going to pick this up next week and if no more arguments 
emerge, I’ll continue with that.

Thanks,
Thilo
Nicolas George Aug. 20, 2022, 1:39 p.m. UTC | #14
Thilo Borgman (12022-08-20):
> suggestion with a parser for SVG-style (new syntax)  is not backup up by
> s.o. else.

I feel it was somewhat drowned by the rest of the discussion. Do YOU
like it?

Regards,
Thilo Borgmann Aug. 20, 2022, 1:48 p.m. UTC | #15
Am 20.08.22 um 15:39 schrieb Nicolas George:
> Thilo Borgman (12022-08-20):
>> suggestion with a parser for SVG-style (new syntax)  is not backup up by
>> s.o. else.
> 
> I feel it was somewhat drowned by the rest of the discussion. Do YOU
> like it?

My two cents about it that the a=b:c=d syntax from AVDict is at least known and used in filters already.
The function style a(b,c,d) thing from SVG would be completely new. Instead of the AVDict overhead, it adds a simplistic parser overhead.
Also, maybe I'm just unaware, these style of options appears not to be often used in the command line world.

But as I said I usually don't touch ffmpeg / it's options, so I think my opinion should not be of much value here where we talk of the contextual effects of this patch.

-Thilo
Nicolas George Aug. 22, 2022, 12:30 p.m. UTC | #16
Thilo Borgman (12022-08-20):
> My two cents about it that the a=b:c=d syntax from AVDict is at least
> known and used in filters already.
> The function style a(b,c,d) thing from SVG would be completely new.
> Instead of the AVDict overhead, it adds a simplistic parser overhead.
> Also, maybe I'm just unaware, these style of options appears not to be
> often used in the command line world.

These are good points. On the other hand:

- The "a=b:c=d" syntax needs documentation and is misleading with regard
  the order and interaction of suboptions.

- The SVG syntax is more powerful, it allows to compose several
  transforms.

Regards,
Thilo Borgmann Sept. 7, 2022, 4:05 p.m. UTC | #17
Am 22.08.22 um 14:30 schrieb Nicolas George:
> Thilo Borgman (12022-08-20):
>> My two cents about it that the a=b:c=d syntax from AVDict is at least
>> known and used in filters already.
>> The function style a(b,c,d) thing from SVG would be completely new.
>> Instead of the AVDict overhead, it adds a simplistic parser overhead.
>> Also, maybe I'm just unaware, these style of options appears not to be
>> often used in the command line world.
> 
> These are good points. On the other hand:

> - The "a=b:c=d" syntax needs documentation and is misleading with regard
>    the order and interaction of suboptions.

Yes needs documentation. I don't see a misleading point in terms of the syntax, just in the mathematical sense - which gets resolved by documenting it.
Syntax-wise, it aligns more to the dont-care order of options to filters (where this syntax is taken from). The only two exceptions are -vf / -filter_complex where the order is relevant. All other options to filters are not respecting order, AFAICT and this would be a new exception to respect the order of a=b:c=d options.


> - The SVG syntax is more powerful, it allows to compose several
>    transforms.

Then we'd have to do two things, add a completely new syntax (which comes with new overhead) and a new scheme of math and order-respecting composition of the final matrix, which can then be many many simple transformations (which in our use case will rarely be needed). Where instead we would reuse known syntax and could get away with the relatively simple decomposition into three sequentially applied filters.

However, I can see that our optimal solution should actually involve a new filter that applies pixel disposition by a 3x3 matrix (which we don't have yet, do we?) and an order respecting syntax (of either kind, though even more overhead with SVG syntax) so that we can skip matrix decomposition and apply just one (hopefully efficient) filter for any transformation that comes via input or cmd line. That would be much more work to ask for IMHO and I'm not a total fan for that just fixing #8329, #6370.

I don't want to override your opinion just because others argued to be happy with a (technically better) version of the proposed. Give me more reason and/or a matrix transform filter and my internal barrier (and real-world limitations) to go that full length in one step might drop. Until then I'll work on v3 which has to be done either way.

Thanks!
-Thilo
diff mbox series

Patch

diff --git a/doc/ffmpeg.texi b/doc/ffmpeg.texi
index 42440d93b4..5d3e3b3052 100644
--- a/doc/ffmpeg.texi
+++ b/doc/ffmpeg.texi
@@ -912,6 +912,19 @@  If used together with @option{-vcodec copy}, it will affect the aspect ratio
 stored at container level, but not the aspect ratio stored in encoded
 frames, if it exists.
 
+@item -display_matrix[:@var{stream_specifier}] @var{opt1=val1[,opt2=val2]...} (@emph{input,per-stream})
+Set the video display matrix according to given options.
+
+@table @option
+@item rotation=@var{number}
+Set the rotation using a floating point number that describes a pure
+counter-clockwise rotation in degrees.
+The @code{-autorotate} logic will be affected.
+@item hflip=@var{[0,1]}
+@item vflip=@var{[0,1]}
+Set a horizontal or vertical flip.
+@end table
+
 @item -vn (@emph{input/output})
 As an input option, blocks all video streams of a file from being filtered or
 being automatically selected or mapped for any output. See @code{-discard}
diff --git a/fftools/cmdutils.h b/fftools/cmdutils.h
index 6a519c6546..df90cc6958 100644
--- a/fftools/cmdutils.h
+++ b/fftools/cmdutils.h
@@ -166,6 +166,7 @@  typedef struct OptionDef {
     } u;
     const char *help;
     const char *argname;
+    const AVClass *args;
 } OptionDef;
 
 /**
diff --git a/fftools/ffmpeg.h b/fftools/ffmpeg.h
index 6991ba7632..0ea730fd42 100644
--- a/fftools/ffmpeg.h
+++ b/fftools/ffmpeg.h
@@ -193,6 +193,8 @@  typedef struct OptionsContext {
     int        nb_force_fps;
     SpecifierOpt *frame_aspect_ratios;
     int        nb_frame_aspect_ratios;
+    SpecifierOpt *display_matrixes;
+    int        nb_display_matrixes;
     SpecifierOpt *rc_overrides;
     int        nb_rc_overrides;
     SpecifierOpt *intra_matrices;
diff --git a/fftools/ffmpeg_opt.c b/fftools/ffmpeg_opt.c
index cc038aae6b..e184b4239c 100644
--- a/fftools/ffmpeg_opt.c
+++ b/fftools/ffmpeg_opt.c
@@ -20,6 +20,7 @@ 
 
 #include "config.h"
 
+#include <float.h>
 #include <stdint.h>
 
 #if HAVE_SYS_RESOURCE_H
@@ -45,6 +46,7 @@ 
 #include "libavutil/avutil.h"
 #include "libavutil/bprint.h"
 #include "libavutil/channel_layout.h"
+#include "libavutil/display.h"
 #include "libavutil/getenv_utf8.h"
 #include "libavutil/intreadwrite.h"
 #include "libavutil/fifo.h"
@@ -87,6 +89,7 @@  static const char *const opt_name_forced_key_frames[]         = {"forced_key_fra
 static const char *const opt_name_fps_mode[]                  = {"fps_mode", NULL};
 static const char *const opt_name_force_fps[]                 = {"force_fps", NULL};
 static const char *const opt_name_frame_aspect_ratios[]       = {"aspect", NULL};
+static const char *const opt_name_display_matrixes[]          = {"display_matrix", NULL};
 static const char *const opt_name_rc_overrides[]              = {"rc_override", NULL};
 static const char *const opt_name_intra_matrices[]            = {"intra_matrix", NULL};
 static const char *const opt_name_inter_matrices[]            = {"inter_matrix", NULL};
@@ -112,6 +115,32 @@  static const char *const opt_name_time_bases[]                = {"time_base", NU
 static const char *const opt_name_enc_time_bases[]            = {"enc_time_base", NULL};
 static const char *const opt_name_bits_per_raw_sample[]       = {"bits_per_raw_sample", NULL};
 
+// XXX this should probably go into a seperate file <name>_args.c and #included here
+    struct display_matrix_s {
+        const AVClass *class;
+        double  rotation;
+        int     hflip;
+        int     vflip;
+    };
+#define OFFSET(x) offsetof(struct display_matrix_s, x)
+    static const AVOption display_matrix_args[] = {
+        { "rotation", "set rotation", OFFSET(rotation), AV_OPT_TYPE_DOUBLE,
+            { .dbl = DBL_MAX }, -(DBL_MAX), DBL_MAX - 1.0f, AV_OPT_FLAG_ARGUMENT},
+        { "hflip",    "set hflip", OFFSET(hflip),    AV_OPT_TYPE_BOOL,
+            { .i64 = -1 }, 0, 1, AV_OPT_FLAG_ARGUMENT},
+        { "vflip",    "set vflip", OFFSET(vflip),    AV_OPT_TYPE_BOOL,
+            { .i64 = -1 }, 0, 1, AV_OPT_FLAG_ARGUMENT},
+        { NULL },
+    };
+    static const AVClass class_display_matrix_args = {
+        .class_name = "display_matrix_args",
+        .item_name  = av_default_item_name,
+        .option     = display_matrix_args,
+        .version    = LIBAVUTIL_VERSION_INT,
+    };
+#undef OFFSET
+// XXX
+
 #define WARN_MULTIPLE_OPT_USAGE(name, type, so, st)\
 {\
     char namestr[128] = "";\
@@ -808,6 +837,75 @@  static int opt_recording_timestamp(void *optctx, const char *opt, const char *ar
     return 0;
 }
 
+static void add_display_matrix_to_stream(OptionsContext *o,
+                                         AVFormatContext *ctx, AVStream *st)
+{
+    int hflip_set = 0, vflip_set = 0, display_rotation_set = 0;
+    uint8_t *buf = NULL;
+
+    static struct display_matrix_s test_args = {
+        .class    = &class_display_matrix_args,
+        .rotation = DBL_MAX,
+        .hflip    = -1,
+        .vflip    = -1,
+    };
+
+    AVDictionary *global_args = NULL;
+    AVDictionary *local_args  = NULL;
+    AVDictionaryEntry *en = NULL;
+
+    MATCH_PER_STREAM_OPT(display_matrixes, dict, global_args, ctx, st);
+
+    if (!global_args)
+        return;
+
+    // make a copy of the dict so it doesn't get freed from underneath us
+    if (av_dict_copy(&local_args, global_args, 0) < 0) {
+        av_log(NULL, AV_LOG_FATAL,
+               "Failed to copy argument dict for display matrix!\n");
+    }
+
+    if (av_opt_set_dict2(&test_args, &local_args, 0) < 0) {
+        av_log(NULL, AV_LOG_FATAL,
+               "Failed to set options for a display matrix!\n");
+        exit_program(1);
+    }
+
+    while ((en = av_dict_get(local_args, "", en, AV_DICT_IGNORE_SUFFIX))) {
+        av_log(NULL, AV_LOG_FATAL,
+               "Unknown option=value pair for display matrix: "
+               "key: '%s', value: '%s'!\n",
+               en->key, en->value);
+    }
+
+    if (av_dict_count(local_args)) {
+        exit_program(1);
+    }
+
+    av_dict_free(&local_args);
+
+    display_rotation_set = test_args.rotation != DBL_MAX;
+    hflip_set            = test_args.hflip != -1;
+    vflip_set            = test_args.vflip != -1;
+
+    if (!display_rotation_set && !hflip_set && !vflip_set)
+        return;
+
+    if (!(buf = av_stream_new_side_data(st, AV_PKT_DATA_DISPLAYMATRIX,
+                                        sizeof(int32_t) * 9))) {
+        av_log(NULL, AV_LOG_FATAL, "Failed to generate a display matrix!\n");
+        exit_program(1);
+    }
+
+    av_display_rotation_set((int32_t *)buf,
+                            display_rotation_set ? -(test_args.rotation) :
+                                                   -0.0f);
+    av_display_matrix_flip((int32_t *)buf,
+                           hflip_set ? test_args.hflip : 0,
+                           vflip_set ? test_args.vflip : 0);
+}
+
+
 static const AVCodec *find_codec_or_die(const char *name, enum AVMediaType type, int encoder)
 {
     const AVCodecDescriptor *desc;
@@ -942,6 +1040,8 @@  static void add_input_streams(OptionsContext *o, AVFormatContext *ic)
         }
 
         if (st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
+            add_display_matrix_to_stream(o, ic, st);
+
             MATCH_PER_STREAM_OPT(hwaccels, str, hwaccel, ic, st);
             MATCH_PER_STREAM_OPT(hwaccel_output_formats, str,
                                  hwaccel_output_format, ic, st);
@@ -1865,6 +1965,8 @@  static OutputStream *new_video_stream(OptionsContext *o, AVFormatContext *oc, in
         ost->frame_aspect_ratio = q;
     }
 
+    add_display_matrix_to_stream(o, oc, st);
+
     MATCH_PER_STREAM_OPT(filter_scripts, str, ost->filters_script, oc, st);
     MATCH_PER_STREAM_OPT(filters,        str, ost->filters,        oc, st);
 
@@ -3992,6 +4094,11 @@  const OptionDef options[] = {
     { "aspect",       OPT_VIDEO | HAS_ARG  | OPT_STRING | OPT_SPEC |
                       OPT_OUTPUT,                                                { .off = OFFSET(frame_aspect_ratios) },
         "set aspect ratio (4:3, 16:9 or 1.3333, 1.7777)", "aspect" },
+    { "display_matrix", OPT_VIDEO | HAS_ARG | OPT_DICT | OPT_SPEC |
+                        OPT_INPUT,                                              { .off = OFFSET(display_matrixes) },
+        "define a display matrix with rotation and/or horizontal/vertical "
+        "flip for stream(s)",
+        "arguments", &class_display_matrix_args },
     { "pix_fmt",      OPT_VIDEO | HAS_ARG | OPT_EXPERT  | OPT_STRING | OPT_SPEC |
                       OPT_INPUT | OPT_OUTPUT,                                    { .off = OFFSET(frame_pix_fmts) },
         "set pixel format", "format" },
diff --git a/tests/fate/filter-video.mak b/tests/fate/filter-video.mak
index 372c70bba7..763390ea51 100644
--- a/tests/fate/filter-video.mak
+++ b/tests/fate/filter-video.mak
@@ -691,7 +691,7 @@  fate-filter-metadata-avf-aphase-meter-out-of-phase: SRC = $(TARGET_SAMPLES)/filt
 fate-filter-metadata-avf-aphase-meter-out-of-phase: CMD = run $(FILTER_METADATA_COMMAND) "amovie='$(SRC)',aphasemeter=video=0"
 
 FATE_FILTER_SAMPLES-$(call TRANSCODE, RAWVIDEO H264, MOV, ARESAMPLE_FILTER  AAC_FIXED_DECODER) += fate-filter-meta-4560-rotate0
-fate-filter-meta-4560-rotate0: CMD = transcode mov $(TARGET_SAMPLES)/filter/sample-in-issue-505.mov mov "-c copy -metadata:s:v:0 rotate=0" "-af aresample" "" "" "-flags +bitexact -c:a aac_fixed"
+fate-filter-meta-4560-rotate0: CMD = transcode "mov -display_matrix:v:0 rotation=0" $(TARGET_SAMPLES)/filter/sample-in-issue-505.mov mov "-c copy" "-af aresample" "" "" "-flags +bitexact -c:a aac_fixed"
 
 FATE_FILTER_CMP_METADATA-$(CONFIG_BLOCKDETECT_FILTER) += fate-filter-refcmp-blockdetect-yuv
 fate-filter-refcmp-blockdetect-yuv: CMD = cmp_metadata blockdetect yuv420p 0.015