diff mbox

[FFmpeg-devel] deduplicated [PATCH] Cinepak: speed up decoding several-fold, depending on the scenario, by supporting multiple output pixel formats.

Message ID 20170211212145.GO1516@example.net
State New
Headers show

Commit Message

u-9iep@aetey.se Feb. 11, 2017, 9:25 p.m. UTC
Hello,

This is my best effort attempt to make the patch acceptable
by the upstream's criteria.

Daniel, do you mind that I referred to your message in the commit?
I believe is is best to indicate numbers from a third party measurement.

The code seems to be equvalent to the previous patch,
with about 20% less LOC.

This hurts readability (my subjective impression) but on the positive side
the change makes the structure of the code more explicit.

Attaching the patch.

Now I have done what I can, have to leave.
Unless there are bugs there in the patch, my attempt to contribute ends
at this point.

Thanks to everyone who cared to objectively discuss a specific case
of ffmpeg usage, the implications of techniques around VQ and whether/why
some non-traditional approaches can make sense.

Good luck to the ffmpeg project, it is very useful and valuable.

Best regards,
Rune
From 0c9badec5d144b995c0bb52c7a80939b672be3f5 Mon Sep 17 00:00:00 2001
From: Rl <addr-see-the-website@aetey.se>
Date: Sat, 11 Feb 2017 20:28:54 +0100
Subject: [PATCH] Cinepak: speed up decoding several-fold, depending on the
 scenario, by supporting multiple output pixel formats.

Decoding to rgb24 and pal8 is optimized.

Added rgb32, rgb565, yuv420p, each with faster decoding than to rgb24.

The most noticeable gain is achieved by the created possibility
to skip format conversions, for example when decoding to rgb565
----
Using matrixbench_mpeg2.mpg (720x567) encoded with ffmpeg into Cinepak
using default settings, decoding on an i5 3570K, 3.4 GHz:

bicubic (default):          ~24x realtime
fast_bilinear:              ~65x realtime
patch w/rgb565 override:    ~154x realtime
----
(https://ffmpeg.org/pipermail/ffmpeg-devel/2017-February/206799.html)

palettized input can be decoded to any of the output formats,
pal8 output is still limited to palettized input

with input other than palettized/grayscale
yuv420 is approximated by the Cinepak colorspace

The output format can be chosen at runtime by an option or via the API.
---
 libavcodec/cinepak.c | 844 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 692 insertions(+), 152 deletions(-)

Comments

Michael Niedermayer Feb. 13, 2017, 1:19 p.m. UTC | #1
On Sat, Feb 11, 2017 at 10:25:03PM +0100, u-9iep@aetey.se wrote:
> Hello,
> 
> This is my best effort attempt to make the patch acceptable
> by the upstream's criteria.
> 
> Daniel, do you mind that I referred to your message in the commit?
> I believe is is best to indicate numbers from a third party measurement.
> 
> The code seems to be equvalent to the previous patch,
> with about 20% less LOC.
> 
> This hurts readability (my subjective impression) but on the positive side
> the change makes the structure of the code more explicit.
> 
> Attaching the patch.
> 
> Now I have done what I can, have to leave.
> Unless there are bugs there in the patch, my attempt to contribute ends
> at this point.
> 
> Thanks to everyone who cared to objectively discuss a specific case
> of ffmpeg usage, the implications of techniques around VQ and whether/why
> some non-traditional approaches can make sense.
> 
> Good luck to the ffmpeg project, it is very useful and valuable.
> 
> Best regards,
> Rune

>  cinepak.c |  844 ++++++++++++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 692 insertions(+), 152 deletions(-)
> cc2ab45b7633651bc0ff80ca57c78ef4fc649d3c  0001-Cinepak-speed-up-decoding-several-fold-depending-on-.patch
> From 0c9badec5d144b995c0bb52c7a80939b672be3f5 Mon Sep 17 00:00:00 2001
> From: Rl <addr-see-the-website@aetey.se>
> Date: Sat, 11 Feb 2017 20:28:54 +0100
> Subject: [PATCH] Cinepak: speed up decoding several-fold, depending on the
>  scenario, by supporting multiple output pixel formats.
> 
> Decoding to rgb24 and pal8 is optimized.
> 
> Added rgb32, rgb565, yuv420p, each with faster decoding than to rgb24.
> 
> The most noticeable gain is achieved by the created possibility
> to skip format conversions, for example when decoding to rgb565
> ----
> Using matrixbench_mpeg2.mpg (720x567) encoded with ffmpeg into Cinepak
> using default settings, decoding on an i5 3570K, 3.4 GHz:
> 
> bicubic (default):          ~24x realtime
> fast_bilinear:              ~65x realtime
> patch w/rgb565 override:    ~154x realtime
> ----
> (https://ffmpeg.org/pipermail/ffmpeg-devel/2017-February/206799.html)
> 
> palettized input can be decoded to any of the output formats,
> pal8 output is still limited to palettized input
> 
> with input other than palettized/grayscale
> yuv420 is approximated by the Cinepak colorspace
> 
> The output format can be chosen at runtime by an option or via the API.
> ---
>  libavcodec/cinepak.c | 844 +++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 692 insertions(+), 152 deletions(-)

you may want to add yourself to MAINTAINERs (after talking with
roberto, who i belive has less interrest in cinepak than you do
nowadays)


> 
> diff --git a/libavcodec/cinepak.c b/libavcodec/cinepak.c
> index d657e9c0c1..7b08e20e06 100644
> --- a/libavcodec/cinepak.c
> +++ b/libavcodec/cinepak.c
> @@ -31,6 +31,8 @@
>   *
>   * Cinepak colorspace support (c) 2013 Rl, Aetey Global Technologies AB
>   * @author Cinepak colorspace, Rl, Aetey Global Technologies AB
> + * Extra output formats / optimizations (c) 2017 Rl, Aetey Global Technologies AB
> + * @author Extra output formats / optimizations, Rl, Aetey Global Technologies AB
>   */
>  
>  #include <stdio.h>
> @@ -39,23 +41,48 @@
>  
>  #include "libavutil/common.h"
>  #include "libavutil/intreadwrite.h"
> +#include "libavutil/opt.h"

> +/* #include "libavutil/avassert.h" */

useless commented out code

[...]
> +    switch (avctx->pix_fmt) {
> +    case AV_PIX_FMT_RGB32:
> +        s->decode_codebook = cinepak_decode_codebook_rgb32;
> +        s->decode_vectors  = cinepak_decode_vectors_rgb32;
> +        break;
> +    case AV_PIX_FMT_RGB24:
> +        s->decode_codebook = cinepak_decode_codebook_rgb24;
> +        s->decode_vectors  = cinepak_decode_vectors_rgb24;
> +        break;
> +    case AV_PIX_FMT_RGB565:
> +        s->decode_codebook = cinepak_decode_codebook_rgb565;
> +        s->decode_vectors  = cinepak_decode_vectors_rgb565;
> +        break;
> +    case AV_PIX_FMT_YUV420P:
> +        s->decode_codebook = cinepak_decode_codebook_yuv420;
> +        s->decode_vectors  = cinepak_decode_vectors_yuv420;
> +        break;
> +    case AV_PIX_FMT_PAL8:
> +        if (!s->palette_video) {
> +            av_log(avctx, AV_LOG_ERROR, "Palettized output not supported without palettized input\n");
> +            return AVERROR(EINVAL);
> +        }
> +        s->decode_codebook = cinepak_decode_codebook_pal8;
> +        s->decode_vectors  = cinepak_decode_vectors_pal8;
> +        break;
> +    default:

> +        av_log(avctx, AV_LOG_ERROR, "Unsupported pixel format %d\n", avctx->pix_fmt);

av_get_pix_fmt_name()


[...]

> @@ -488,4 +1026,6 @@ AVCodec ff_cinepak_decoder = {
>      .close          = cinepak_decode_end,
>      .decode         = cinepak_decode_frame,
>      .capabilities   = AV_CODEC_CAP_DR1,
> +    .pix_fmts       = pixfmt_list,

This is possibly unneeded

thx

[...]
Paul B Mahol Feb. 13, 2017, 1:23 p.m. UTC | #2
On 2/13/17, Michael Niedermayer <michael@niedermayer.cc> wrote:
> On Sat, Feb 11, 2017 at 10:25:03PM +0100, u-9iep@aetey.se wrote:
>> Hello,
>>
>> This is my best effort attempt to make the patch acceptable
>> by the upstream's criteria.
>>
>> Daniel, do you mind that I referred to your message in the commit?
>> I believe is is best to indicate numbers from a third party measurement.
>>
>> The code seems to be equvalent to the previous patch,
>> with about 20% less LOC.
>>
>> This hurts readability (my subjective impression) but on the positive
>> side
>> the change makes the structure of the code more explicit.
>>
>> Attaching the patch.
>>
>> Now I have done what I can, have to leave.
>> Unless there are bugs there in the patch, my attempt to contribute ends
>> at this point.
>>
>> Thanks to everyone who cared to objectively discuss a specific case
>> of ffmpeg usage, the implications of techniques around VQ and whether/why
>> some non-traditional approaches can make sense.
>>
>> Good luck to the ffmpeg project, it is very useful and valuable.
>>
>> Best regards,
>> Rune
>
>>  cinepak.c |  844
>> ++++++++++++++++++++++++++++++++++++++++++++++++++------------
>>  1 file changed, 692 insertions(+), 152 deletions(-)
>> cc2ab45b7633651bc0ff80ca57c78ef4fc649d3c
>> 0001-Cinepak-speed-up-decoding-several-fold-depending-on-.patch
>> From 0c9badec5d144b995c0bb52c7a80939b672be3f5 Mon Sep 17 00:00:00 2001
>> From: Rl <addr-see-the-website@aetey.se>
>> Date: Sat, 11 Feb 2017 20:28:54 +0100
>> Subject: [PATCH] Cinepak: speed up decoding several-fold, depending on
>> the
>>  scenario, by supporting multiple output pixel formats.
>>
>> Decoding to rgb24 and pal8 is optimized.
>>
>> Added rgb32, rgb565, yuv420p, each with faster decoding than to rgb24.
>>
>> The most noticeable gain is achieved by the created possibility
>> to skip format conversions, for example when decoding to rgb565
>> ----
>> Using matrixbench_mpeg2.mpg (720x567) encoded with ffmpeg into Cinepak
>> using default settings, decoding on an i5 3570K, 3.4 GHz:
>>
>> bicubic (default):          ~24x realtime
>> fast_bilinear:              ~65x realtime
>> patch w/rgb565 override:    ~154x realtime
>> ----
>> (https://ffmpeg.org/pipermail/ffmpeg-devel/2017-February/206799.html)
>>
>> palettized input can be decoded to any of the output formats,
>> pal8 output is still limited to palettized input
>>
>> with input other than palettized/grayscale
>> yuv420 is approximated by the Cinepak colorspace
>>
>> The output format can be chosen at runtime by an option or via the API.
>> ---
>>  libavcodec/cinepak.c | 844
>> +++++++++++++++++++++++++++++++++++++++++----------
>>  1 file changed, 692 insertions(+), 152 deletions(-)
>
> you may want to add yourself to MAINTAINERs (after talking with
> roberto, who i belive has less interrest in cinepak than you do
> nowadays)
>
>
>>
>> diff --git a/libavcodec/cinepak.c b/libavcodec/cinepak.c
>> index d657e9c0c1..7b08e20e06 100644
>> --- a/libavcodec/cinepak.c
>> +++ b/libavcodec/cinepak.c
>> @@ -31,6 +31,8 @@
>>   *
>>   * Cinepak colorspace support (c) 2013 Rl, Aetey Global Technologies AB
>>   * @author Cinepak colorspace, Rl, Aetey Global Technologies AB
>> + * Extra output formats / optimizations (c) 2017 Rl, Aetey Global
>> Technologies AB
>> + * @author Extra output formats / optimizations, Rl, Aetey Global
>> Technologies AB
>>   */
>>
>>  #include <stdio.h>
>> @@ -39,23 +41,48 @@
>>
>>  #include "libavutil/common.h"
>>  #include "libavutil/intreadwrite.h"
>> +#include "libavutil/opt.h"
>
>> +/* #include "libavutil/avassert.h" */
>
> useless commented out code
>
> [...]
>> +    switch (avctx->pix_fmt) {
>> +    case AV_PIX_FMT_RGB32:
>> +        s->decode_codebook = cinepak_decode_codebook_rgb32;
>> +        s->decode_vectors  = cinepak_decode_vectors_rgb32;
>> +        break;
>> +    case AV_PIX_FMT_RGB24:
>> +        s->decode_codebook = cinepak_decode_codebook_rgb24;
>> +        s->decode_vectors  = cinepak_decode_vectors_rgb24;
>> +        break;
>> +    case AV_PIX_FMT_RGB565:
>> +        s->decode_codebook = cinepak_decode_codebook_rgb565;
>> +        s->decode_vectors  = cinepak_decode_vectors_rgb565;
>> +        break;
>> +    case AV_PIX_FMT_YUV420P:
>> +        s->decode_codebook = cinepak_decode_codebook_yuv420;
>> +        s->decode_vectors  = cinepak_decode_vectors_yuv420;
>> +        break;
>> +    case AV_PIX_FMT_PAL8:
>> +        if (!s->palette_video) {
>> +            av_log(avctx, AV_LOG_ERROR, "Palettized output not supported
>> without palettized input\n");
>> +            return AVERROR(EINVAL);
>> +        }
>> +        s->decode_codebook = cinepak_decode_codebook_pal8;
>> +        s->decode_vectors  = cinepak_decode_vectors_pal8;
>> +        break;
>> +    default:
>
>> +        av_log(avctx, AV_LOG_ERROR, "Unsupported pixel format %d\n",
>> avctx->pix_fmt);
>
> av_get_pix_fmt_name()
>
>
> [...]
>
>> @@ -488,4 +1026,6 @@ AVCodec ff_cinepak_decoder = {
>>      .close          = cinepak_decode_end,
>>      .decode         = cinepak_decode_frame,
>>      .capabilities   = AV_CODEC_CAP_DR1,
>> +    .pix_fmts       = pixfmt_list,
>
> This is possibly unneeded
>
> thx
>
> [...]
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Many things microsoft did are stupid, but not doing something just because
> microsoft did it is even more stupid. If everything ms did were stupid they
> would be bankrupt already.
>

Is this new mini swscale?
u-9iep@aetey.se Feb. 13, 2017, 1:41 p.m. UTC | #3
Thanks Michael,

Your corrections are appreciated.

On Mon, Feb 13, 2017 at 02:19:45PM +0100, Michael Niedermayer wrote:
> you may want to add yourself to MAINTAINERs (after talking with
> roberto, who i belive has less interrest in cinepak than you do
> nowadays)

Sounds ok for me. Roberto, what do you think (if you read this)?

> > +/* #include "libavutil/avassert.h" */
> 
> useless commented out code

I hesitated but left it together with the corresponding commented out
assert() statement to serve as an indication of the validity assumption
we make for pal8. Will change.

> > +        av_log(avctx, AV_LOG_ERROR, "Unsupported pixel format %d\n", avctx->pix_fmt);
> 
> av_get_pix_fmt_name()

Thanks.

> > @@ -488,4 +1026,6 @@ AVCodec ff_cinepak_decoder = {
> >      .close          = cinepak_decode_end,
> >      .decode         = cinepak_decode_frame,
> >      .capabilities   = AV_CODEC_CAP_DR1,
> > +    .pix_fmts       = pixfmt_list,
> 
> This is possibly unneeded

ok!

Regards,
Rune
u-9iep@aetey.se Feb. 13, 2017, 5:51 p.m. UTC | #4
On Mon, Feb 13, 2017 at 02:23:47PM +0100, Paul B Mahol wrote:
> >> @@ -488,4 +1026,6 @@ AVCodec ff_cinepak_decoder = {
> >>      .close          = cinepak_decode_end,
> >>      .decode         = cinepak_decode_frame,
> >>      .capabilities   = AV_CODEC_CAP_DR1,
> >> +    .pix_fmts       = pixfmt_list,

> > [...]

> Is this new mini swscale?

Generality of swscaler is traded away to be able to fuse the conversion
with the decoder. In this special case this is remarkably fruitful,
while definitely not applicable "en masse".

If there were at least another practical codec with similar properties,
it might be nice to refactor such "codebook-targeting format conversion"
to make it reusable. It would still be much smaller in scope than swscaler
and hence could be optimized more specifically.

I would like to find a rival to Cinepak and would happily do such a
refactoring (if applicable) for the sake of another fast-decodable codec,
but nothing I know of comes near enough to be practical.

Well, while we are at it, "general purpose" vs "game" codec aside,

RoQ decoding could be possibly made faster by building a full second
level codebook in advance and avoiding the double indirection later.
(This looks like what the "dreamroq" decoder does btw.)

While I do not expect that RoQ can be made faster than Cinepak (it
is more complex at decoding), it can be optimized against pixel format
conversions in the same way so it may be worth to give this a try.

Then abstracting a "mini-swscale" could become justifiable.

Regards,
Rune
wm4 Feb. 14, 2017, 5:51 a.m. UTC | #5
On Mon, 13 Feb 2017 18:51:39 +0100
u-9iep@aetey.se wrote:

> Then abstracting a "mini-swscale" could become justifiable.

And this is why we should probably reject this patch. What you wrote
paints a horrifying future.

Note that we would have this discussion even if it'd speed up the h264
decoder. Pissing all over modularization is not a good thing to do.

If you really want to get anything applied, you should probably try
looking at outputting ycgco, which appears to be the native colorspace
of the codec (and convert it vf_colorspace, I guess).
u-9iep@aetey.se Feb. 14, 2017, 8:51 a.m. UTC | #6
On Tue, Feb 14, 2017 at 06:51:46AM +0100, wm4 wrote:
> On Mon, 13 Feb 2017 18:51:39 +0100
> u-9iep@aetey.se wrote:
> 
> > Then abstracting a "mini-swscale" could become justifiable.
> 
> And this is why we should probably reject this patch.
> What you wrote paints a horrifying future.
--^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Note that we would have this discussion even if it'd speed up the h264
> decoder. Pissing all over modularization is not a good thing to do.
-----------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> If you really want to get anything applied, you should probably try
> looking at outputting ycgco, which appears to be the native colorspace
> of the codec (and convert it vf_colorspace, I guess).

Dear wm4,

Your readiness to give feedback and your endeavor to keep the high quality
of the code are appreciated. Nevertheless:

I kindly ask you to mind your use of disparaging terms
(emotionally charged expressions like "horrifying" or "pissing"
which attribute a negative quality or attitude to your opponent),
the corresponding phrases are marked above for your reference

and please check your data.

For additional information I would suggest

 https://ffmpeg.org/developer.html#Code-of-conduct

 https://multimedia.cx/mirror/cinepak.txt

 the contents of this thread

Regards,
Rune
wm4 Feb. 14, 2017, 9 a.m. UTC | #7
On Tue, 14 Feb 2017 09:51:54 +0100
u-9iep@aetey.se wrote:

> On Tue, Feb 14, 2017 at 06:51:46AM +0100, wm4 wrote:
> > On Mon, 13 Feb 2017 18:51:39 +0100
> > u-9iep@aetey.se wrote:
> >   
> > > Then abstracting a "mini-swscale" could become justifiable.  
> > 
> > And this is why we should probably reject this patch.
> > What you wrote paints a horrifying future.  
> --^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 
> > Note that we would have this discussion even if it'd speed up the h264
> > decoder. Pissing all over modularization is not a good thing to do.  
> -----------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 
> > If you really want to get anything applied, you should probably try
> > looking at outputting ycgco, which appears to be the native colorspace
> > of the codec (and convert it vf_colorspace, I guess).  
> 
> Dear wm4,
> 
> Your readiness to give feedback and your endeavor to keep the high quality
> of the code are appreciated. Nevertheless:
> 
> I kindly ask you to mind your use of disparaging terms
> (emotionally charged expressions like "horrifying" or "pissing"
> which attribute a negative quality or attitude to your opponent),
> the corresponding phrases are marked above for your reference
> 
> and please check your data.
> 
> For additional information I would suggest
> 
>  https://ffmpeg.org/developer.html#Code-of-conduct
> 
>  https://multimedia.cx/mirror/cinepak.txt
> 
>  the contents of this thread

Nevertheless your patches are rejected.
Paul B Mahol Feb. 14, 2017, 9:03 a.m. UTC | #8
On 2/14/17, wm4 <nfxjfg@googlemail.com> wrote:
> On Tue, 14 Feb 2017 09:51:54 +0100
> u-9iep@aetey.se wrote:
>
>> On Tue, Feb 14, 2017 at 06:51:46AM +0100, wm4 wrote:
>> > On Mon, 13 Feb 2017 18:51:39 +0100
>> > u-9iep@aetey.se wrote:
>> >
>> > > Then abstracting a "mini-swscale" could become justifiable.
>> >
>> > And this is why we should probably reject this patch.
>> > What you wrote paints a horrifying future.
>> --^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> >
>> > Note that we would have this discussion even if it'd speed up the h264
>> > decoder. Pissing all over modularization is not a good thing to do.
>> -----------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> >
>> > If you really want to get anything applied, you should probably try
>> > looking at outputting ycgco, which appears to be the native colorspace
>> > of the codec (and convert it vf_colorspace, I guess).
>>
>> Dear wm4,
>>
>> Your readiness to give feedback and your endeavor to keep the high quality
>> of the code are appreciated. Nevertheless:
>>
>> I kindly ask you to mind your use of disparaging terms
>> (emotionally charged expressions like "horrifying" or "pissing"
>> which attribute a negative quality or attitude to your opponent),
>> the corresponding phrases are marked above for your reference
>>
>> and please check your data.
>>
>> For additional information I would suggest
>>
>>  https://ffmpeg.org/developer.html#Code-of-conduct
>>
>>  https://multimedia.cx/mirror/cinepak.txt
>>
>>  the contents of this thread
>
> Nevertheless your patches are rejected.

Correct way in solving this is outputing in cinepak decoder actual
native format that it
uses and not do any conversions of colorspace which is currently done.
Implement correct colorspace conversions of cinepak format to others
in libswscale.
u-9iep@aetey.se Feb. 14, 2017, 11:14 a.m. UTC | #9
On Tue, Feb 14, 2017 at 10:03:41AM +0100, Paul B Mahol wrote:
> Correct way in solving this is outputing in cinepak decoder actual
> native format that it
> uses and not do any conversions of colorspace which is currently done.
> Implement correct colorspace conversions of cinepak format to others
> in libswscale.

Would you comment on what is the difference between now and the situation

 https://ffmpeg.org/pipermail/ffmpeg-devel/2013-February/138811.html

when you (?) wrote on the same topic

-----------
> i think hes asking if it should be a new colorspace in swscale, or
> added into the cinepak decoder/encoder common code.
>
> by your answer i think it has to be a new swscale colorspace?

Please leave libswscale alone.
-----------

That position was IMHO reasonable.

Commenting on your current suggestion:

What would be the advantage of moving the Cinepak-specific conversions
into the general purpose library? Either this would mean as many
variations of the conversion code as there are in the decoder now or
the conversion would be done through some intermediate format, which is
a loss per se. The code still only would be used by Cinepak.

Besides this, for a purely technical reason the conversions done in
Cinepak are not suitable for delayed delegation to swscaler.

The reason has been already pointed out in the thread:

The huge difference in the amount of the data to be processed; in other
words the very essence of the vector quantization technology where frame
data is represented by a codebook, by design meant to be much smaller.

(BTW mathematically DCT to the contrary is equivalent to referencing an
extremely large codebook. This codebook does not have to be transmitted
or stored, neither of which would be practical. Instead its entries are
generated on-demand, for each frame data block. This is the situation
where swscale is an efficient tool.)

Hope this helps.

Regards,
Rune
Ronald S. Bultje Feb. 14, 2017, 2:46 p.m. UTC | #10
Hi Rune,

On Tue, Feb 14, 2017 at 6:14 AM, <u-9iep@aetey.se> wrote:

> The huge difference in the amount of the data to be processed; in other
> words the very essence of the vector quantization technology where frame
> data is represented by a codebook, by design meant to be much smaller.


We acknowledge that. We understand this. Nobody disputes this.

But we still don't think breaking the modularization is the right way
forward. I'm sorry. We're thinking about this in terms of maintainability
as well as speed. The problem is that once we allow this, people will ask
for 16bit output in h264 for all native bitdepths, or even packed formats
(Kieran already asked).

We know it's faster. We also know it's unreasonable from a maintenance
perspective. We have to draw a line somewhere. It's an imperfect line but
there's a reason for it. I'm sorry. That's life.

Ronald
u-9iep@aetey.se Feb. 15, 2017, 11:01 a.m. UTC | #11
Hi Ronald,

On Tue, Feb 14, 2017 at 09:46:55AM -0500, Ronald S. Bultje wrote:
> > The huge difference in the amount of the data to be processed; in other
> > words the very essence of the vector quantization technology where frame
> > data is represented by a codebook, by design meant to be much smaller.
> 
> 
> We acknowledge that. We understand this. Nobody disputes this.

Just several lines below you assume (why?) that using this specific advantage
in Cinepak would create any ground for a "corresponding change" in h264.
As far as I know h264 does not use vector quantization on the final
output data, or what do you mean by the following:

> But we still don't think breaking the modularization is the right way
> forward. I'm sorry. We're thinking about this in terms of maintainability
> as well as speed. The problem is that once we allow this, people will ask
> for 16bit output in h264 for all native bitdepths, or even packed formats
> (Kieran already asked).

Unfortunately I do not see this as having any real relevance.
If there indeed is a gain to be collected for h264, it should be weighed
against the cost to be caused by a change _there_.

As a matter of fact, I do not suggest "breaking modularity in ffmpeg".

Modularity, a very useful concept, like any other concept has its area
of usefulness. An approach very reasonable in most situations should
not be mistaken for "the best one in all cases".

Here we have a case where enforced application of this generally useful
concept is remarkably far from optimal.

> We know it's faster. We also know it's unreasonable from a maintenance

You do not know the latter.
Your guess possibly reflects the feeling of a fundamental concept
being neglected, but this in not a case of neglection/ignorance.

> perspective. We have to draw a line somewhere. It's an imperfect line but
> there's a reason for it. I'm sorry. That's life.

You do not have to be sorry, just be substantial.

I have checked the changes done to Cinepak decoder during the 4 years
since it began to decode correctly.

If the change being discussed now would have been applied back then,
one of the later commits would have had about 15 extra lines to change
(btw, in a very regular fashion: "frame." => "frame->" nothing else).

Another commit would have to change 5 lines instead of 1.

The latter indicated indeed an unnecessary duplication in variable
declarations, sorry for that. I now change the patch to avoid this.

That's all. Tell me if I missed something.

The decoder is used rarely but it is indispensable when maximal speed
is needed. There is no substitute.

Is a 3-fold improvement in the decoding speed worth 15 extra lines
to change once in 4 years?

Regards,
Rune
u-9iep@aetey.se March 5, 2017, 7:22 p.m. UTC | #12
On Mon, Feb 13, 2017 at 02:41:40PM +0100, u-9iep@aetey.se wrote:
> On Mon, Feb 13, 2017 at 02:19:45PM +0100, Michael Niedermayer wrote:
> > you may want to add yourself to MAINTAINERs (after talking with
> > roberto, who i belive has less interrest in cinepak than you do
> > nowadays)
> 
> Sounds ok for me. Roberto, what do you think (if you read this)?

The only address to him which I found (in an old commit) bounced,
there was no reply here on the list either.

Both can be a coincidence, but otherwise it looks like the change should
be OK.

Regards,
Rune
Ronald S. Bultje March 5, 2017, 7:38 p.m. UTC | #13
Hi,

On Sun, Mar 5, 2017 at 2:22 PM, <u-9iep@aetey.se> wrote:

> On Mon, Feb 13, 2017 at 02:41:40PM +0100, u-9iep@aetey.se wrote:
> > On Mon, Feb 13, 2017 at 02:19:45PM +0100, Michael Niedermayer wrote:
> > > you may want to add yourself to MAINTAINERs (after talking with
> > > roberto, who i belive has less interrest in cinepak than you do
> > > nowadays)
> >
> > Sounds ok for me. Roberto, what do you think (if you read this)?
>
> The only address to him which I found (in an old commit) bounced,
> there was no reply here on the list either.
>
> Both can be a coincidence, but otherwise it looks like the change should
> be OK.


No. This has been discussed repeatedly. Stop trying to push this through.

Ronald
u-9iep@aetey.se March 5, 2017, 9:26 p.m. UTC | #14
Ronald,

On Sun, Mar 05, 2017 at 02:38:31PM -0500, Ronald S. Bultje wrote:
> Hi,
> 
> On Sun, Mar 5, 2017 at 2:22 PM, <u-9iep@aetey.se> wrote:
> 
> > On Mon, Feb 13, 2017 at 02:41:40PM +0100, u-9iep@aetey.se wrote:
> > > On Mon, Feb 13, 2017 at 02:19:45PM +0100, Michael Niedermayer wrote:
> > > > you may want to add yourself to MAINTAINERs (after talking with
> > > > roberto, who i belive has less interrest in cinepak than you do
> > > > nowadays)
> > >
> > > Sounds ok for me. Roberto, what do you think (if you read this)?
> >
> > The only address to him which I found (in an old commit) bounced,
> > there was no reply here on the list either.
> >
> > Both can be a coincidence, but otherwise it looks like the change should
> > be OK.
> 
> 
> No. This has been discussed repeatedly. Stop trying to push this through.

My maintainership (for the code which I have contributed to, you may be
unaware about this fact) was not discussed otherwise than cited here.

Please check what you are commenting,
especially when you mean to sound like having a definite power and being
quite rude.

What _has_ been discussed are the proposed Cinepak decoder improvements.

There has not been even a single substantial technical argument against
any version of the patch, nor any style/duplication argument against
the last two versions. (Did you read the discussion? Did you check the
validity of the presented arguments from all the involved parties, yours
truly included? You did not say a word after I addressed your concerns.)

This makes it even harder to put your statement into a proper context.

Regards,
Rune
Ronald S. Bultje March 5, 2017, 11:20 p.m. UTC | #15
Hi Rune,

On Sun, Mar 5, 2017 at 4:26 PM, <u-9iep@aetey.se> wrote:

> Ronald,
>
> On Sun, Mar 05, 2017 at 02:38:31PM -0500, Ronald S. Bultje wrote:
> > Hi,
> >
> > On Sun, Mar 5, 2017 at 2:22 PM, <u-9iep@aetey.se> wrote:
> >
> > > On Mon, Feb 13, 2017 at 02:41:40PM +0100, u-9iep@aetey.se wrote:
> > > > On Mon, Feb 13, 2017 at 02:19:45PM +0100, Michael Niedermayer wrote:
> > > > > you may want to add yourself to MAINTAINERs (after talking with
> > > > > roberto, who i belive has less interrest in cinepak than you do
> > > > > nowadays)
> > > >
> > > > Sounds ok for me. Roberto, what do you think (if you read this)?
> > >
> > > The only address to him which I found (in an old commit) bounced,
> > > there was no reply here on the list either.
> > >
> > > Both can be a coincidence, but otherwise it looks like the change
> should
> > > be OK.
> >
> >
> > No. This has been discussed repeatedly. Stop trying to push this through.
>
> My maintainership (for the code which I have contributed to, you may be
> unaware about this fact) was not discussed otherwise than cited here.
>
> Please check what you are commenting,
> especially when you mean to sound like having a definite power


The rule is that a patch cannot be committed while a developer has
outstanding comments. There's outstanding comments, including some from me.
You said "the change should be OK", and I'm simply saying "no" to that,
because it's not. The patch is not OK until review comments from other
developers have been addressed by the patch submitter.

Ronald
u-9iep@aetey.se March 6, 2017, 9:15 a.m. UTC | #16
On Sun, Mar 05, 2017 at 06:20:34PM -0500, Ronald S. Bultje wrote:
> Hi Rune,
> 
> On Sun, Mar 5, 2017 at 4:26 PM, <u-9iep@aetey.se> wrote:
> 
> > Ronald,
> >
> > On Sun, Mar 05, 2017 at 02:38:31PM -0500, Ronald S. Bultje wrote:
> > > Hi,
> > >
> > > On Sun, Mar 5, 2017 at 2:22 PM, <u-9iep@aetey.se> wrote:
> > >
> > > > On Mon, Feb 13, 2017 at 02:41:40PM +0100, u-9iep@aetey.se wrote:
> > > > > On Mon, Feb 13, 2017 at 02:19:45PM +0100, Michael Niedermayer wrote:
> > > > > > you may want to add yourself to MAINTAINERs (after talking with
> > > > > > roberto, who i belive has less interrest in cinepak than you do
> > > > > > nowadays)
> > > > >
> > > > > Sounds ok for me. Roberto, what do you think (if you read this)?
> > > >
> > > > The only address to him which I found (in an old commit) bounced,
> > > > there was no reply here on the list either.
> > > >
> > > > Both can be a coincidence, but otherwise it looks like the change
> > should
> > > > be OK.
> > >
> > >
> > > No. This has been discussed repeatedly. Stop trying to push this through.
> >
> > My maintainership (for the code which I have contributed to, you may be
> > unaware about this fact) was not discussed otherwise than cited here.
> >
> > Please check what you are commenting,
> > especially when you mean to sound like having a definite power
> 
> 
> The rule is that a patch cannot be committed while a developer has
> outstanding comments. There's outstanding comments, including some from me.
> You said "the change should be OK", and I'm simply saying "no" to that,
> because it's not. The patch is not OK until review comments from other
> developers have been addressed by the patch submitter.

Thanks for the explanation Ronald,

1. Apparently you did not aim at the maintainership question.

  It would be nice if you confirm this point,
  to avoid further misunderstandings.

2. Would you cite the outstanding comment or comments about the patches
   which you feel are valid and not addressed?

   I kindly ask you check the latest iteration of the patch series
   where I tried to summarize all discussion points in the containing
   letter.

TL;DR: I do have respect for your work and competence.
       Please do the same to others. Even if we all too often meet people
       who lack in those areas, there are some exceptions.

To be fair your comments concerning the patches were among the most
detailed and friendly, this is appreciated!

But even your well meant comments happened to miss the point, being based
in unfounded assumptions. I answered and explained and there it stopped.

Frankly the only outstanding comments which I am aware of are of the kind
"you the patch submitter do not understand why 'this' is better than
'that'".

The fact is that I _do_ understand why you believe that something is
better or not. I just do not feel a belief is sufficient per se.

I invited you and others to look at _what_ makes something better
or not and got literally no answers.

Besides of course
"imagine if someone else will do something else,
it would be terrible, thus you are leading us to hell" :)

or otherwise, citing literally:
"we know this code, we know it can do this, don't tell us it's not
possible" without specifying (and in fact misunderstanding) what
"this" we were talking about: making the speedup usable with an
unprepared application, which the overhelming majority of applications
are. Regrettably, the present code is not prepared nor meant to be able to.
The proposed code could, but this possiblity is now cut off,
just to avoid contention.

As a third example, your comment
"We [...] know it's unreasonable from a maintenance perspective".

The very presence of the cinepak decoder is a proof to the opposite
because it worked this way (generating two different pixel formats
inside the decoder) for years. Again, what "it's" we are talking about?
The same "it" in your mind as in the patch?

I went so long as to quantify/estimate the expected extra maintenance
burden while you did not even mention any tangible criteria.

This makes me doubt that you or others who commented have time to read
the answers or are prepared to expect competence of your counterparts.
Unfortunately this affects the quality of the judgement.

I do have respect for your work and competence.
Please do the same to others.

Regards,
Rune
Paul B Mahol March 6, 2017, 9:19 a.m. UTC | #17
On 3/6/17, u-9iep@aetey.se <u-9iep@aetey.se> wrote:
> On Sun, Mar 05, 2017 at 06:20:34PM -0500, Ronald S. Bultje wrote:
>> Hi Rune,
>>
>> On Sun, Mar 5, 2017 at 4:26 PM, <u-9iep@aetey.se> wrote:
>>
>> > Ronald,
>> >
>> > On Sun, Mar 05, 2017 at 02:38:31PM -0500, Ronald S. Bultje wrote:
>> > > Hi,
>> > >
>> > > On Sun, Mar 5, 2017 at 2:22 PM, <u-9iep@aetey.se> wrote:
>> > >
>> > > > On Mon, Feb 13, 2017 at 02:41:40PM +0100, u-9iep@aetey.se wrote:
>> > > > > On Mon, Feb 13, 2017 at 02:19:45PM +0100, Michael Niedermayer
>> > > > > wrote:
>> > > > > > you may want to add yourself to MAINTAINERs (after talking with
>> > > > > > roberto, who i belive has less interrest in cinepak than you do
>> > > > > > nowadays)
>> > > > >
>> > > > > Sounds ok for me. Roberto, what do you think (if you read this)?
>> > > >
>> > > > The only address to him which I found (in an old commit) bounced,
>> > > > there was no reply here on the list either.
>> > > >
>> > > > Both can be a coincidence, but otherwise it looks like the change
>> > should
>> > > > be OK.
>> > >
>> > >
>> > > No. This has been discussed repeatedly. Stop trying to push this
>> > > through.
>> >
>> > My maintainership (for the code which I have contributed to, you may be
>> > unaware about this fact) was not discussed otherwise than cited here.
>> >
>> > Please check what you are commenting,
>> > especially when you mean to sound like having a definite power
>>
>>
>> The rule is that a patch cannot be committed while a developer has
>> outstanding comments. There's outstanding comments, including some from
>> me.
>> You said "the change should be OK", and I'm simply saying "no" to that,
>> because it's not. The patch is not OK until review comments from other
>> developers have been addressed by the patch submitter.
>
> Thanks for the explanation Ronald,
>
> 1. Apparently you did not aim at the maintainership question.
>
>   It would be nice if you confirm this point,
>   to avoid further misunderstandings.
>
> 2. Would you cite the outstanding comment or comments about the patches
>    which you feel are valid and not addressed?
>
>    I kindly ask you check the latest iteration of the patch series
>    where I tried to summarize all discussion points in the containing
>    letter.
>
> TL;DR: I do have respect for your work and competence.
>        Please do the same to others. Even if we all too often meet people
>        who lack in those areas, there are some exceptions.
>
> To be fair your comments concerning the patches were among the most
> detailed and friendly, this is appreciated!
>
> But even your well meant comments happened to miss the point, being based
> in unfounded assumptions. I answered and explained and there it stopped.
>
> Frankly the only outstanding comments which I am aware of are of the kind
> "you the patch submitter do not understand why 'this' is better than
> 'that'".
>
> The fact is that I _do_ understand why you believe that something is
> better or not. I just do not feel a belief is sufficient per se.
>
> I invited you and others to look at _what_ makes something better
> or not and got literally no answers.
>
> Besides of course
> "imagine if someone else will do something else,
> it would be terrible, thus you are leading us to hell" :)
>
> or otherwise, citing literally:
> "we know this code, we know it can do this, don't tell us it's not
> possible" without specifying (and in fact misunderstanding) what
> "this" we were talking about: making the speedup usable with an
> unprepared application, which the overhelming majority of applications
> are. Regrettably, the present code is not prepared nor meant to be able to.
> The proposed code could, but this possiblity is now cut off,
> just to avoid contention.
>
> As a third example, your comment
> "We [...] know it's unreasonable from a maintenance perspective".
>
> The very presence of the cinepak decoder is a proof to the opposite
> because it worked this way (generating two different pixel formats
> inside the decoder) for years. Again, what "it's" we are talking about?
> The same "it" in your mind as in the patch?
>
> I went so long as to quantify/estimate the expected extra maintenance
> burden while you did not even mention any tangible criteria.
>
> This makes me doubt that you or others who commented have time to read
> the answers or are prepared to expect competence of your counterparts.
> Unfortunately this affects the quality of the judgement.
>
> I do have respect for your work and competence.
> Please do the same to others.

What is your Work and Competence in FFmpeg?
wm4 March 6, 2017, 9:31 a.m. UTC | #18
On Mon, 6 Mar 2017 10:15:44 +0100
u-9iep@aetey.se wrote:

> On Sun, Mar 05, 2017 at 06:20:34PM -0500, Ronald S. Bultje wrote:
> > Hi Rune,
> > 
> > On Sun, Mar 5, 2017 at 4:26 PM, <u-9iep@aetey.se> wrote:
> >   
> > > Ronald,
> > >
> > > On Sun, Mar 05, 2017 at 02:38:31PM -0500, Ronald S. Bultje wrote:  
> > > > Hi,
> > > >
> > > > On Sun, Mar 5, 2017 at 2:22 PM, <u-9iep@aetey.se> wrote:
> > > >  
> > > > > On Mon, Feb 13, 2017 at 02:41:40PM +0100, u-9iep@aetey.se wrote:  
> > > > > > On Mon, Feb 13, 2017 at 02:19:45PM +0100, Michael Niedermayer wrote:  
> > > > > > > you may want to add yourself to MAINTAINERs (after talking with
> > > > > > > roberto, who i belive has less interrest in cinepak than you do
> > > > > > > nowadays)  
> > > > > >
> > > > > > Sounds ok for me. Roberto, what do you think (if you read this)?  
> > > > >
> > > > > The only address to him which I found (in an old commit) bounced,
> > > > > there was no reply here on the list either.
> > > > >
> > > > > Both can be a coincidence, but otherwise it looks like the change  
> > > should  
> > > > > be OK.  
> > > >
> > > >
> > > > No. This has been discussed repeatedly. Stop trying to push this through.  
> > >
> > > My maintainership (for the code which I have contributed to, you may be
> > > unaware about this fact) was not discussed otherwise than cited here.
> > >
> > > Please check what you are commenting,
> > > especially when you mean to sound like having a definite power  
> > 
> > 
> > The rule is that a patch cannot be committed while a developer has
> > outstanding comments. There's outstanding comments, including some from me.
> > You said "the change should be OK", and I'm simply saying "no" to that,
> > because it's not. The patch is not OK until review comments from other
> > developers have been addressed by the patch submitter.  
> 
> Thanks for the explanation Ronald,
> 
> 1. Apparently you did not aim at the maintainership question.
> 
>   It would be nice if you confirm this point,
>   to avoid further misunderstandings.
> 
> 2. Would you cite the outstanding comment or comments about the patches
>    which you feel are valid and not addressed?
> 
>    I kindly ask you check the latest iteration of the patch series
>    where I tried to summarize all discussion points in the containing
>    letter.
> 
> TL;DR: I do have respect for your work and competence.
>        Please do the same to others. Even if we all too often meet people
>        who lack in those areas, there are some exceptions.
> 
> To be fair your comments concerning the patches were among the most
> detailed and friendly, this is appreciated!
> 
> But even your well meant comments happened to miss the point, being based
> in unfounded assumptions. I answered and explained and there it stopped.
> 
> Frankly the only outstanding comments which I am aware of are of the kind
> "you the patch submitter do not understand why 'this' is better than
> 'that'".
> 
> The fact is that I _do_ understand why you believe that something is
> better or not. I just do not feel a belief is sufficient per se.
> 
> I invited you and others to look at _what_ makes something better
> or not and got literally no answers.
> 
> Besides of course
> "imagine if someone else will do something else,
> it would be terrible, thus you are leading us to hell" :)
> 
> or otherwise, citing literally:
> "we know this code, we know it can do this, don't tell us it's not
> possible" without specifying (and in fact misunderstanding) what
> "this" we were talking about: making the speedup usable with an
> unprepared application, which the overhelming majority of applications
> are. Regrettably, the present code is not prepared nor meant to be able to.
> The proposed code could, but this possiblity is now cut off,
> just to avoid contention.
> 
> As a third example, your comment
> "We [...] know it's unreasonable from a maintenance perspective".
> 
> The very presence of the cinepak decoder is a proof to the opposite
> because it worked this way (generating two different pixel formats
> inside the decoder) for years. Again, what "it's" we are talking about?
> The same "it" in your mind as in the patch?
> 
> I went so long as to quantify/estimate the expected extra maintenance
> burden while you did not even mention any tangible criteria.
> 
> This makes me doubt that you or others who commented have time to read
> the answers or are prepared to expect competence of your counterparts.
> Unfortunately this affects the quality of the judgement.
> 
> I do have respect for your work and competence.
> Please do the same to others.

I think we've repeatedly made clear that:

- we don't really want the decoder to output multiple pixfmts by choice
- but if it's limited to a very small number of formats (say, 2) it
  might be ok
- but ideally the decoder should output the "native" pixfmt/colorspace
  of the codec, which apparently is YCgCo (which would also require
  libswscale modifications)
- we're not even interested in a faster cinepak decoder, because we see
  no need for it (even you haven't demonstrated any need for it - I'm
  pretty sure everyone would be all over a fast intermediate codec, but
  people don't use cinepak for that)
- trying to trick us into applying this patch by playing policy lawyer
  won't work
- whether or not having these multiple code paths would be so horrible
  is not even the main question - by now we're just annoyed by this
  topic and how much attention it seems to drain, so we'd rather ignore
  this (don't blame us - people tend to ignore unimportant things to
  get important work done, and not applying your patches seems to be
  the safer option)
- again, nobody understands your needs, and some we find extremely
  absurd and out of place, like using getenv() in a decoder (we'd
  probably suggest a better alternative if we did, maybe)
u-9iep@aetey.se March 6, 2017, 10:02 a.m. UTC | #19
On Mon, Mar 06, 2017 at 10:19:33AM +0100, Paul B Mahol wrote:
> On 3/6/17, u-9iep@aetey.se <u-9iep@aetey.se> wrote:
> > I do have respect for your work and competence.
> > Please do the same to others.
> 
> What is your Work and Competence in FFmpeg?

This is offtopic and looks like ad hominem
"a logical fallacy in which an argument is rebutted by attacking the
character, motive, or other attribute of the person making the argument"
(Wikipedia)

It is more of a fallacy because you make it look like the "competence
in FFmpeg" (whatever you mean with this) is the only competence kind to
be respected or/and to be relevant while doing work for FFmpeg.

As for your curiosity, feel free to do the research.
The data is public.

Rune
Paul B Mahol March 6, 2017, 11:49 a.m. UTC | #20
On 3/6/17, u-9iep@aetey.se <u-9iep@aetey.se> wrote:
> On Mon, Mar 06, 2017 at 10:19:33AM +0100, Paul B Mahol wrote:
>> On 3/6/17, u-9iep@aetey.se <u-9iep@aetey.se> wrote:
>> > I do have respect for your work and competence.
>> > Please do the same to others.
>>
>> What is your Work and Competence in FFmpeg?
>
> This is offtopic and looks like ad hominem
> "a logical fallacy in which an argument is rebutted by attacking the
> character, motive, or other attribute of the person making the argument"
> (Wikipedia)
>
> It is more of a fallacy because you make it look like the "competence
> in FFmpeg" (whatever you mean with this) is the only competence kind to
> be respected or/and to be relevant while doing work for FFmpeg.
>
> As for your curiosity, feel free to do the research.
> The data is public.
>
> Rune

Your mail doesn't give me anything useful when googling.
u-9iep@aetey.se March 6, 2017, 12:08 p.m. UTC | #21
To everyone:

I am really sorry for having to react to this kind of irrational
arguments. OTOH keeping silence could be interpreted as accepting them.

As far as my common sense goes, I can not count these as "pending comments".

TL;DR:
 trying to reason,
 given the arbitrary statements and careless behaviour of the opponent

On Mon, Mar 06, 2017 at 10:31:42AM +0100, wm4 wrote:
> I think we've repeatedly made clear that:
> - we don't really want the decoder to output multiple pixfmts by choice

It is clear that you personally do not want the Cinepak decoder to be able
to output multiple pixel formats, for unspecified reasons ("by choice").

You make it look like it is your discretion who decides whether something
is deemed to be good or not for FFmpeg. I do not believe this.

> - but if it's limited to a very small number of formats (say, 2) it
>   might be ok

You are again referring to nothing but your discretion.

Anyway, if you take your time to look at the current version of the patch,
it _happens_ to do exactly this.

So this point is moot.

> - but ideally the decoder should output the "native" pixfmt/colorspace
>   of the codec, which apparently is YCgCo (which would also require
>   libswscale modifications)

What makes it "ideal", by which criteria? Be technical please.

(BTW "which apparently is" is a wrong assumption.
You still did not use the references I offered you for some background
information on the matter. Please learn, before posting next time.)

> - we're not even interested in a faster cinepak decoder, because we see
>   no need for it (even you haven't demonstrated any need for it - I'm
>   pretty sure everyone would be all over a fast intermediate codec, but
>   people don't use cinepak for that)

Are you saying no need for everyone?

Nor have they ever tasted a 3 times faster decoder but you already know
they will not like it?

Sorry, seriously I have no reason to assume that you have prerequisites
to know for everyone and especially in advance.

You are talking about what is inside the horizon of yourself.

> - trying to trick us into applying this patch by playing policy lawyer
>   won't work

You make it again sound like you decide for the project.
Frankly, such behaviour makes the project look bad.

Also you are accusing me of "tricking" and "playing".

Please be specific about what I am doing wrong
or apologize otherwise.

While at it, I am still expecting your apology
for the use of abusive language when you were commenting
on my proposal last time.

> - whether or not having these multiple code paths would be so horrible
>   is not even the main question - by now we're just annoyed by this
>   topic and how much attention it seems to drain, so we'd rather ignore
>   this (don't blame us - people tend to ignore unimportant things to
>   get important work done, and not applying your patches seems to be
>   the safer option)

You did not have to pay attention to the patch, given your limited
understanding of the matter (which _you_ stated repeatedly) and the
limited capacity to participate, which you demonstrated by relying on
wrong assumptions and neglecting to correct your mistakes in this discussion.

> - again, nobody understands your needs, and some we find extremely

You are assuming that everyone has the same level of understanding
as you do. This is not necessarily true.

>   absurd and out of place, like using getenv() in a decoder (we'd
>   probably suggest a better alternative if we did, maybe)

Regrettably, the proposed alternatives were either to put the
corresponding functionality into every application or to avoid ffmpeg
and start a separate project. Neither of those alternatives would be
practically useful. Never mind, this is not part of the current patch.

Thanks for reading this long!

Regards,
Rune
Ronald S. Bultje March 6, 2017, 1:19 p.m. UTC | #22
Hi,

On Mon, Mar 6, 2017 at 7:08 AM, <u-9iep@aetey.se> wrote:

> It is clear that you personally do not want the Cinepak decoder to be able
> to output multiple pixel formats, for unspecified reasons ("by choice").
>
> You make it look like it is your discretion who decides whether something
> is deemed to be good or not for FFmpeg. I do not believe this.
>

It is a general project design decision.

If you want to change that, fine, propose it and let's discuss that on its
own merit. But right now this is how the project works. You can't change
that without changing official project policy.

I'm fundamentally against it, and as such I'm against the patch. I
understand why you want it, but I don't think that's sufficient reason.


> You did not have to pay attention to the patch, given your limited
> understanding of the matter


And this is a CoC [1] violation, please don't do that again.

Ronald

[1] https://ffmpeg.org/developer.html#Code-of-conduct
u-9iep@aetey.se March 6, 2017, 2:52 p.m. UTC | #23
On Mon, Mar 06, 2017 at 08:19:06AM -0500, Ronald S. Bultje wrote:
> Hi,
> 
> On Mon, Mar 6, 2017 at 7:08 AM, <u-9iep@aetey.se> wrote:
> 
> > You did not have to pay attention to the patch, given your limited
> > understanding of the matter
> 
> 
> And this is a CoC [1] violation, please don't do that again.

Actually what I wrote is:

> > You did not have to pay attention to the patch, given your limited
> > understanding of the matter (which _you_ stated repeatedly) 

I am sorry for saying this.

Pointing to one's behaviour is correct, presenting it as one's inherent
quality is not. I did wrong.

Indeed that person did not state her/his limited understanding nor
did this repeatedly. What was repeated was erroneous technical
comments about which "native" pixel format Cinepak has.

The formulation about understanding I meant to refer to was that of wm4:

  > I don't know what you're thinking, but that's just wrong.

It looks bizarre to me to judge without knowing
but this was _not_ a confession of a "limited understanding of the matter".

I am sorry for misrepresentation of the statements of wm4!

Now Ronald where were you with your policing
when the person in question (wm4) repeatedly violated CoC?

I point out that your behaviour of selective punishment
is most probably not conscious but it is a fact.

Regards,
Rune
u-9iep@aetey.se March 6, 2017, 3:23 p.m. UTC | #24
On Mon, Mar 06, 2017 at 08:19:06AM -0500, Ronald S. Bultje wrote:
> On Mon, Mar 6, 2017 at 7:08 AM, <u-9iep@aetey.se> wrote:
> 
> > It is clear that you personally do not want the Cinepak decoder to be able
> > to output multiple pixel formats, for unspecified reasons ("by choice").
> >
> > You make it look like it is your discretion who decides whether something
> > is deemed to be good or not for FFmpeg. I do not believe this.

> It is a general project design decision.

Did the project (who?) ever make a general decision about Cinepak
or delegate to wm4 to represent the project's stance?

I question her/his tone, not the policy, but what is the latter
when not refracted through your, wm4's or someone else's interpretation?

The design decision you refer to I would love to read. What I have
seen on the ffmpeg.org did not say "decoders shall not produce multiple
pixel formats".

> If you want to change that, fine, propose it and let's discuss that on its

I am really interested to know what is "that" you refer to.

> own merit. But right now this is how the project works. You can't change
> that without changing official project policy.
> 
> I'm fundamentally against it, and as such I'm against the patch. I
> understand why you want it, but I don't think that's sufficient reason.

Thanks for stating your position as you do, constructively and coherently.

Regards,
Rune
wm4 March 6, 2017, 3:31 p.m. UTC | #25
On Mon, 6 Mar 2017 16:23:15 +0100
u-9iep@aetey.se wrote:

> Did the project (who?) ever make a general decision about Cinepak
> or delegate to wm4 to represent the project's stance?
> 
> I question her/his tone, not the policy, but what is the latter
> when not refracted through your, wm4's or someone else's interpretation?
> 
> The design decision you refer to I would love to read. What I have
> seen on the ffmpeg.org did not say "decoders shall not produce multiple
> pixel formats".

Before you get too obsessed with me, I have never represented anything,
or intended to represent. I merely contributed my own opinion about
these patches. I have not more powers than Paul Mahol or Ronald Bultje
(if anything, I have less powers than them).
Ronald S. Bultje March 6, 2017, 3:39 p.m. UTC | #26
Hi,

On Mon, Mar 6, 2017 at 9:52 AM, <u-9iep@aetey.se> wrote:

> I am sorry for misrepresentation of the statements of wm4!
>

No problem.

Now Ronald where were you with your policing
> when the person in question (wm4) repeatedly violated CoC?
>
> I point out that your behaviour of selective punishment
> is most probably not conscious but it is a fact.


I will deal with people privately on IRC where I can. But thank you for
pointing it out.

Ronald
u-9iep@aetey.se March 6, 2017, 4 p.m. UTC | #27
On Mon, Mar 06, 2017 at 04:31:32PM +0100, wm4 wrote:
> On Mon, 6 Mar 2017 16:23:15 +0100
> u-9iep@aetey.se wrote:
> 
> > Did the project (who?) ever make a general decision about Cinepak
> > or delegate to wm4 to represent the project's stance?
> > 
> > I question her/his tone, not the policy, but what is the latter
> > when not refracted through your, wm4's or someone else's interpretation?
> > 
> > The design decision you refer to I would love to read. What I have
> > seen on the ffmpeg.org did not say "decoders shall not produce multiple
> > pixel formats".
> 
> Before you get too obsessed with me, I have never represented anything,
> or intended to represent. I merely contributed my own opinion about
> these patches. I have not more powers than Paul Mahol or Ronald Bultje
> (if anything, I have less powers than them).

It is nice of you to say this. Thanks for the clarification.

Regards,
Rune
diff mbox

Patch

diff --git a/libavcodec/cinepak.c b/libavcodec/cinepak.c
index d657e9c0c1..7b08e20e06 100644
--- a/libavcodec/cinepak.c
+++ b/libavcodec/cinepak.c
@@ -31,6 +31,8 @@ 
  *
  * Cinepak colorspace support (c) 2013 Rl, Aetey Global Technologies AB
  * @author Cinepak colorspace, Rl, Aetey Global Technologies AB
+ * Extra output formats / optimizations (c) 2017 Rl, Aetey Global Technologies AB
+ * @author Extra output formats / optimizations, Rl, Aetey Global Technologies AB
  */
 
 #include <stdio.h>
@@ -39,23 +41,48 @@ 
 
 #include "libavutil/common.h"
 #include "libavutil/intreadwrite.h"
+#include "libavutil/opt.h"
+/* #include "libavutil/avassert.h" */
 #include "avcodec.h"
 #include "internal.h"
 
+/* rounding to nearest; truncation would be slightly faster
+ * but it noticeably affects the picture quality;
+ * unless we become extremely desperate to use every single cycle
+ * we do not bother implementing a choice -- rl */
+#define PACK_RGB_RGB565(r,g,b) (((av_clip_uint8((r)+4)>>3)<<11)|((av_clip_uint8((g)+2)>>2)<<5)|(av_clip_uint8((b)+4)>>3))
 
-typedef uint8_t cvid_codebook[12];
+/*
+ * more "desperate/ultimate" optimization possibilites:
+ * - possibly (hardly?) spare a cycle or two by not ensuring to stay
+ *   inside the frame at vector decoding (the frame is allocated with
+ *   a margin for us as an extra precaution, we can as well use this)
+ * - skip filling in opacity when it is not needed by the data consumer,
+ *   in many cases rgb32 is almost as fast as rgb565, with full quality,
+ *   improving its speed can make sense
+ */
+
+typedef union cvid_codebook {
+    uint32_t  rgb32[256][ 4];
+    uint8_t   rgb24[256][12];
+    uint16_t rgb565[256][ 4];
+    uint8_t  yuv420[256][ 6];
+    uint8_t    pal8[256][ 4];
+} cvid_codebook;
 
-#define MAX_STRIPS      32
+#define MAX_STRIPS      32    /* an arbitrary limit -- rl */
 
 typedef struct cvid_strip {
     uint16_t          id;
     uint16_t          x1, y1;
     uint16_t          x2, y2;
-    cvid_codebook     v4_codebook[256];
-    cvid_codebook     v1_codebook[256];
+    cvid_codebook     v4_codebook;
+    cvid_codebook     v1_codebook;
 } cvid_strip;
 
-typedef struct CinepakContext {
+typedef struct CinepakContext CinepakContext;
+struct CinepakContext {
+    const AVClass *class;
 
     AVCodecContext *avctx;
     AVFrame *frame;
@@ -71,57 +98,192 @@  typedef struct CinepakContext {
     int sega_film_skip_bytes;
 
     uint32_t pal[256];
-} CinepakContext;
 
-static void cinepak_decode_codebook (cvid_codebook *codebook,
-                                     int chunk_id, int size, const uint8_t *data)
-{
-    const uint8_t *eod = (data + size);
-    uint32_t flag, mask;
-    int      i, n;
-    uint8_t *p;
-
-    /* check if this chunk contains 4- or 6-element vectors */
-    n    = (chunk_id & 0x04) ? 4 : 6;
-    flag = 0;
-    mask = 0;
-
-    p = codebook[0];
-    for (i=0; i < 256; i++) {
-        if ((chunk_id & 0x01) && !(mask >>= 1)) {
-            if ((data + 4) > eod)
-                break;
-
-            flag  = AV_RB32 (data);
-            data += 4;
-            mask  = 0x80000000;
-        }
+    void (*decode_codebook)(CinepakContext *s,
+                            cvid_codebook *codebook, int chunk_id,
+                            int size, const uint8_t *data);
+    int  (*decode_vectors)(CinepakContext *s, cvid_strip *strip,
+                           int chunk_id, int size, const uint8_t *data);
+/* options */
+    enum AVPixelFormat out_pixfmt;
+};
 
-        if (!(chunk_id & 0x01) || (flag & mask)) {
-            int k, kk;
+#define OFFSET(x) offsetof(CinepakContext, x)
+#define VD AV_OPT_FLAG_VIDEO_PARAM | AV_OPT_FLAG_DECODING_PARAM
+static const AVOption options[] = {
+{"output_pixel_format", "set output pixel format: rgb24/rgb32/rgb565/yuv420p/pal8; yuv420p is approximate", OFFSET(out_pixfmt), AV_OPT_TYPE_PIXEL_FMT, {.i64=AV_PIX_FMT_NONE}, -1, INT_MAX, VD },
+    { NULL },
+};
 
-            if ((data + n) > eod)
-                break;
+static const AVClass cinepak_class = {
+    .class_name = "cinepak decoder",
+    .item_name  = av_default_item_name,
+    .option     = options,
+    .version    = LIBAVUTIL_VERSION_INT,
+};
 
-            for (k = 0; k < 4; ++k) {
-                int r = *data++;
-                for (kk = 0; kk < 3; ++kk)
-                    *p++ = r;
+/* this is an attempt to satisfy the requirement to reduce code duplication
+ * feel free to do this in a more elegant fashion, but keep the speed
+ * -- rl */
+#define CODEBOOK_HEAD \
+    const uint8_t *eod;\
+    uint32_t flag, mask;\
+    int      i, n;\
+    int palette_video;\
+    int selective_update;\
+
+#define CODEBOOK_STREAM_PARSING \
+    for (i=0; i < 256; i++) {\
+        if (selective_update && !(mask >>= 1)) {\
+            if ((data + 4) > eod)\
+                break;\
+\
+            flag  = AV_RB32 (data);\
+            data += 4;\
+            mask  = 0x80000000;\
+        }\
+\
+        if (!selective_update || (flag & mask)) {\
+            int k;\
+\
+            if ((data + n) > eod)\
+                break;\
+
+#define CODEBOOK_INTRO \
+    selective_update = (chunk_id & 0x01);\
+    eod = (data + size);\
+    flag = 0;\
+    mask = 0;\
+
+#define CODEBOOK_FULL_COLOR \
+    /* check if this chunk contains 4- or 6-element vectors */\
+    n    = (chunk_id & 0x04) ? 4 : 6;\
+    palette_video = s->palette_video;\
+    CODEBOOK_INTRO\
+    CODEBOOK_STREAM_PARSING\
+
+#define VECTOR_INTRO \
+    CODEBOOK_INTRO\
+    v1_only          = (chunk_id & 0x02);\
+\
+    for (y=strip->y1; y < strip->y2; y+=4) {\
+
+#define VECTOR_STREAM_PARSING \
+        for (x=strip->x1; x < strip->x2; x+=4) {\
+            if (selective_update && !(mask >>= 1)) {\
+                if ((data + 4) > eod)\
+                    return AVERROR_INVALIDDATA;\
+\
+                flag  = AV_RB32 (data);\
+                data += 4;\
+                mask  = 0x80000000;\
+            }\
+\
+            if (!selective_update || (flag & mask)) {\
+                if (!v1_only && !(mask >>= 1)) {\
+                    if ((data + 4) > eod)\
+                        return AVERROR_INVALIDDATA;\
+\
+                    flag  = AV_RB32 (data);\
+                    data += 4;\
+                    mask  = 0x80000000;\
+                }\
+\
+                if (v1_only || (~flag & mask)) {\
+                    POINTER_TYPE *p;\
+                    if (data >= eod)\
+                        return AVERROR_INVALIDDATA;\
+
+#define VECTOR_DO \
+/* take care of y dimension not being multiple of 4, such streams exist */\
+        if(s->avctx->height - y > 1) {\
+            ip1 = ip0 + s->frame->linesize[0];\
+            if(s->avctx->height - y > 2) {\
+                ip2 = ip1 + s->frame->linesize[0];\
+                if(s->avctx->height - y > 3) {\
+                    ip3 = ip2 + s->frame->linesize[0];\
+                }\
+            }\
+        }\
+/* to get the correct picture for not-multiple-of-4 cases let us fill each\
+ * block from the bottom up, thus possibly overwriting the bottommost line\
+ * more than once but ending with the correct data in place\
+ * (instead of in-loop checking) */\
+        VECTOR_STREAM_PARSING\
+
+static void cinepak_decode_codebook_rgb32 (CinepakContext *s,
+    cvid_codebook *codebook, int chunk_id, int size, const uint8_t *data)
+{
+    CODEBOOK_HEAD
+    uint32_t *p = codebook->rgb32[0];
+
+    CODEBOOK_FULL_COLOR
+
+            if (n == 4)
+                if (palette_video)
+                    for (k = 0; k < 4; ++k)
+                        *p++ = s->pal[*data++]; /* this is easy */
+                else
+                    for (k = 0; k < 4; ++k) {
+                        int r = *data++;
+/* in some situations we might not have to set opacity */
+                        *p++ = /**/ (255<<24)| /**/ (r<<16)|(r<<8)|r;
+                    }
+            else { /* n == 6 */
+                int y, u, v;
+                u = (int8_t)data[4];
+                v = (int8_t)data[5];
+                for(k=0; k<4; ++k) {
+                    y = *data++;
+/* in some situations we might not have to set opacity */
+                    *p++ = /**/ (255<<24)| /**/
+/* here the cinepak color space excels */
+                           (av_clip_uint8(y + v*2)<<16)|
+                           (av_clip_uint8(y - (u/2) - v)<<8)|
+                            av_clip_uint8(y + u*2);
+                }
+                data += 2;
             }
-            if (n == 6) {
-                int r, g, b, u, v;
-                u = *(int8_t *)data++;
-                v = *(int8_t *)data++;
-                p -= 12;
+        } else {
+            p += 4;
+        }
+    }
+}
+
+static void cinepak_decode_codebook_rgb24 (CinepakContext *s,
+    cvid_codebook *codebook, int chunk_id, int size, const uint8_t *data)
+{
+    CODEBOOK_HEAD
+    uint8_t *p = codebook->rgb24[0];
+
+    CODEBOOK_FULL_COLOR
+
+            if (n == 4)
+                if (palette_video)
+                    for (k = 0; k < 4; ++k) {
+                        uint32_t r = s->pal[*data++];
+                        *p++ = (r>>16)&0xff;
+                        *p++ = (r>>8) &0xff;
+                        *p++ =  r     &0xff;
+                    }
+                else
+                    for (k = 0; k < 4; ++k) {
+                        int kk, r = *data++;
+                        for (kk = 0; kk < 3; ++kk)
+                            *p++ = r;
+                    }
+            else { /* n == 6 */
+                int y, u, v;
+                u = (int8_t)data[4];
+                v = (int8_t)data[5];
                 for(k=0; k<4; ++k) {
-                    r = *p++ + v*2;
-                    g = *p++ - (u/2) - v;
-                    b = *p   + u*2;
-                    p -= 2;
-                    *p++ = av_clip_uint8(r);
-                    *p++ = av_clip_uint8(g);
-                    *p++ = av_clip_uint8(b);
+                    y = *data++;
+/* here the cinepak color space excels */
+                    *p++ = av_clip_uint8(y + v*2);
+                    *p++ = av_clip_uint8(y - (u/2) - v);
+                    *p++ = av_clip_uint8(y + u*2);
                 }
+                data += 2;
             }
         } else {
             p += 12;
@@ -129,134 +291,448 @@  static void cinepak_decode_codebook (cvid_codebook *codebook,
     }
 }
 
-static int cinepak_decode_vectors (CinepakContext *s, cvid_strip *strip,
+static void cinepak_decode_codebook_rgb565 (CinepakContext *s,
+    cvid_codebook *codebook, int chunk_id, int size, const uint8_t *data)
+{
+    CODEBOOK_HEAD
+    uint16_t *p = codebook->rgb565[0];
+
+    CODEBOOK_FULL_COLOR
+
+            if (n == 4)
+                if (palette_video)
+                    for (k = 0; k < 4; ++k) {
+                        uint32_t r = s->pal[*data++];
+                        *p++ = PACK_RGB_RGB565((r>>16)&0xff,
+                                               (r>>8)&0xff,
+                                                r&0xff);
+                    }
+                else
+                    for (k = 0; k < 4; ++k) {
+                        int r = *data++;
+                        *p++ = PACK_RGB_RGB565(r,r,r);
+                    }
+            else { /* n == 6 */
+                int y, u, v;
+                u = (int8_t)data[4];
+                v = (int8_t)data[5];
+                for(k=0; k<4; ++k) {
+                    y = *data++;
+/* here the cinepak color space excels */
+                    *p++ = PACK_RGB_RGB565(y + v*2,
+                                           y - (u/2) - v,
+                                           y + u*2);
+                }
+                data += 2;
+            }
+        } else {
+            p += 4;
+        }
+    }
+}
+
+/* a simplistic version to begin with, it is also fast -- rl */
+static void cinepak_decode_codebook_yuv420 (CinepakContext *s,
+    cvid_codebook *codebook, int chunk_id, int size, const uint8_t *data)
+{
+    CODEBOOK_HEAD
+    uint8_t *p = codebook->yuv420[0];
+
+    CODEBOOK_FULL_COLOR
+
+            if (n == 4)
+                if (palette_video) {
+/* here we have kind of "more" data than the output format can express */
+                    int r, g, b, u = 0, v = 0;
+                    for (k = 0; k < 4; ++k) {
+                        uint32_t rr = s->pal[*data++];
+                        r = (rr>>16)&0xff;
+                        g = (rr>>8) &0xff;
+                        b =  rr     &0xff;
+/* calculate the components (https://en.wikipedia.org/wiki/YUV) */
+                        *p++ = ((r*66+g*129+b*25+128)>>8)+16;
+                        u += (-r*38-g*74+b*112+128)>>8;
+                        v += (r*112-g*94-b*18+128)>>8;
+                    }
+                    *p++ = (u+2)/4+128;
+                    *p++ = (v+2)/4+128;
+                } else { /* grayscale, easy */
+                    for (k = 0; k < 4; ++k) {
+                        *p++ = *data++;
+                    }
+                    *p++ = 128;
+                    *p++ = 128;
+                }
+            else { /* n == 6 */
+/* here we'd have to handle double format conversion
+ * Cinepak=>rgb24 and then rgb24=>yuv420p, which can not be shortcut;
+ * for the moment just copying as-is, for simplicity and speed,
+ * color will be slightly off but not much */
+                *p++ = *data++;
+                *p++ = *data++;
+                *p++ = *data++;
+                *p++ = *data++;
+                *p++ = *data++ + 128;
+                *p++ = *data++ + 128;
+            }
+        } else {
+            p += 6;
+        }
+    }
+}
+
+/* here we do not expect anything besides palettized video,
+ * nor check the data for validity, which should be ok
+ * as long as we do not write beyond the bounds */
+static void cinepak_decode_codebook_pal8 (CinepakContext *s,
+    cvid_codebook *codebook, int chunk_id, int size, const uint8_t *data)
+{
+    const uint8_t *eod;
+    uint32_t flag, mask;
+    int      i;
+    int selective_update;
+    uint8_t *p = codebook->pal8[0];
+
+#define PAL8_VECTOR_LENGTH 4
+#define n PAL8_VECTOR_LENGTH
+/*    av_assert0(chunk_id & 0x04); */
+
+    CODEBOOK_INTRO
+    CODEBOOK_STREAM_PARSING
+
+#undef n
+
+            for (k = 0; k < 4; ++k)
+                *p++ = *data++;
+        } else {
+            p += 4;
+        }
+    }
+}
+
+static int cinepak_decode_vectors_rgb32 (CinepakContext *s, cvid_strip *strip,
+                                   int chunk_id, int size, const uint8_t *data)
+{
+    const uint8_t   *eod;
+    uint32_t         flag, mask;
+    uint32_t         *cb0, *cb1, *cb2, *cb3;
+    int              x, y;
+    char            *ip0, *ip1, *ip2, *ip3;
+    int selective_update;
+    int v1_only;
+
+    VECTOR_INTRO
+
+        ip0 = ip1 = ip2 = ip3 = s->frame->data[0] +
+                                strip->x1*4 + y*s->frame->linesize[0];
+#define POINTER_TYPE uint32_t
+    VECTOR_DO
+#undef POINTER_TYPE
+
+                    p = strip->v1_codebook.rgb32[*data++] + 2; /* ... + 8 */
+                    memcpy(ip3 + 0, p, 4); memcpy(ip3 + 4, p, 4);
+                    memcpy(ip2 + 0, p, 4); memcpy(ip2 + 4, p, 4);
+                    p += 1; /* ... + 12 */
+                    memcpy(ip3 + 8, p, 4); memcpy(ip3 + 12, p, 4);
+                    memcpy(ip2 + 8, p, 4); memcpy(ip2 + 12, p, 4);
+                    p -= 3; /* ... + 0 */
+                    memcpy(ip1 + 0, p, 4); memcpy(ip1 + 4, p, 4);
+                    memcpy(ip0 + 0, p, 4); memcpy(ip0 + 4, p, 4);
+                    p += 1; /* ... + 4 */
+                    memcpy(ip1 + 8, p, 4); memcpy(ip1 + 12, p, 4);
+                    memcpy(ip0 + 8, p, 4); memcpy(ip0 + 12, p, 4);
+
+                } else if (flag & mask) {
+                    if ((data + 4) > eod)
+                        return AVERROR_INVALIDDATA;
+
+                    cb0 = strip->v4_codebook.rgb32[*data++];
+                    cb1 = strip->v4_codebook.rgb32[*data++];
+                    cb2 = strip->v4_codebook.rgb32[*data++];
+                    cb3 = strip->v4_codebook.rgb32[*data++];
+                    memcpy(ip3 + 0, cb2 + 2, 8);
+                    memcpy(ip3 + 8, cb3 + 2, 8);
+                    memcpy(ip2 + 0, cb2 + 0, 8);
+                    memcpy(ip2 + 8, cb3 + 0, 8);
+                    memcpy(ip1 + 0, cb0 + 2, 8);
+                    memcpy(ip1 + 8, cb1 + 2, 8);
+                    memcpy(ip0 + 0, cb0 + 0, 8);
+                    memcpy(ip0 + 8, cb1 + 0, 8);
+
+                }
+            }
+
+            ip0 += 16;  ip1 += 16;
+            ip2 += 16;  ip3 += 16;
+        }
+    }
+
+    return 0;
+}
+
+static int cinepak_decode_vectors_rgb24 (CinepakContext *s, cvid_strip *strip,
                                    int chunk_id, int size, const uint8_t *data)
 {
-    const uint8_t   *eod = (data + size);
+    const uint8_t   *eod;
     uint32_t         flag, mask;
     uint8_t         *cb0, *cb1, *cb2, *cb3;
-    int             x, y;
+    int              x, y;
     char            *ip0, *ip1, *ip2, *ip3;
+    int selective_update;
+    int v1_only;
 
-    flag = 0;
-    mask = 0;
+    VECTOR_INTRO
 
-    for (y=strip->y1; y < strip->y2; y+=4) {
+        ip0 = ip1 = ip2 = ip3 = s->frame->data[0] +
+                                strip->x1*3 + y*s->frame->linesize[0];
+
+#define POINTER_TYPE uint8_t
+    VECTOR_DO
+#undef POINTER_TYPE
+
+                    p = strip->v1_codebook.rgb24[*data++] + 6;
+                    memcpy(ip3 + 0, p, 3); memcpy(ip3 + 3, p, 3);
+                    memcpy(ip2 + 0, p, 3); memcpy(ip2 + 3, p, 3);
+                    p += 3; /* ... + 9 */
+                    memcpy(ip3 + 6, p, 3); memcpy(ip3 + 9, p, 3);
+                    memcpy(ip2 + 6, p, 3); memcpy(ip2 + 9, p, 3);
+                    p -= 9; /* ... + 0 */
+                    memcpy(ip1 + 0, p, 3); memcpy(ip1 + 3, p, 3);
+                    memcpy(ip0 + 0, p, 3); memcpy(ip0 + 3, p, 3);
+                    p += 3; /* ... + 3 */
+                    memcpy(ip1 + 6, p, 3); memcpy(ip1 + 9, p, 3);
+                    memcpy(ip0 + 6, p, 3); memcpy(ip0 + 9, p, 3);
+
+                } else if (flag & mask) {
+                    if ((data + 4) > eod)
+                        return AVERROR_INVALIDDATA;
+
+                    cb0 = strip->v4_codebook.rgb24[*data++];
+                    cb1 = strip->v4_codebook.rgb24[*data++];
+                    cb2 = strip->v4_codebook.rgb24[*data++];
+                    cb3 = strip->v4_codebook.rgb24[*data++];
+                    memcpy(ip3 + 0, cb2 + 6, 6);
+                    memcpy(ip3 + 6, cb3 + 6, 6);
+                    memcpy(ip2 + 0, cb2 + 0, 6);
+                    memcpy(ip2 + 6, cb3 + 0, 6);
+                    memcpy(ip1 + 0, cb0 + 6, 6);
+                    memcpy(ip1 + 6, cb1 + 6, 6);
+                    memcpy(ip0 + 0, cb0 + 0, 6);
+                    memcpy(ip0 + 6, cb1 + 0, 6);
+
+                }
+            }
+
+            ip0 += 12;  ip1 += 12;
+            ip2 += 12;  ip3 += 12;
+        }
+    }
+
+    return 0;
+}
+
+static int cinepak_decode_vectors_rgb565 (CinepakContext *s, cvid_strip *strip,
+                                   int chunk_id, int size, const uint8_t *data)
+{
+    const uint8_t   *eod;
+    uint32_t         flag, mask;
+    uint16_t        *cb0, *cb1, *cb2, *cb3;
+    int              x, y;
+    char            *ip0, *ip1, *ip2, *ip3;
+    int selective_update;
+    int v1_only;
+
+    VECTOR_INTRO
+
+        ip0 = ip1 = ip2 = ip3 = s->frame->data[0] +
+                                strip->x1*2 + y*s->frame->linesize[0];
+
+#define POINTER_TYPE uint16_t
+    VECTOR_DO
+#undef POINTER_TYPE
+
+                    p = strip->v1_codebook.rgb565[*data++];
+                    * (uint16_t *)ip3    = *((uint16_t *)ip3+1) =
+                    * (uint16_t *)ip2    = *((uint16_t *)ip2+1) = p[2];
+                    *((uint16_t *)ip3+2) = *((uint16_t *)ip3+3) =
+                    *((uint16_t *)ip2+2) = *((uint16_t *)ip2+3) = p[3];
+                    * (uint16_t *)ip1    = *((uint16_t *)ip1+1) =
+                    * (uint16_t *)ip0    = *((uint16_t *)ip0+1) = p[0];
+                    *((uint16_t *)ip1+2) = *((uint16_t *)ip1+3) =
+                    *((uint16_t *)ip0+2) = *((uint16_t *)ip0+3) = p[1];
+
+                } else if (flag & mask) {
+                    if ((data + 4) > eod)
+                        return AVERROR_INVALIDDATA;
+
+                    cb0 = strip->v4_codebook.rgb565[*data++];
+                    cb1 = strip->v4_codebook.rgb565[*data++];
+                    cb2 = strip->v4_codebook.rgb565[*data++];
+                    cb3 = strip->v4_codebook.rgb565[*data++];
+                    memcpy(ip3 + 0, cb2 + 2, 4);
+                    memcpy(ip3 + 4, cb3 + 2, 4);
+                    memcpy(ip2 + 0, cb2 + 0, 4);
+                    memcpy(ip2 + 4, cb3 + 0, 4);
+                    memcpy(ip1 + 0, cb0 + 2, 4);
+                    memcpy(ip1 + 4, cb1 + 2, 4);
+                    memcpy(ip0 + 0, cb0 + 0, 4);
+                    memcpy(ip0 + 4, cb1 + 0, 4);
+
+                }
+            }
+
+            ip0 += 8;  ip1 += 8;
+            ip2 += 8;  ip3 += 8;
+        }
+    }
+
+    return 0;
+}
+
+static int cinepak_decode_vectors_yuv420 (CinepakContext *s, cvid_strip *strip,
+                                   int chunk_id, int size, const uint8_t *data)
+{
+    const uint8_t   *eod;
+    uint32_t         flag, mask;
+    uint8_t         *cb0, *cb1, *cb2, *cb3;
+    int              x, y;
+    char            *ip0, *ip1, *ip2, *ip3,
+                    *up01, *up23, *vp01, *vp23;
+    int selective_update;
+    int v1_only;
+
+    VECTOR_INTRO
 
-/* take care of y dimension not being multiple of 4, such streams exist */
         ip0 = ip1 = ip2 = ip3 = s->frame->data[0] +
-          (s->palette_video?strip->x1:strip->x1*3) + (y * s->frame->linesize[0]);
+                                strip->x1*3 + y*s->frame->linesize[0];
+        up01 = up23 = s->frame->data[1] + strip->x1 + y/2*s->frame->linesize[1];
+        vp01 = vp23 = s->frame->data[2] + strip->x1 + y/2*s->frame->linesize[2];
         if(s->avctx->height - y > 1) {
             ip1 = ip0 + s->frame->linesize[0];
             if(s->avctx->height - y > 2) {
                 ip2 = ip1 + s->frame->linesize[0];
+                up23 = up01 + s->frame->linesize[1];
+                vp23 = vp01 + s->frame->linesize[2];
                 if(s->avctx->height - y > 3) {
                     ip3 = ip2 + s->frame->linesize[0];
                 }
             }
         }
+
 /* to get the correct picture for not-multiple-of-4 cases let us fill each
  * block from the bottom up, thus possibly overwriting the bottommost line
  * more than once but ending with the correct data in place
  * (instead of in-loop checking) */
 
-        for (x=strip->x1; x < strip->x2; x+=4) {
-            if ((chunk_id & 0x01) && !(mask >>= 1)) {
-                if ((data + 4) > eod)
-                    return AVERROR_INVALIDDATA;
+#define POINTER_TYPE uint8_t
+        VECTOR_STREAM_PARSING
+#undef POINTER_TYPE
 
-                flag  = AV_RB32 (data);
-                data += 4;
-                mask  = 0x80000000;
-            }
+                    p = strip->v1_codebook.yuv420[*data++];
+                    ip3[0] = ip3[1] = ip2[0] = ip2[1] = p[2];
+                    ip3[2] = ip3[3] = ip2[2] = ip2[3] = p[3];
+                    ip1[0] = ip1[1] = ip0[0] = ip0[1] = p[0];
+                    ip1[2] = ip1[3] = ip0[2] = ip0[3] = p[1];
+                    p += 4;
+                    up01[0] = up01[1] = up23[0] = up23[1] = *p++;
+                    vp01[0] = vp01[1] = vp23[0] = vp23[1] = *p++;
 
-            if (!(chunk_id & 0x01) || (flag & mask)) {
-                if (!(chunk_id & 0x02) && !(mask >>= 1)) {
+                } else if (flag & mask) {
                     if ((data + 4) > eod)
                         return AVERROR_INVALIDDATA;
 
-                    flag  = AV_RB32 (data);
-                    data += 4;
-                    mask  = 0x80000000;
+                    cb0 = strip->v4_codebook.yuv420[*data++];
+                    cb1 = strip->v4_codebook.yuv420[*data++];
+                    cb2 = strip->v4_codebook.yuv420[*data++];
+                    cb3 = strip->v4_codebook.yuv420[*data++];
+                    memcpy(ip3 + 0, cb2 + 2, 2);
+                    memcpy(ip3 + 2, cb3 + 2, 2);
+                    memcpy(ip2 + 0, cb2 + 0, 2);
+                    memcpy(ip2 + 2, cb3 + 0, 2);
+                    memcpy(ip1 + 0, cb0 + 2, 2);
+                    memcpy(ip1 + 2, cb1 + 2, 2);
+                    memcpy(ip0 + 0, cb0 + 0, 2);
+                    memcpy(ip0 + 2, cb1 + 0, 2);
+                    cb0 += 4; cb1 += 4; cb2 += 4; cb3 += 4;
+                    up23[0] = *cb2++; vp23[0] = *cb2;
+                    up23[1] = *cb3++; vp23[1] = *cb3;
+                    up01[0] = *cb0++; vp01[0] = *cb0;
+                    up01[1] = *cb1++; vp01[1] = *cb1;
+
                 }
+            }
 
-                if ((chunk_id & 0x02) || (~flag & mask)) {
-                    uint8_t *p;
-                    if (data >= eod)
-                        return AVERROR_INVALIDDATA;
+            ip0 += 4;  ip1 += 4;
+            ip2 += 4;  ip3 += 4;
+            up01 += 2; up23 += 2;
+            vp01 += 2; vp23 += 2;
+        }
+    }
 
-                    p = strip->v1_codebook[*data++];
-                    if (s->palette_video) {
-                        ip3[0] = ip3[1] = ip2[0] = ip2[1] = p[6];
-                        ip3[2] = ip3[3] = ip2[2] = ip2[3] = p[9];
-                        ip1[0] = ip1[1] = ip0[0] = ip0[1] = p[0];
-                        ip1[2] = ip1[3] = ip0[2] = ip0[3] = p[3];
-                    } else {
-                        p += 6;
-                        memcpy(ip3 + 0, p, 3); memcpy(ip3 + 3, p, 3);
-                        memcpy(ip2 + 0, p, 3); memcpy(ip2 + 3, p, 3);
-                        p += 3; /* ... + 9 */
-                        memcpy(ip3 + 6, p, 3); memcpy(ip3 + 9, p, 3);
-                        memcpy(ip2 + 6, p, 3); memcpy(ip2 + 9, p, 3);
-                        p -= 9; /* ... + 0 */
-                        memcpy(ip1 + 0, p, 3); memcpy(ip1 + 3, p, 3);
-                        memcpy(ip0 + 0, p, 3); memcpy(ip0 + 3, p, 3);
-                        p += 3; /* ... + 3 */
-                        memcpy(ip1 + 6, p, 3); memcpy(ip1 + 9, p, 3);
-                        memcpy(ip0 + 6, p, 3); memcpy(ip0 + 9, p, 3);
-                    }
+    return 0;
+}
+
+static int cinepak_decode_vectors_pal8 (CinepakContext *s, cvid_strip *strip,
+                                 int chunk_id, int size, const uint8_t *data)
+{
+    const uint8_t   *eod;
+    uint32_t         flag, mask;
+    uint8_t         *cb0, *cb1, *cb2, *cb3;
+    int              x, y;
+    char            *ip0, *ip1, *ip2, *ip3;
+    int selective_update;
+    int v1_only;
+
+    VECTOR_INTRO
+
+        ip0 = ip1 = ip2 = ip3 = s->frame->data[0] +
+                                strip->x1 + y*s->frame->linesize[0];
+
+#define POINTER_TYPE uint8_t
+    VECTOR_DO
+#undef POINTER_TYPE
+
+                    p = strip->v1_codebook.pal8[*data++];
+                    ip3[0] = ip3[1] = ip2[0] = ip2[1] = p[2];
+                    ip3[2] = ip3[3] = ip2[2] = ip2[3] = p[3];
+                    ip1[0] = ip1[1] = ip0[0] = ip0[1] = p[0];
+                    ip1[2] = ip1[3] = ip0[2] = ip0[3] = p[1];
 
                 } else if (flag & mask) {
+                    uint8_t *p;
                     if ((data + 4) > eod)
                         return AVERROR_INVALIDDATA;
 
-                    cb0 = strip->v4_codebook[*data++];
-                    cb1 = strip->v4_codebook[*data++];
-                    cb2 = strip->v4_codebook[*data++];
-                    cb3 = strip->v4_codebook[*data++];
-                    if (s->palette_video) {
-                        uint8_t *p;
-                        p = ip3;
-                        *p++ = cb2[6];
-                        *p++ = cb2[9];
-                        *p++ = cb3[6];
-                        *p   = cb3[9];
-                        p = ip2;
-                        *p++ = cb2[0];
-                        *p++ = cb2[3];
-                        *p++ = cb3[0];
-                        *p   = cb3[3];
-                        p = ip1;
-                        *p++ = cb0[6];
-                        *p++ = cb0[9];
-                        *p++ = cb1[6];
-                        *p   = cb1[9];
-                        p = ip0;
-                        *p++ = cb0[0];
-                        *p++ = cb0[3];
-                        *p++ = cb1[0];
-                        *p   = cb1[3];
-                    } else {
-                        memcpy(ip3 + 0, cb2 + 6, 6);
-                        memcpy(ip3 + 6, cb3 + 6, 6);
-                        memcpy(ip2 + 0, cb2 + 0, 6);
-                        memcpy(ip2 + 6, cb3 + 0, 6);
-                        memcpy(ip1 + 0, cb0 + 6, 6);
-                        memcpy(ip1 + 6, cb1 + 6, 6);
-                        memcpy(ip0 + 0, cb0 + 0, 6);
-                        memcpy(ip0 + 6, cb1 + 0, 6);
-                    }
+                    cb0 = strip->v4_codebook.pal8[*data++];
+                    cb1 = strip->v4_codebook.pal8[*data++];
+                    cb2 = strip->v4_codebook.pal8[*data++];
+                    cb3 = strip->v4_codebook.pal8[*data++];
+                    p = ip3;
+                    *p++ = cb2[2];
+                    *p++ = cb2[3];
+                    *p++ = cb3[2];
+                    *p   = cb3[3];
+                    p = ip2;
+                    *p++ = cb2[0];
+                    *p++ = cb2[1];
+                    *p++ = cb3[0];
+                    *p   = cb3[1];
+                    p = ip1;
+                    *p++ = cb0[2];
+                    *p++ = cb0[3];
+                    *p++ = cb1[2];
+                    *p   = cb1[3];
+                    p = ip0;
+                    *p++ = cb0[0];
+                    *p++ = cb0[1];
+                    *p++ = cb1[0];
+                    *p   = cb1[1];
 
                 }
             }
 
-            if (s->palette_video) {
-                ip0 += 4;  ip1 += 4;
-                ip2 += 4;  ip3 += 4;
-            } else {
-                ip0 += 12;  ip1 += 12;
-                ip2 += 12;  ip3 += 12;
-            }
+            ip0 += 4;  ip1 += 4;
+            ip2 += 4;  ip3 += 4;
         }
     }
 
@@ -290,22 +766,22 @@  static int cinepak_decode_strip (CinepakContext *s,
         case 0x21:
         case 0x24:
         case 0x25:
-            cinepak_decode_codebook (strip->v4_codebook, chunk_id,
-                chunk_size, data);
+            s->decode_codebook(s, &strip->v4_codebook,
+                chunk_id, chunk_size, data);
             break;
 
         case 0x22:
         case 0x23:
         case 0x26:
         case 0x27:
-            cinepak_decode_codebook (strip->v1_codebook, chunk_id,
-                chunk_size, data);
+            s->decode_codebook (s, &strip->v1_codebook,
+                chunk_id, chunk_size, data);
             break;
 
         case 0x30:
         case 0x31:
         case 0x32:
-            return cinepak_decode_vectors (s, strip, chunk_id,
+            return s->decode_vectors (s, strip, chunk_id,
                 chunk_size, data);
         }
 
@@ -385,9 +861,9 @@  static int cinepak_decode (CinepakContext *s)
         strip_size = ((s->data + strip_size) > eod) ? (eod - s->data) : strip_size;
 
         if ((i > 0) && !(frame_flags & 0x01)) {
-            memcpy (s->strips[i].v4_codebook, s->strips[i-1].v4_codebook,
+            memcpy (&s->strips[i].v4_codebook, &s->strips[i-1].v4_codebook,
                 sizeof(s->strips[i].v4_codebook));
-            memcpy (s->strips[i].v1_codebook, s->strips[i-1].v1_codebook,
+            memcpy (&s->strips[i].v1_codebook, &s->strips[i-1].v1_codebook,
                 sizeof(s->strips[i].v1_codebook));
         }
 
@@ -402,23 +878,85 @@  static int cinepak_decode (CinepakContext *s)
     return 0;
 }
 
+/* given a palettized input */
+static const enum AVPixelFormat pixfmt_list[] = {
+    AV_PIX_FMT_RGB24,
+    AV_PIX_FMT_RGB32,
+    AV_PIX_FMT_RGB565,
+    AV_PIX_FMT_YUV420P,
+    AV_PIX_FMT_PAL8, /* only when input is palettized */
+    AV_PIX_FMT_NONE
+};
+
+/* given a non-palettized input */
+static const enum AVPixelFormat pixfmt_list_2[] = {
+    AV_PIX_FMT_RGB24,
+    AV_PIX_FMT_RGB32,
+    AV_PIX_FMT_RGB565,
+    AV_PIX_FMT_YUV420P,
+    AV_PIX_FMT_NONE
+};
+
 static av_cold int cinepak_decode_init(AVCodecContext *avctx)
 {
     CinepakContext *s = avctx->priv_data;
 
+/* we take advantage of VQ to efficiently support
+ * multiple output formats */
+
     s->avctx = avctx;
     s->width = (avctx->width + 3) & ~3;
     s->height = (avctx->height + 3) & ~3;
 
     s->sega_film_skip_bytes = -1;  /* uninitialized state */
 
-    // check for paletted data
-    if (avctx->bits_per_coded_sample != 8) {
-        s->palette_video = 0;
-        avctx->pix_fmt = AV_PIX_FMT_RGB24;
-    } else {
-        s->palette_video = 1;
-        avctx->pix_fmt = AV_PIX_FMT_PAL8;
+    /* check for paletted data */
+    s->palette_video = (avctx->bits_per_coded_sample == 8);
+
+/* If you are in a crisis, needing to influence the format choice
+ * in the decoder to workaround dumb/misbehaving applications,
+ * here would be the place to insert checking some dedicated
+ * environment variable -- rl
+ * NOTE that ffmpeg API policies as of 2017 STRONGLY DISCOURAGE
+ * taking such freedoms, do not bother the developers with this,
+ * even as an example such code is not welcome */
+
+    if (s->out_pixfmt != AV_PIX_FMT_NONE) /* the option is set to something */
+        avctx->pix_fmt = s->out_pixfmt;
+    else
+        if (s->palette_video)
+            avctx->pix_fmt = ff_get_format(avctx, pixfmt_list);
+        else
+            avctx->pix_fmt = ff_get_format(avctx, pixfmt_list_2);
+
+    switch (avctx->pix_fmt) {
+    case AV_PIX_FMT_RGB32:
+        s->decode_codebook = cinepak_decode_codebook_rgb32;
+        s->decode_vectors  = cinepak_decode_vectors_rgb32;
+        break;
+    case AV_PIX_FMT_RGB24:
+        s->decode_codebook = cinepak_decode_codebook_rgb24;
+        s->decode_vectors  = cinepak_decode_vectors_rgb24;
+        break;
+    case AV_PIX_FMT_RGB565:
+        s->decode_codebook = cinepak_decode_codebook_rgb565;
+        s->decode_vectors  = cinepak_decode_vectors_rgb565;
+        break;
+    case AV_PIX_FMT_YUV420P:
+        s->decode_codebook = cinepak_decode_codebook_yuv420;
+        s->decode_vectors  = cinepak_decode_vectors_yuv420;
+        break;
+    case AV_PIX_FMT_PAL8:
+        if (!s->palette_video) {
+            av_log(avctx, AV_LOG_ERROR, "Palettized output not supported without palettized input\n");
+            return AVERROR(EINVAL);
+        }
+        s->decode_codebook = cinepak_decode_codebook_pal8;
+        s->decode_vectors  = cinepak_decode_vectors_pal8;
+        break;
+    default:
+        av_log(avctx, AV_LOG_ERROR, "Unsupported pixel format %d\n", avctx->pix_fmt);
+        return AVERROR(EINVAL);
     }
 
     s->frame = av_frame_alloc();
@@ -457,7 +995,7 @@  static int cinepak_decode_frame(AVCodecContext *avctx,
         av_log(avctx, AV_LOG_ERROR, "cinepak_decode failed\n");
     }
 
-    if (s->palette_video)
+    if (avctx->pix_fmt == AV_PIX_FMT_PAL8)
         memcpy (s->frame->data[1], s->pal, AVPALETTE_SIZE);
 
     if ((ret = av_frame_ref(data, s->frame)) < 0)
@@ -488,4 +1026,6 @@  AVCodec ff_cinepak_decoder = {
     .close          = cinepak_decode_end,
     .decode         = cinepak_decode_frame,
     .capabilities   = AV_CODEC_CAP_DR1,
+    .pix_fmts       = pixfmt_list,
+    .priv_class     = &cinepak_class,
 };