
[FFmpeg-devel,1/2] (was Re: deduplicated [PATCH] Cinepak: speed up decoding several-fold...) formats.

Message ID 20170305182630.GL32749@example.net
State Superseded
Headers show

Commit Message

u-9iep@aetey.se March 5, 2017, 6:26 p.m. UTC
For one week there was no substantial feedback on the patches.

I understand that the first patch was very large which makes it hard
to review.

That's why I now have produced a limited, minimalistic change which still
yields most of the improvement.

I kindly ask the reader to note that cinepak is not ffmpeg's everyday meal
i.e. fast shortcut judgements may not be applicable. Please take your time.

To avoid contention around some of the issues which triggered such
regrettable kind of judgements let me point out that

- letting a decoder produce multiple pixel formats is explicitly
  allowed and supported by the documented API (get_format())

- the patch does not introduce a new feature in this respect,
  just makes its presence more visible and much more useful

- the cinepak codec is extraordinarily well suited to produce virtually
  any desired pixel format very efficiently (iow it is quite special)

- even though we have got the libswscale framework, there is no point in
  using it where it cannot help; it would be several times slower than
  cinepak itself, NOTE: not by coincidence but by the very design of both
  (see the sketch below this list)

- when some functionality is desired from libswscale, it remains fully
  usable with cinepak, with no loss in the actual efficiency of libswscale

- the 3-fold practical speedup makes the difference between working/useful
  and unusable/useless, with very little complexity (aka maintenance costs)

- the feedback concerning the coding style and code duplication was not
  specified in any measurable terms but it seems that all related issues
  have been resolved
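
For concreteness, here is a minimal sketch (not part of the patch) of the
libswscale path referred to above: keep decoding to rgb24 as today and run
every frame through swscale to obtain rgb565. The function and buffer names
are illustrative only; error handling and context caching are omitted.

#include <libavcodec/avcodec.h>
#include <libswscale/swscale.h>

/* One extra full-frame pass per decoded frame -- this is the cost that
 * decoding straight to rgb565 inside the decoder avoids. */
static int rgb24_frame_to_rgb565(AVCodecContext *dec_ctx, const AVFrame *frame,
                                 uint8_t *dst_data[4], int dst_linesize[4])
{
    struct SwsContext *sws = sws_getContext(dec_ctx->width, dec_ctx->height,
                                            AV_PIX_FMT_RGB24,
                                            dec_ctx->width, dec_ctx->height,
                                            AV_PIX_FMT_RGB565,
                                            SWS_FAST_BILINEAR, NULL, NULL, NULL);
    int ret;

    if (!sws)
        return AVERROR(EINVAL);
    ret = sws_scale(sws, (const uint8_t * const *)frame->data, frame->linesize,
                    0, dec_ctx->height, dst_data, dst_linesize);
    sws_freeContext(sws);
    return ret < 0 ? ret : 0;
}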

Looking forward to your analysis and feedback.

Thanks.

Rune
From cbe0664a13a615ecac302ffb0de577cd08b1f910 Mon Sep 17 00:00:00 2001
From: Rl <addr-see-the-website@aetey.se>
Date: Sun, 5 Mar 2017 15:54:06 +0100
Subject: [PATCH 1/2] Cinepak decoding: refactor for several-fold speedup

Refactored codebook generation and vector parsing
to better support the decoder API and allow for optimizations.

Decoding to rgb24 is now slightly faster.

Replaced generation of the constrained pal8 output
(which was only possible with palettized input)
with rgb565, comparably fast but working with any input format.

Decoding to rgb565 is now several-fold faster:
 (including overheads, so this underestimates the actual decoding speedup)
 --------
 matrixbench_mpeg2.mpg (720x567) encoded with ffmpeg into Cinepak
 using default settings, decoding on an i5 3570K, 3.4 GHz:
 ...
 fast_bilinear:              ~65x realtime
 patch w/rgb565 override:    ~154x realtime
 --------
 https://ffmpeg.org/pipermail/ffmpeg-devel/2017-February/206799.html

The default output format is unchanged, rgb24.
---
 libavcodec/cinepak.c | 462 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 252 insertions(+), 210 deletions(-)

Comments

Paul B Mahol March 5, 2017, 6:32 p.m. UTC | #1
On 3/5/17, u-9iep@aetey.se <u-9iep@aetey.se> wrote:
> For one week there was no substantial feedback on the patches.
>
> I understand that the first patch was very large which makes it hard
> to review.
>
> That's why I now have produced a limited, minimalistic change which still
> yields most of the improvement.
>
> I kindly ask the reader to note that cinepak is not ffmpeg's everyday meal
> i.e. fast shortcut judgements may not be applicable. Please take your time.
>
> To avoid contention around some of the issues which triggered such
> regrettable kind of judgements let me point out that
>
> - letting a decoder produce multiple pixel formats is explicitly
>   allowed and supported by the documented API (get_format())
>
> - the patch does not introduce a new feature in this respect,
>   just makes its presence more visible and much more useful
>
> - the cinepak codec is extraordinarily well suited to produce virtually
>   any desired pixel format very efficiently (iow it is quite special)

Cinepak is an old crappy codec. Get over it.
u-9iep@aetey.se March 5, 2017, 6:58 p.m. UTC | #2
On Sun, Mar 05, 2017 at 07:32:08PM +0100, Paul B Mahol wrote:
> On 3/5/17, u-9iep@aetey.se <u-9iep@aetey.se> wrote:
> > I kindly ask the reader to note that cinepak is not ffmpeg's everyday meal
> > i.e. fast shortcut judgements may not be applicable. Please take your time.

> Cinepak is an old crappy codec. Get over it.

Paul, please be technical.

For your information, what you say is an emotional expression
which looks patronizing and unfriendly.

To everyone: sorry for the off-topic digression (hopefully a somewhat useful one)

Regards,
Rune
Karl Blomster March 6, 2017, 11:47 a.m. UTC | #3
Hi,
It is regrettable that the positions are so entrenched here, but I agree with 
the majority opinion that pushing this really isn't a good idea, nor do I 
understand the reasons it's even desirable in the first place. The core issue 
here, I think, is fundamentally different views of software architectural issues 
as well as an almost as fundamentally different understanding of what 
"maintainability" means. I will make an honest attempt at laying out the 
situation _as I see it_ and while I don't think it will actually convince 
anyone, perhaps it will clarify a few points, make the situation slightly 
clearer and/or prevent further misunderstandings. I am not a ffmpeg developer 
and do not speak for anyone else and definitely not for the project, but I've 
been following the project for a long time and I believe I have come to at least 
some understanding of its principles.
To start off with an obvious fact, ffmpeg is a huge project, and a relatively 
old one. It has quite a bit of legacy code that, while it works, does things in 
ways that are not viewed with favor today, nor fit in with the way the current 
developers think. My general impression of the project in the last few years is 
that the consensus has been to attempt to steer the project to align with the 
principle of least astonishment: less "clever" solutions, more separation of 
concerns, more consistency in code and in behavior. This patch is almost 
directly contrary to that principle, and I think that's where one of the major 
rubs is. That's not to say solutions that are seen as "ugly" cannot be 
accepted, but it has to be weighed carefully against the estimated usefulness of 
the change.
In this particular case, the cinepak decoder is already "doing it wrong", so to 
speak (of course it's not actually bugged, but hopefully you see what I mean) - 
it's outputting a colorspace that isn't its "natural" output colorspace. Adding 
more options may seem reasonable since it's already doing it, but it's in the 
opposite direction of where the project is heading. I think the original patch 
made this seem particularly unpalatable since it added a whole new "quick and 
dirty" colorspace conversion to YUV with hardcoded constants (there are several 
different YUV variants with slightly different constants), with an 
intended-as-helpful link to the Wikipedia page on YUV in an explanatory comment. 
Considering what project the patch was contributed to, I really don't think that 
link was appreciated at all.
With all of this in mind, there would need to be an extraordinarily big 
practical benefit for a large number of users for this patch to be perceived as 
a "necessary evil" (and just to be perfectly clear here, I'm not trying to say 
that there is any evil intent anywhere - it's a figure of speech). I'm sure that 
for _you_ there is such a benefit, but I don't think anyone else has even fully 
understood your use case. From your earlier correspondence, my impression 
gathered from several different hints in different emails is that you're trying 
to play back video on some extraordinarily slow system with no GPU, where the 
only decoder you could find that was fast enough (not just in terms of CPU 
cycles but also in terms of memory bandwidth) was cinepak with 16-bit RGB 
output. Furthermore, you cannot modify the playback software, necessitating the 
rather remarkable hack where a ubiquitous system library modifies its behavior 
based on an environment variable. You've done away with this modification now, 
but I think having that in there at the start kinda poisoned the well from the 
beginning. Regardless, while I don't doubt that you have a legitimate use case 
that is important to you, I really cannot see the scenario you seem to be 
painting as anything other than an extremely specific edge case. ffmpeg is a 
generalist library that tries to support a broad range of use cases on a wide 
range of platforms, but that also by necessity means it lends itself less well 
to extremely specific use cases like this one, where you're prepared to buy 
decoding performance at any cost.

In an earlier email you pointed out that historically, very few code changes 
have been needed to keep the cinepak decoder up to date with the rest of ffmpeg, 
highlighting a view of what maintainability is that I believe is substantially 
different from that of other participants in this debate. However, what it does 
imply is that the maintenance burden of having your own branch that suits your 
use case, with all the oddball solutions you could imagine, would not seem to be 
overly heavy.
Also, in another of your earlier emails you rather passive-aggressively 
mentioned that not accepting the patch might be a "problem for the project". I 
really don't think anyone around here sees it that way, precisely because 
interest in cinepak in 2017 is not, shall we say, at an all time high.

Best regards
Karl Blomster
u-9iep@aetey.se March 6, 2017, 2:12 p.m. UTC | #4
Karl,

I wish to thank you for your effort to help the "opposite" parties
better understand each other.

As you say, indeed the patch makes a move which looks in line with
ffmpeg's former traditions but which at least formally collides with
the currently perceived "best practices".

On Mon, Mar 06, 2017 at 12:47:26PM +0100, Karl Blomster wrote:
> think the original patch made this seem particularly unpalatable since it
> added a whole new "quick and dirty" colorspace conversion to YUV with
> hardcoded constants (there are several different YUV variants with slightly
> different constants), with an intended-as-helpful link to the Wikipedia page

I am pretty sure the definition of yuv420p in ffmpeg and the constants
agree to each other. Yes I am aware of the mess around the variety of
YUVs, that's why I referred to the source of the numbers. (IIRC they
come actually from Microsoft but I felt Wikipedia looked more neutral)

> I really don't think that link was appreciated at all.

Interesting that nobody but you could articulate what was wrong with
it. Anyway, this detail is now a matter of the past. I never needed
it in practice myself (hardware scalers via yuv420p were slower than
Cinepak-on-the-CPU) and implemented it just for generality.

Undesired - removed.

> With all of this in mind, there would need to be an extraordinarily big
> practical benefit for a large number of users for this patch to be perceived
> as a "necessary evil" (and just to be perfectly clear here, I'm not trying

My problem with the commenters is that they take their perceived best-practice
rules as something which is not to be exposed to scrutiny.

An attempt to say that the rules are not necessarily applicable or useful
is not heard at all or otherwise amounts to heresy.

> correspondence, my impression gathered from several different hints in
> different emails is that you're trying to play back video on some
> extraordinarily slow system with no GPU, where the only decoder you could

Correct, no GPU. (A GPU costs, among other things, a lot more lines of code
to take care of than Cinepak patches :)

> cannot modify the playback software, necessitating the rather remarkable
> hack where a ubiquitous system library modifies its behavior based on an

Do you believe that libraries are not supposed to react to environment
variables? The most ubiquitous system one (libc) does by design.

FFmpeg itself contains over twenty getenv() calls in the libav* libraries and
also, via ffplay, depends on SDL, which for very natural reasons (similar
to my motivation with Cinepak) relies on environment variables.
Nor is there any apparent reasonable way to "get rid" of those getenv()s.
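
Just to illustrate the scale of what is being debated: such an override boils
down to a handful of lines. A hypothetical sketch only -- the variable name
below is made up and is not what my patches used:

#include <stdlib.h>

#include <libavcodec/avcodec.h>
#include <libavutil/pixdesc.h>

/* Hypothetical: read an environment variable in a decoder's init function
 * and let it silently override whatever output format the application chose. */
static void apply_env_override(AVCodecContext *avctx)
{
    const char *e = getenv("CINEPAK_OUT_PIXFMT");
    if (e) {
        enum AVPixelFormat fmt = av_get_pix_fmt(e);
        if (fmt != AV_PIX_FMT_NONE)
            avctx->pix_fmt = fmt;
    }
}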

> ffmpeg
> is a generalist library that tries to support a broad range of use cases on
> a wide range of platforms, but that also by necessity means it lends itself
> less well to extremely specific use cases like this one, where you're
> prepared to buy decoding performance at any cost.

The cost in the code is in fact not that "big", about 68 LOC per output
pixel format, in the latest revision. Compared to the several times (!)
speed loss when going through swscale it looks to me like an
exceptionally low hanging fruit.

> In an earlier email you pointed out that historically, very few code changes
> have been needed to keep the cinepak decoder up to date with the rest of
> ffmpeg, highlighting a view of what maintainability is that I believe is
> substantially different from that of other participants in this debate.

Nobody cared to define their view on what maintainability means
even when I explicitly asked how to measure it.

> However, what it does imply is that the maintenance burden of having your
> own branch that suits your use case, with all the oddball solutions you
> could imagine, would not seem to be overly heavy.

That's what I was doing and apparently will have to keep doing, given that
vitally useful things like out-of-band control are perceived as "evil" here.

It still feels inadequate to keep this functionality to myself, or
otherwise to maintain the patches so that they would suit someone else
(I did this now for the submission but would not do it continuously).

> Also, in another of your earlier emails you rather passive-aggressively
> mentioned that not accepting the patch might be a "problem for the project".
> I really don't think anyone around here sees it that way, precisely because
> interest in cinepak in 2017 is not, shall we say, at an all time high.

Indeed it is not the speed of Cinepak, nor the speed of any module,
which is a problem, but rather, I would say, the atmosphere in the project.

> Best regards
> Karl Blomster

Thanks again.

Rune
Karl Blomster March 6, 2017, 4:42 p.m. UTC | #5
On 2017-03-06 15:12, u-9iep@aetey.se wrote:
> Karl,
>
> I wish to thank you for the effort to make the "opposite" parties
> to better understand each other.
>
> As you say, indeed the patch makes a move which looks in line with
> ffmpeg's former traditions but which at least formally collides with
> the currently perceived "best practices".
>
> On Mon, Mar 06, 2017 at 12:47:26PM +0100, Karl Blomster wrote:
>> think the original patch made this seem particularly unpalatable since it
>> added a whole new "quick and dirty" colorspace conversion to YUV with
>> hardcoded constants (there are several different YUV variants with slightly
> different constants), with an intended-as-helpful link to the Wikipedia page
> I am pretty sure the definition of yuv420p in ffmpeg and the constants
> agree to each other. Yes I am aware of the mess around the variety of
> YUVs, that's why I referred to the source of the numbers. (IIRC they
> come actually from Microsoft but I felt Wikipedia looked more neutral)
I'm honestly not sure what you mean here. The most common flavor of YUV (or more 
strictly speaking YCbCr), which is the one that you seem to have taken the 
numeric approximation of color primaries from - informally "Rec. 601", from its 
ITU recommendation number - has its roots in analog color television and I 
believe it was standardized back in the early 1980's, long before digital video 
was practically usable in consumer applications. ffmpeg naturally supports 
several other well established variants as well, such as ITU rec. 709 (first 
standardized in 1990 I believe, originally for use in likewise analog HDTV) and 
ITU rec. 2020 (a recent invention intended for 4k and other such things). In 
practice someone decoding from cinepak's storage format (which is similar to 
ycgco in spirit but not the same as conventional ycgco, as far as I can tell) to 
yuv is probably unlikely to want anything other than rec. 601 primaries, but you 
can't know that. Standards conversion of digital video can be very complex if 
you want to do it right, and as CPU's have gotten faster there has been more 
emphasis in general in digital video on correctness and less on just getting it 
to work, which is why standards conversion these days tends to be left to 
specialized libraries. I understand that in your case you simply want it done 
fast and don't really worry too much about correctness, and while that's fine 
and a legitimate use case (if unusual, in this day and age), it may not be what 
other users want, and more importantly it's a principle of least astonishment 
violation.
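
Just to make the "slightly different constants" point concrete, the luma
weights alone already differ between the variants; written out as plain C
for illustration (coefficients as published in the respective recommendations):

/* Luma from the three common variants; the corresponding chroma equations
 * differ as well. */
static double luma_rec601 (double r, double g, double b) { return 0.299  * r + 0.587  * g + 0.114  * b; }
static double luma_rec709 (double r, double g, double b) { return 0.2126 * r + 0.7152 * g + 0.0722 * b; }
static double luma_rec2020(double r, double g, double b) { return 0.2627 * r + 0.6780 * g + 0.0593 * b; }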

Regardless, it is now gone so there's no use in dwelling on it further - just 
trying to clarify what's going on, here.

> Do you believe that libraries are not supposed to react to environment
> variables? The most ubiquitous system one (libc) does by design.
>
> FFmpeg itself contains over twenty getenv() calls in the libav* libraries and
> also, via ffplay, depends on SDL, which for very natural reasons (similar
> to my motivation with Cinepak) relies on environment variables.
> Nor is there any apparent reasonable way to "get rid" of those getenv()s.
I would say that as usual when determining what is "right" in software 
architecture, the answer is "it depends". As far as I am aware, most of the 
things libav* touches environment variables for are related to interactions with 
things external to ffmpeg, such as the network (HTTP_PROXY), the terminal, the 
locations of certain types of plugins, etc. Modifying internal behavior based on 
environment variables is ugly in my opinion, and I can't think of any other case 
in ffmpeg where this is done. In general, if you want to interact with library 
functionality, do it through the API. In this case there's even a mechanism in 
place explicitly for this purpose (negotiating output pixelformat via the 
get_format callback), although I'll note that the ordinary use case for that is 
quite different from what you want to do here.
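
A minimal sketch of that mechanism, assuming the application actually wants
the rgb565 output this patch can offer (the names are illustrative only):

#include <libavcodec/avcodec.h>

/* Pick rgb565 if the decoder offers it, otherwise fall back to the first
 * (default) entry of the AV_PIX_FMT_NONE-terminated list. */
static enum AVPixelFormat pick_rgb565(AVCodecContext *avctx,
                                      const enum AVPixelFormat *fmts)
{
    const enum AVPixelFormat *p;
    for (p = fmts; *p != AV_PIX_FMT_NONE; p++)
        if (*p == AV_PIX_FMT_RGB565)
            return *p;
    return fmts[0];
}

/* ... and before avcodec_open2(): avctx->get_format = pick_rgb565; */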

As far as SDL goes, I'll just note that the documentation states that "[u]sing 
these variables isn't recommended and the names and presence of these variables 
aren't guaranteed from one release to the next." They seem intended for 
debugging only.

>> ffmpeg
>> is a generalist library that tries to support a broad range of use cases on
>> a wide range of platforms, but that also by necessity means it lends itself
>> less well to extremely specific use cases like this one, where you're
>> prepared to buy decoding performance at any cost.
> The cost in the code is in fact not that "big", about 68 LOC per output
> pixel format, in the latest revision. Compared to the several times (!)
> speed loss when going through swscale it looks to me like an
> exceptionally low hanging fruit.
And here's one of the biggest issues in this debate and where I think you and 
wm4 et al keep talking past each other. The cost in lines of code is relatively 
unimportant. The cost in "number of different behaviors" is different. The Unix 
way is to do one thing and do it well. Doing standards conversion (or as in the 
most recent patch, bitdepth scaling) directly in the decoder is fast, sure! 
Nobody disputes that. The problem is that when you have as many decoders as 
ffmpeg does, you *really* want them to behave in the same way as far as 
reasonably possible, and do one logical thing in one place. In practice this 
isn't a huge deal for cinepak in particular since it already does this bit of 
ugliness, but surely you can see why there is a reluctance to add more of it?

Then there's also the fact that, to be blunt to the point of rudeness, nobody 
*cares* about cinepak. I'm sure that *you* do, and as you've repeatedly pointed 
out we do not have your point of view, but from the "mainstream" perspective or 
whatever you want to call it this is definitely a fringe codec that has very few 
users today. Hence the reluctance to accept an "ugly" patch, I believe. A 
cursory search of the mailing list archives makes it seem like nobody's really 
contributed meaningful functionality to the decoder other than you yourself 
(although your tendency to change email addresses makes it a bit hard to tell 
for certain) and almost no users have ever asked about it.

One also has to wonder *how* slow a system needs to be, exactly, to require this 
kind of change. Again, I don't have all the details on your use case, but just 
out of curiosity, how many systems were you intending to deploy this to, correct 
to an order of magnitude? Forgive me for presuming things, but users who have 
not until now had video playback on their early 1990's vintage systems (dumb 
terminals?) for performance reasons must surely have gotten used to the 
situation by now. Or implemented their own decoder. Or...?

Best regards
Karl Blomster
u-9iep@aetey.se March 6, 2017, 8:19 p.m. UTC | #6
Hello Karl,

I appreciate your interest and comments.

Keeping on topic, to the patch decision makers:

 I actually complied with all real suggestions heard during
 the discussion, even with those I find misdirected,
 i.e. virtually everything except dropping the patch as a whole.

 Now I am unsure: will anybody look at the patch,
 or is it the case that some policy (which everyone here but me knows)
 clearly specifies that such a patch shall be rejected no matter what?

Possibly the dialogue below will be acceptably on-topic, as background.

On Mon, Mar 06, 2017 at 05:42:31PM +0100, Karl Blomster wrote:
> >Nor is there any apparent reasonable way to "get rid" of those getenv()s.
> I would say that as usual when determining what is "right" in software
> architecture, the answer is "it depends". As far as I am aware, most of the

Thank you. I fully agree.

> things libav* touches environment variables for are related to interactions
> with things external to ffmpeg, such as the network (HTTP_PROXY), the
> terminal, the locations of certain types of plugins, etc. Modifying internal
> behavior based on environment variables is ugly in my opinion, and I can't
> think of any other case in ffmpeg where this is done. In general, if you
> want to interact with library functionality, do it through the API. In this
> case there's even a mechanism in place explicitly for this purpose
> (negotiating output pixelformat via the get_format callback), although I'll
> note that the ordinary use case for that is quite different from what you
> want to do here.

Exactly, the possibility to choose the pixel format is present but it
is not meant to be controlled otherwise than via a competent application.

I agree that an added envvar is by nature an intrusive change;
that's why I try to reduce the tension by omitting it
(I have to patch my own ffmpeg builds in any case).

> As far as SDL goes, I'll just note that the documentation states that
> "[u]sing these variables isn't recommended and the names and presence of
> these variables aren't guaranteed from one release to the next." They seem
> intended for debugging only.

I guess they _were_ meant for debugging, at a time when it was not realized
that some things are impossible or hard to put under the applications'
responsibility. They have since then in practice become an integral part
of the product. My view is supported by the fact that the variables
are always present in the usual non-debug SDL builds; do you have a
different experience? Of course I cannot speak for the usage of SDL
everywhere, but my use of the SDL-bound applications would hardly be
meaningful without the variables.

> >The cost in the code is in fact not that "big", about 68 LOC per output
> >pixel format, in the latest revision. Compared to the several times (!)
> >speed loss when going through swscale it looks to me like an
> >exceptionally low hanging fruit.
> And here's one of the biggest issues in this debate and where I think you
> and wm4 et al keep talking past each other. The cost in lines of code is
> relatively unimportant. The cost in "number of different behaviors" is
> different.

Ah. I see your point. Each extra degree of freedom makes it harder
to support all the possible cases and combinations.

> In practice this isn't a huge deal for cinepak in particular
> since it already does this bit of ugliness, but surely you can see why there
> is a reluctance to add more of it?

Thanks. Pity no one else said this. Fortunately the last version
of the patch definitely does not make the situation worse; rather, it is
quite a bit more regular - pal8 was handled as a special case (a very rare one,
which makes maintenance even harder, and one which made the output depend
on the input format), while rgb565 is fast, fully regular and always available
(and also well defined in what it shall look like, no uncertainty like yuv).
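
For completeness, with the patch an application can also request rgb565
through the decoder's private option; a minimal sketch, with stream setup
and the decode loop omitted:

#include <libavcodec/avcodec.h>
#include <libavutil/dict.h>

/* Ask the cinepak decoder for rgb565 via the "output_pixel_format" option
 * added by the patch; unrecognized options would be returned in opts. */
static int open_cinepak_rgb565(AVCodecContext *dec_ctx, const AVCodec *dec)
{
    AVDictionary *opts = NULL;
    int ret;

    av_dict_set(&opts, "output_pixel_format", "rgb565", 0);
    ret = avcodec_open2(dec_ctx, dec, &opts);
    av_dict_free(&opts);
    return ret;
}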

0rgb or argb are full quality and faster than rgb24 on usual hardware
but replacing the default rgb24 format would introduce a surprise.

> Then there's also the fact that, to be blunt to the point of rudeness,
> nobody *cares* about cinepak. I'm sure that *you* do, and as you've
> repeatedly pointed out we do not have your point of view, but from the
> "mainstream" perspective or whatever you want to call it this is definitely
> a fringe codec that has very few users today. Hence the reluctance to accept

I agree, it is not as visible as the newest heavy codecs. It is not
profitable to the CPU/GPU manufacturers, nor does it suit the content providers,
who _choose_ to copy every byte again and again, as many times as it is
being played. That's why the compression ratio becomes so important:
the user, not the provider, pays for the heavy decoding, while the
provider cannot avoid paying for the (accumulated) traffic, nor accepts
optimizing the transmission by decoupling the user from the service
(say, like bittorrent).

The applications using cinepak do still exist, but this is seldom advertised.

OTOH the mainstream is not the whole world.
It is possible even to make one's living by doing unusual things,
even though it is tiresome to have to explain, every time, the sense
behind your choices.

I have appreciated ffmpeg since long ago because it was created to be efficient
(with Fast as the first word). Now it is the hardware which takes over
instead, for better and for worse.

> an "ugly" patch, I believe. A cursory search of the mailing list archives
> makes it seem like nobody's really contributed meaningful functionality to
> the decoder other than you yourself (although your tendency to change email

This is not entirely true, other people were involved, but thank you!

> addresses makes it a bit hard to tell for certain) and almost no users have
> ever asked about it.

I was not good at keeping an eye on the Cinepak-related issues, my fault.
I recently found some unattended ones, like the "make the encoder faster"
request closed as WONTFIX. (Now it would be uncontroversial to implement the
formerly patented Cinepak encoder algorithm. It is not as good as our elbg
but an order of magnitude faster at encoding.)

Also Cinepak is not marketed at all. There are certainly people and
situations which would benefit from it but they do not know this.
Thus, hardly anybody asks.

> One also has to wonder *how* slow a system needs to be, exactly, to require
> this kind of change. Again, I don't have all the details on your use case,
> but just out of curiosity, how many systems were you intending to deploy
> this to, correct to an order of magnitude? Forgive me for presuming things,

Feel free to assume anything from 1 to the whole IoT.
It is neither of those of course. The main point is to make video
playback extremely lightweight so that you can use hardware one or two
classes lower than whoever takes a "modern" codec for the same purpose.
This is about not needing a GPU and using the memory bandwidth in the
best way. If you have got an energy-efficient display technology,
you save quite a lot of the needed power (a slow low-gate-count CPU,
no GPU), which is a nice thing too.

This is no magic bullet, you pay quite a lot when you are
compressing, but every time you play back you offset the cost.

> but users who have not until now had video playback on their early 1990's
> vintage systems (dumb terminals?) for performance reasons must surely have
> gotten used to the situation by now. Or implemented their own decoder.
> Or...?

Do you possibly mean the "8088 corruption"? :) It is a very impressive
and nice illustration of what is possible when you have got suitable
algorithms and are ready to take unusual steps.

Regards,
Rune
u-9iep@aetey.se March 7, 2017, 10:19 a.m. UTC | #7
Now it has been more than a month since the initial submission
of the Cinepak decoder speedup patch (not counting the proof of concept
posted two months ago).

Since then, more and more of the oftentimes ignorant discussion has been
dedicated to perceived design/development policy conformance (I still have
not seen any reference to a corresponding document :) with no outcome.

Given the apparent lack of interest from the FFmpeg project in making
improvements to the lightweight fast codecs (the class Cinepak belongs to)
and my lack of appreciation for the conversational style on the list,
I do not feel motivated to contribute to the project any longer.

My thanks for FFmpeg, and for the enormous work on making it as useful
as it is, go to Michael Niedermayer, who should also be credited for his
much appreciated support and work to allow the integration into FFmpeg
and for the past improvements of the Cinepak encoder and decoder.
(To be clear: the friendly support does not imply endorsement)

Thanks Michael.

Now I end my involvement with FFmpeg, good luck with the project.

If a casual reader interested in Cinepak would like to try the patches,
I recommend the series
  https://ffmpeg.org/pipermail/ffmpeg-devel/2017-February/207603.html
  https://ffmpeg.org/pipermail/ffmpeg-devel/2017-February/207597.html
including
  https://ffmpeg.org/pipermail/ffmpeg-devel/2017-February/207625.html

Regards,
Rune

Patch

diff --git a/libavcodec/cinepak.c b/libavcodec/cinepak.c
index d657e9c0c1..d8bc7860eb 100644
--- a/libavcodec/cinepak.c
+++ b/libavcodec/cinepak.c
@@ -29,8 +29,8 @@ 
  * @see For more information on the quirky data inside Sega FILM/CPK files, visit:
  *   http://wiki.multimedia.cx/index.php?title=Sega_FILM
  *
- * Cinepak colorspace support (c) 2013 Rl, Aetey Global Technologies AB
- * @author Cinepak colorspace, Rl, Aetey Global Technologies AB
+ * Cinepak colorspace/optimizations (c) 2013-2017 Rl, Aetey Global Technologies AB
+ * @author Cinepak colorspace/optimizations, Rl, Aetey Global Technologies AB
  */
 
 #include <stdio.h>
@@ -39,24 +39,29 @@ 
 
 #include "libavutil/common.h"
 #include "libavutil/intreadwrite.h"
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
 #include "avcodec.h"
 #include "internal.h"
 
+#define MAX_STRIPS      32    /* an arbitrary limit -- rl */
 
-typedef uint8_t cvid_codebook[12];
-
-#define MAX_STRIPS      32
+typedef union cvid_codebook {
+    uint8_t    rgb24[256][12];
+    uint16_t  rgb565[256][ 4];
+} cvid_codebook;
 
 typedef struct cvid_strip {
     uint16_t          id;
     uint16_t          x1, y1;
     uint16_t          x2, y2;
-    cvid_codebook     v4_codebook[256];
-    cvid_codebook     v1_codebook[256];
+    cvid_codebook     v4_codebook;
+    cvid_codebook     v1_codebook;
 } cvid_strip;
 
-typedef struct CinepakContext {
-
+typedef struct CinepakContext CinepakContext;
+struct CinepakContext {
+    const AVClass *class;
     AVCodecContext *avctx;
     AVFrame *frame;
 
@@ -71,195 +76,233 @@  typedef struct CinepakContext {
     int sega_film_skip_bytes;
 
     uint32_t pal[256];
-} CinepakContext;
-
-static void cinepak_decode_codebook (cvid_codebook *codebook,
-                                     int chunk_id, int size, const uint8_t *data)
-{
-    const uint8_t *eod = (data + size);
-    uint32_t flag, mask;
-    int      i, n;
-    uint8_t *p;
-
-    /* check if this chunk contains 4- or 6-element vectors */
-    n    = (chunk_id & 0x04) ? 4 : 6;
-    flag = 0;
-    mask = 0;
-
-    p = codebook[0];
-    for (i=0; i < 256; i++) {
-        if ((chunk_id & 0x01) && !(mask >>= 1)) {
-            if ((data + 4) > eod)
-                break;
-
-            flag  = AV_RB32 (data);
-            data += 4;
-            mask  = 0x80000000;
-        }
+    void (*decode_codebook)(CinepakContext *s, cvid_codebook *codebook,
+                            int chunk_id, int size, const uint8_t *data);
+    int  (*decode_vectors)(CinepakContext *s, cvid_strip *strip,
+                           int chunk_id, int size, const uint8_t *data);
+/* options */
+    enum AVPixelFormat out_pixfmt;
+};
 
-        if (!(chunk_id & 0x01) || (flag & mask)) {
-            int k, kk;
+#define OFFSET(x) offsetof(CinepakContext, x)
+#define VD AV_OPT_FLAG_VIDEO_PARAM | AV_OPT_FLAG_DECODING_PARAM
+static const AVOption options[] = {
+{"output_pixel_format", "set output pixel format as rgb24 or rgb565", OFFSET(out_pixfmt), AV_OPT_TYPE_PIXEL_FMT, {.i64=AV_PIX_FMT_NONE}, -1, INT_MAX, VD },
+    { NULL },
+};
 
-            if ((data + n) > eod)
-                break;
+static const AVClass cinepak_class = {
+    .class_name = "cinepak decoder",
+    .item_name  = av_default_item_name,
+    .option     = options,
+    .version    = LIBAVUTIL_VERSION_INT,
+};
 
-            for (k = 0; k < 4; ++k) {
-                int r = *data++;
-                for (kk = 0; kk < 3; ++kk)
-                    *p++ = r;
-            }
-            if (n == 6) {
-                int r, g, b, u, v;
-                u = *(int8_t *)data++;
-                v = *(int8_t *)data++;
-                p -= 12;
+#define CODEBOOK_PROLOGUE(pixel_format) \
+static void cinepak_decode_codebook_##pixel_format (CinepakContext *s,\
+    cvid_codebook *codebook, int chunk_id, int size, const uint8_t *data) {\
+    const uint8_t *eod;\
+    uint32_t flag, mask;\
+    int selective_update, i;\
+
+#define DECODE_CODEBOOK(pixel_format) \
+CODEBOOK_PROLOGUE(pixel_format) \
+    int n, palette_video;\
+
+#define CODEBOOK_STREAM_PARSING \
+    for (i=0; i < 256; i++) {\
+        if (selective_update && !(mask >>= 1)) {\
+            if ((data + 4) > eod) break;\
+            flag = AV_RB32 (data); data += 4; mask = 0x80000000;\
+        }\
+        if (!selective_update || (flag & mask)) {\
+            int k;\
+            if ((data + n) > eod) break;\
+
+#define CODEBOOK_INTRO \
+    selective_update = (chunk_id & 0x01);\
+    eod = (data + size); flag = mask = 0;\
+
+#define CODEBOOK_FULL_COLOR \
+    /* check if this chunk contains 4- or 6-element vectors */\
+    n = (chunk_id & 0x04) ? 4 : 6;\
+    palette_video = s->palette_video;\
+    CODEBOOK_INTRO\
+    CODEBOOK_STREAM_PARSING\
+
+#define DECODE_VECTORS(pixel_format) \
+static int cinepak_decode_vectors_##pixel_format (CinepakContext *s, cvid_strip *strip, int chunk_id, int size, const uint8_t *data) {\
+    const uint8_t   *eod;\
+    uint32_t         flag, mask;\
+    int              x, y, selective_update, v1_only;\
+    char            *ip0, *ip1, *ip2, *ip3;\
+
+#define VECTOR_INTRO \
+    CODEBOOK_INTRO\
+    v1_only = (chunk_id & 0x02);\
+    for (y=strip->y1; y < strip->y2; y+=4) {\
+
+#define VECTOR_STREAM_PARSING \
+        for (x=strip->x1; x < strip->x2; x+=4) {\
+            if (selective_update && !(mask >>= 1)) {\
+                if ((data + 4) > eod) return AVERROR_INVALIDDATA;\
+                flag  = AV_RB32 (data); data += 4; mask = 0x80000000;\
+            }\
+            if (!selective_update || (flag & mask)) {\
+                if (!v1_only && !(mask >>= 1)) {\
+                    if ((data + 4) > eod) return AVERROR_INVALIDDATA;\
+                    flag  = AV_RB32 (data); data += 4; mask = 0x80000000;\
+                }\
+                if (v1_only || (~flag & mask)) {\
+                    POINTER_TYPE *p;\
+                    if (data >= eod) return AVERROR_INVALIDDATA;\
+
+#define VECTOR_DO \
+/* take care of y dimension not being multiple of 4 */\
+        if(s->avctx->height - y > 1) {\
+            ip1 = ip0 + s->frame->linesize[0];\
+            if(s->avctx->height - y > 2) {\
+                ip2 = ip1 + s->frame->linesize[0];\
+                if(s->avctx->height - y > 3) {\
+                    ip3 = ip2 + s->frame->linesize[0];\
+                }\
+            }\
+        }\
+/* to get the correct picture for not-multiple-of-4 cases let us fill each\
+ * block from the bottom up, thus possibly overwriting the bottommost line\
+ * more than once but ending with the correct data in place\
+ * (instead of in-loop checking) */\
+        VECTOR_STREAM_PARSING\
+
+DECODE_CODEBOOK(rgb24)
+    uint8_t *p = codebook->rgb24[0];
+    CODEBOOK_FULL_COLOR
+            if (n == 4)
+                if (palette_video)
+                    for (k = 0; k < 4; ++k) {
+                        uint32_t r = s->pal[*data++];
+                        *p++ = (r>>16)&0xff; *p++ = (r>>8)&0xff; *p++ = r&0xff;
+                    }
+                else
+                    for (k = 0; k < 4; ++k) {
+                        int kk, r = *data++;
+                        for (kk = 0; kk < 3; ++kk) *p++ = r;
+                    }
+            else { /* n == 6 */
+                int y, u, v, v2, u2v, u2;
+                u = (int8_t)data[4]; v = (int8_t)data[5];
+                v2 = v*2; u2v = u/2+v; u2 = u*2;
                 for(k=0; k<4; ++k) {
-                    r = *p++ + v*2;
-                    g = *p++ - (u/2) - v;
-                    b = *p   + u*2;
-                    p -= 2;
-                    *p++ = av_clip_uint8(r);
-                    *p++ = av_clip_uint8(g);
-                    *p++ = av_clip_uint8(b);
+                    y = *data++;
+                    *p++ = av_clip_uint8(y+v2);
+                    *p++ = av_clip_uint8(y-u2v);
+                    *p++ = av_clip_uint8(y+u2);
                 }
+                data += 2;
             }
-        } else {
+        } else
             p += 12;
-        }
     }
 }
-
-static int cinepak_decode_vectors (CinepakContext *s, cvid_strip *strip,
-                                   int chunk_id, int size, const uint8_t *data)
-{
-    const uint8_t   *eod = (data + size);
-    uint32_t         flag, mask;
-    uint8_t         *cb0, *cb1, *cb2, *cb3;
-    int             x, y;
-    char            *ip0, *ip1, *ip2, *ip3;
-
-    flag = 0;
-    mask = 0;
-
-    for (y=strip->y1; y < strip->y2; y+=4) {
-
-/* take care of y dimension not being multiple of 4, such streams exist */
+DECODE_VECTORS(rgb24)
+    uint8_t *cb0, *cb1, *cb2, *cb3;
+    VECTOR_INTRO
         ip0 = ip1 = ip2 = ip3 = s->frame->data[0] +
-          (s->palette_video?strip->x1:strip->x1*3) + (y * s->frame->linesize[0]);
-        if(s->avctx->height - y > 1) {
-            ip1 = ip0 + s->frame->linesize[0];
-            if(s->avctx->height - y > 2) {
-                ip2 = ip1 + s->frame->linesize[0];
-                if(s->avctx->height - y > 3) {
-                    ip3 = ip2 + s->frame->linesize[0];
+                                strip->x1*3 + y*s->frame->linesize[0];
+#define POINTER_TYPE uint8_t
+        VECTOR_DO
+#undef POINTER_TYPE
+                    p = strip->v1_codebook.rgb24[*data++] + 6;
+                    memcpy(ip3 + 0, p, 3); memcpy(ip3 + 3, p, 3);
+                    memcpy(ip2 + 0, p, 3); memcpy(ip2 + 3, p, 3);
+                    p += 3; /* ... + 9 */
+                    memcpy(ip3 + 6, p, 3); memcpy(ip3 + 9, p, 3);
+                    memcpy(ip2 + 6, p, 3); memcpy(ip2 + 9, p, 3);
+                    p -= 9; /* ... + 0 */
+                    memcpy(ip1 + 0, p, 3); memcpy(ip1 + 3, p, 3);
+                    memcpy(ip0 + 0, p, 3); memcpy(ip0 + 3, p, 3);
+                    p += 3; /* ... + 3 */
+                    memcpy(ip1 + 6, p, 3); memcpy(ip1 + 9, p, 3);
+                    memcpy(ip0 + 6, p, 3); memcpy(ip0 + 9, p, 3);
+                } else if (flag & mask) {
+                    if ((data + 4) > eod) return AVERROR_INVALIDDATA;
+                    cb0 = strip->v4_codebook.rgb24[*data++];
+                    cb1 = strip->v4_codebook.rgb24[*data++];
+                    cb2 = strip->v4_codebook.rgb24[*data++];
+                    cb3 = strip->v4_codebook.rgb24[*data++];
+                    memcpy(ip3 + 0, cb2 + 6, 6); memcpy(ip3 + 6, cb3 + 6, 6);
+                    memcpy(ip2 + 0, cb2 + 0, 6); memcpy(ip2 + 6, cb3 + 0, 6);
+                    memcpy(ip1 + 0, cb0 + 6, 6); memcpy(ip1 + 6, cb1 + 6, 6);
+                    memcpy(ip0 + 0, cb0 + 0, 6); memcpy(ip0 + 6, cb1 + 0, 6);
                 }
             }
+            ip0 += 12; ip1 += 12; ip2 += 12; ip3 += 12;
         }
-/* to get the correct picture for not-multiple-of-4 cases let us fill each
- * block from the bottom up, thus possibly overwriting the bottommost line
- * more than once but ending with the correct data in place
- * (instead of in-loop checking) */
-
-        for (x=strip->x1; x < strip->x2; x+=4) {
-            if ((chunk_id & 0x01) && !(mask >>= 1)) {
-                if ((data + 4) > eod)
-                    return AVERROR_INVALIDDATA;
-
-                flag  = AV_RB32 (data);
-                data += 4;
-                mask  = 0x80000000;
-            }
-
-            if (!(chunk_id & 0x01) || (flag & mask)) {
-                if (!(chunk_id & 0x02) && !(mask >>= 1)) {
-                    if ((data + 4) > eod)
-                        return AVERROR_INVALIDDATA;
-
-                    flag  = AV_RB32 (data);
-                    data += 4;
-                    mask  = 0x80000000;
-                }
-
-                if ((chunk_id & 0x02) || (~flag & mask)) {
-                    uint8_t *p;
-                    if (data >= eod)
-                        return AVERROR_INVALIDDATA;
-
-                    p = strip->v1_codebook[*data++];
-                    if (s->palette_video) {
-                        ip3[0] = ip3[1] = ip2[0] = ip2[1] = p[6];
-                        ip3[2] = ip3[3] = ip2[2] = ip2[3] = p[9];
-                        ip1[0] = ip1[1] = ip0[0] = ip0[1] = p[0];
-                        ip1[2] = ip1[3] = ip0[2] = ip0[3] = p[3];
-                    } else {
-                        p += 6;
-                        memcpy(ip3 + 0, p, 3); memcpy(ip3 + 3, p, 3);
-                        memcpy(ip2 + 0, p, 3); memcpy(ip2 + 3, p, 3);
-                        p += 3; /* ... + 9 */
-                        memcpy(ip3 + 6, p, 3); memcpy(ip3 + 9, p, 3);
-                        memcpy(ip2 + 6, p, 3); memcpy(ip2 + 9, p, 3);
-                        p -= 9; /* ... + 0 */
-                        memcpy(ip1 + 0, p, 3); memcpy(ip1 + 3, p, 3);
-                        memcpy(ip0 + 0, p, 3); memcpy(ip0 + 3, p, 3);
-                        p += 3; /* ... + 3 */
-                        memcpy(ip1 + 6, p, 3); memcpy(ip1 + 9, p, 3);
-                        memcpy(ip0 + 6, p, 3); memcpy(ip0 + 9, p, 3);
+    }
+    return 0;
+}
+#define PACK_RGB_RGB565(r,g,b) (((av_clip_uint8((r)+4)>>3)<<11)|((av_clip_uint8((g)+2)>>2)<<5)|(av_clip_uint8((b)+4)>>3)) /* rounding to nearest */
+DECODE_CODEBOOK(rgb565)
+    uint16_t *p = codebook->rgb565[0];
+    CODEBOOK_FULL_COLOR
+            if (n == 4)
+                if (palette_video)
+                    for (k = 0; k < 4; ++k) {
+                        uint32_t r = s->pal[*data++];
+                        *p++ = PACK_RGB_RGB565((r>>16)&0xff,(r>>8)&0xff,r&0xff);
                     }
-
+                else
+                    for (k = 0; k < 4; ++k) {
+                        int r = *data++;
+                        *p++ = PACK_RGB_RGB565(r,r,r);
+                    }
+            else { /* n == 6 */
+                int y, u, v, v2, u2v, u2;
+                u = (int8_t)data[4]; v = (int8_t)data[5];
+                v2 = v*2; u2v = u/2+v; u2 = u*2;
+                for(k=0; k<4; ++k) {
+                    y = *data++;
+                    *p++ = PACK_RGB_RGB565(y+v2,y-u2v,y+u2);
+                }
+                data += 2;
+            }
+        } else
+            p += 4;
+    }
+}
+DECODE_VECTORS(rgb565)
+    uint16_t *cb0, *cb1, *cb2, *cb3;
+    VECTOR_INTRO
+        ip0 = ip1 = ip2 = ip3 = s->frame->data[0] +
+                                strip->x1*2 + y*s->frame->linesize[0];
+#define POINTER_TYPE uint16_t
+        VECTOR_DO
+#undef POINTER_TYPE
+                    p = strip->v1_codebook.rgb565[*data++];
+                    * (uint16_t *)ip3    = *((uint16_t *)ip3+1) =
+                    * (uint16_t *)ip2    = *((uint16_t *)ip2+1) = p[2];
+                    *((uint16_t *)ip3+2) = *((uint16_t *)ip3+3) =
+                    *((uint16_t *)ip2+2) = *((uint16_t *)ip2+3) = p[3];
+                    * (uint16_t *)ip1    = *((uint16_t *)ip1+1) =
+                    * (uint16_t *)ip0    = *((uint16_t *)ip0+1) = p[0];
+                    *((uint16_t *)ip1+2) = *((uint16_t *)ip1+3) =
+                    *((uint16_t *)ip0+2) = *((uint16_t *)ip0+3) = p[1];
                 } else if (flag & mask) {
                     if ((data + 4) > eod)
                         return AVERROR_INVALIDDATA;
-
-                    cb0 = strip->v4_codebook[*data++];
-                    cb1 = strip->v4_codebook[*data++];
-                    cb2 = strip->v4_codebook[*data++];
-                    cb3 = strip->v4_codebook[*data++];
-                    if (s->palette_video) {
-                        uint8_t *p;
-                        p = ip3;
-                        *p++ = cb2[6];
-                        *p++ = cb2[9];
-                        *p++ = cb3[6];
-                        *p   = cb3[9];
-                        p = ip2;
-                        *p++ = cb2[0];
-                        *p++ = cb2[3];
-                        *p++ = cb3[0];
-                        *p   = cb3[3];
-                        p = ip1;
-                        *p++ = cb0[6];
-                        *p++ = cb0[9];
-                        *p++ = cb1[6];
-                        *p   = cb1[9];
-                        p = ip0;
-                        *p++ = cb0[0];
-                        *p++ = cb0[3];
-                        *p++ = cb1[0];
-                        *p   = cb1[3];
-                    } else {
-                        memcpy(ip3 + 0, cb2 + 6, 6);
-                        memcpy(ip3 + 6, cb3 + 6, 6);
-                        memcpy(ip2 + 0, cb2 + 0, 6);
-                        memcpy(ip2 + 6, cb3 + 0, 6);
-                        memcpy(ip1 + 0, cb0 + 6, 6);
-                        memcpy(ip1 + 6, cb1 + 6, 6);
-                        memcpy(ip0 + 0, cb0 + 0, 6);
-                        memcpy(ip0 + 6, cb1 + 0, 6);
-                    }
-
+                    cb0 = strip->v4_codebook.rgb565[*data++];
+                    cb1 = strip->v4_codebook.rgb565[*data++];
+                    cb2 = strip->v4_codebook.rgb565[*data++];
+                    cb3 = strip->v4_codebook.rgb565[*data++];
+                    memcpy(ip3 + 0, cb2 + 2, 4); memcpy(ip3 + 4, cb3 + 2, 4);
+                    memcpy(ip2 + 0, cb2 + 0, 4); memcpy(ip2 + 4, cb3 + 0, 4);
+                    memcpy(ip1 + 0, cb0 + 2, 4); memcpy(ip1 + 4, cb1 + 2, 4);
+                    memcpy(ip0 + 0, cb0 + 0, 4); memcpy(ip0 + 4, cb1 + 0, 4);
                 }
             }
-
-            if (s->palette_video) {
-                ip0 += 4;  ip1 += 4;
-                ip2 += 4;  ip3 += 4;
-            } else {
-                ip0 += 12;  ip1 += 12;
-                ip2 += 12;  ip3 += 12;
-            }
+            ip0 += 8; ip1 += 8; ip2 += 8; ip3 += 8;
         }
     }
-
     return 0;
 }
 
@@ -286,29 +329,15 @@  static int cinepak_decode_strip (CinepakContext *s,
 
         switch (chunk_id) {
 
-        case 0x20:
-        case 0x21:
-        case 0x24:
-        case 0x25:
-            cinepak_decode_codebook (strip->v4_codebook, chunk_id,
-                chunk_size, data);
+        case 0x20: case 0x21: case 0x24: case 0x25:
+            s->decode_codebook(s, &strip->v4_codebook, chunk_id, chunk_size, data);
             break;
-
-        case 0x22:
-        case 0x23:
-        case 0x26:
-        case 0x27:
-            cinepak_decode_codebook (strip->v1_codebook, chunk_id,
-                chunk_size, data);
+        case 0x22: case 0x23: case 0x26: case 0x27:
+            s->decode_codebook (s, &strip->v1_codebook, chunk_id, chunk_size, data);
             break;
-
-        case 0x30:
-        case 0x31:
-        case 0x32:
-            return cinepak_decode_vectors (s, strip, chunk_id,
-                chunk_size, data);
+        case 0x30: case 0x31: case 0x32:
+            return s->decode_vectors (s, strip, chunk_id, chunk_size, data);
         }
-
         data += chunk_size;
     }
 
@@ -385,9 +414,9 @@  static int cinepak_decode (CinepakContext *s)
         strip_size = ((s->data + strip_size) > eod) ? (eod - s->data) : strip_size;
 
         if ((i > 0) && !(frame_flags & 0x01)) {
-            memcpy (s->strips[i].v4_codebook, s->strips[i-1].v4_codebook,
+            memcpy (&s->strips[i].v4_codebook, &s->strips[i-1].v4_codebook,
                 sizeof(s->strips[i].v4_codebook));
-            memcpy (s->strips[i].v1_codebook, s->strips[i-1].v1_codebook,
+            memcpy (&s->strips[i].v1_codebook, &s->strips[i-1].v1_codebook,
                 sizeof(s->strips[i].v1_codebook));
         }
 
@@ -402,6 +431,10 @@  static int cinepak_decode (CinepakContext *s)
     return 0;
 }
 
+static const enum AVPixelFormat pixfmt_list[] = {
+    AV_PIX_FMT_RGB24, AV_PIX_FMT_RGB565, AV_PIX_FMT_NONE
+};
+
 static av_cold int cinepak_decode_init(AVCodecContext *avctx)
 {
     CinepakContext *s = avctx->priv_data;
@@ -412,15 +445,26 @@  static av_cold int cinepak_decode_init(AVCodecContext *avctx)
 
     s->sega_film_skip_bytes = -1;  /* uninitialized state */
 
-    // check for paletted data
-    if (avctx->bits_per_coded_sample != 8) {
-        s->palette_video = 0;
-        avctx->pix_fmt = AV_PIX_FMT_RGB24;
-    } else {
-        s->palette_video = 1;
-        avctx->pix_fmt = AV_PIX_FMT_PAL8;
+    /* check for paletted data */
+    s->palette_video = (avctx->bits_per_coded_sample == 8);
+
+    if (s->out_pixfmt != AV_PIX_FMT_NONE) /* the option is set to something */
+        avctx->pix_fmt = s->out_pixfmt;
+    else
+        avctx->pix_fmt = ff_get_format(avctx, pixfmt_list);
+
+#define DECODE_TO(pixel_format) \
+ s->decode_codebook = cinepak_decode_codebook_##pixel_format;\
+ s->decode_vectors  = cinepak_decode_vectors_##pixel_format;\
+ break;\
+
+    switch (avctx->pix_fmt) {
+    case AV_PIX_FMT_RGB24:   DECODE_TO(rgb24)
+    case AV_PIX_FMT_RGB565:  DECODE_TO(rgb565)
+    default:
+        av_log(avctx, AV_LOG_ERROR, "Unsupported pixel format %s\n", av_get_pix_fmt_name(avctx->pix_fmt));
+        return AVERROR(EINVAL);
     }
-
     s->frame = av_frame_alloc();
     if (!s->frame)
         return AVERROR(ENOMEM);
@@ -457,9 +501,6 @@  static int cinepak_decode_frame(AVCodecContext *avctx,
         av_log(avctx, AV_LOG_ERROR, "cinepak_decode failed\n");
     }
 
-    if (s->palette_video)
-        memcpy (s->frame->data[1], s->pal, AVPALETTE_SIZE);
-
     if ((ret = av_frame_ref(data, s->frame)) < 0)
         return ret;
 
@@ -488,4 +529,5 @@  AVCodec ff_cinepak_decoder = {
     .close          = cinepak_decode_end,
     .decode         = cinepak_decode_frame,
     .capabilities   = AV_CODEC_CAP_DR1,
+    .priv_class     = &cinepak_class,
 };