[FFmpeg-devel,v2] libavutil: add an FFT & MDCT implementation

Submitted by Lynne on May 10, 2019, 3:14 p.m.

Details

Message ID LeXIvak--3-1@lynne.ee
State New
Headers show

Commit Message

Lynne May 10, 2019, 3:14 p.m.
Patch updated again.
Made some more cleanups to the transforms, the tables and the main context.
API changed again, now the init function populates the function pointer for transform.
I decided that having a separate function would encourage bad usage (e.g. calling
the function every time before doing a transform rather than storing the pointer) when
we're trying to avoid the overhead of function calls.
Also adjusted file names to match the API.

Comments

Carl Eugen Hoyos May 12, 2019, 7:57 p.m.
Am Fr., 10. Mai 2019 um 17:15 Uhr schrieb Lynne <dev@lynne.ee>:
>
> Patch updated again.
> Made some more cleanups to the transforms, the tables and the main context.
> API changed again, now the init function populates the function pointer for transform.
> I decided that having a separate function would encourage bad usage (e.g. calling
> the function every time before doing a transform rather than storing the pointer) when
> we're trying to avoid the overhead of function calls.
> Also adjusted file names to match the API.

As said, this patch is not ok as long as the copyright statements are missing.

Carl Eugen
Paul B Mahol May 12, 2019, 8:37 p.m.
On 5/12/19, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> Am Fr., 10. Mai 2019 um 17:15 Uhr schrieb Lynne <dev@lynne.ee>:
>>
>> Patch updated again.
>> Made some more cleanups to the transforms, the tables and the main
>> context.
>> API changed again, now the init function populates the function pointer
>> for transform.
>> I decided that having a separate function would encourage bad usage (e.g.
>> calling
>> the function every time before doing a transform rather than storing the
>> pointer) when
>> we're trying to avoid the overhead of function calls.
>> Also adjusted file names to match the API.
>
> As said, this patch is not ok as long as the copyright statements are
> missing.
>

You are mislead. No statements are necessary.
Carl Eugen Hoyos May 12, 2019, 8:38 p.m.
Am So., 12. Mai 2019 um 22:37 Uhr schrieb Paul B Mahol <onemda@gmail.com>:
>
> On 5/12/19, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> > Am Fr., 10. Mai 2019 um 17:15 Uhr schrieb Lynne <dev@lynne.ee>:
> >>
> >> Patch updated again.
> >> Made some more cleanups to the transforms, the tables and the main
> >> context.
> >> API changed again, now the init function populates the function pointer
> >> for transform.
> >> I decided that having a separate function would encourage bad usage (e.g.
> >> calling
> >> the function every time before doing a transform rather than storing the
> >> pointer) when
> >> we're trying to avoid the overhead of function calls.
> >> Also adjusted file names to match the API.
> >
> > As said, this patch is not ok as long as the copyright statements are
> > missing.
> >
>
> You are mislead. No statements are necessary.

Why do you think so?

The commit message states that a part of the code is coming
from a file with an extensive copyright statement: Why would
it be ok to remove it?

Carl Eugen
Paul B Mahol May 12, 2019, 8:40 p.m.
On 5/12/19, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> Am So., 12. Mai 2019 um 22:37 Uhr schrieb Paul B Mahol <onemda@gmail.com>:
>>
>> On 5/12/19, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>> > Am Fr., 10. Mai 2019 um 17:15 Uhr schrieb Lynne <dev@lynne.ee>:
>> >>
>> >> Patch updated again.
>> >> Made some more cleanups to the transforms, the tables and the main
>> >> context.
>> >> API changed again, now the init function populates the function pointer
>> >> for transform.
>> >> I decided that having a separate function would encourage bad usage
>> >> (e.g.
>> >> calling
>> >> the function every time before doing a transform rather than storing
>> >> the
>> >> pointer) when
>> >> we're trying to avoid the overhead of function calls.
>> >> Also adjusted file names to match the API.
>> >
>> > As said, this patch is not ok as long as the copyright statements are
>> > missing.
>> >
>>
>> You are mislead. No statements are necessary.
>
> Why do you think so?
>
> The commit message states that a part of the code is coming
> from a file with an extensive copyright statement: Why would
> it be ok to remove it?
>

Are you  sure white-space is copyright-able?
Carl Eugen Hoyos May 12, 2019, 8:58 p.m.
Am So., 12. Mai 2019 um 22:40 Uhr schrieb Paul B Mahol <onemda@gmail.com>:
>
> On 5/12/19, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> > Am So., 12. Mai 2019 um 22:37 Uhr schrieb Paul B Mahol <onemda@gmail.com>:
> >>
> >> On 5/12/19, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> >> > Am Fr., 10. Mai 2019 um 17:15 Uhr schrieb Lynne <dev@lynne.ee>:
> >> >>
> >> >> Patch updated again.
> >> >> Made some more cleanups to the transforms, the tables and the main
> >> >> context.
> >> >> API changed again, now the init function populates the function pointer
> >> >> for transform.
> >> >> I decided that having a separate function would encourage bad usage
> >> >> (e.g.
> >> >> calling
> >> >> the function every time before doing a transform rather than storing
> >> >> the
> >> >> pointer) when
> >> >> we're trying to avoid the overhead of function calls.
> >> >> Also adjusted file names to match the API.
> >> >
> >> > As said, this patch is not ok as long as the copyright statements are
> >> > missing.
> >> >
> >>
> >> You are mislead. No statements are necessary.
> >
> > Why do you think so?
> >
> > The commit message states that a part of the code is coming
> > from a file with an extensive copyright statement: Why would
> > it be ok to remove it?
>
> Are you  sure white-space is copyright-able?

I am sure that white-space is generally not copyrightable but
Lynne claimed he copied actual code from libavcodec.

Carl Eugen
Hendrik Leppkes May 12, 2019, 8:59 p.m.
On Sun, May 12, 2019 at 10:38 PM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>
> Am So., 12. Mai 2019 um 22:37 Uhr schrieb Paul B Mahol <onemda@gmail.com>:
> >
> > On 5/12/19, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> > > Am Fr., 10. Mai 2019 um 17:15 Uhr schrieb Lynne <dev@lynne.ee>:
> > >>
> > >> Patch updated again.
> > >> Made some more cleanups to the transforms, the tables and the main
> > >> context.
> > >> API changed again, now the init function populates the function pointer
> > >> for transform.
> > >> I decided that having a separate function would encourage bad usage (e.g.
> > >> calling
> > >> the function every time before doing a transform rather than storing the
> > >> pointer) when
> > >> we're trying to avoid the overhead of function calls.
> > >> Also adjusted file names to match the API.
> > >
> > > As said, this patch is not ok as long as the copyright statements are
> > > missing.
> > >
> >
> > You are mislead. No statements are necessary.
>
> Why do you think so?
>
> The commit message states that a part of the code is coming
> from a file with an extensive copyright statement: Why would
> it be ok to remove it?
>

The names at the top of the file are always going to be inaccurate,
and as such meaningless, because everyone that changed the file in a
meaningful way holds a copyright over those changes, and not everyone
is added to that list (typically, no-one beyond the original author is
present there).
Since the list is not complete, and as such does not allow you to
identify who actually holds all the copyright in such a file, its not
legally relevant.

Everyone of the authors still hold a copyright no matter if the name
is present there, or not.

- Hendrik
Carl Eugen Hoyos May 12, 2019, 9:04 p.m.
Am So., 12. Mai 2019 um 22:59 Uhr schrieb Hendrik Leppkes <h.leppkes@gmail.com>:
>
> On Sun, May 12, 2019 at 10:38 PM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> >
> > Am So., 12. Mai 2019 um 22:37 Uhr schrieb Paul B Mahol <onemda@gmail.com>:
> > >
> > > On 5/12/19, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> > > > Am Fr., 10. Mai 2019 um 17:15 Uhr schrieb Lynne <dev@lynne.ee>:
> > > >>
> > > >> Patch updated again.
> > > >> Made some more cleanups to the transforms, the tables and the main
> > > >> context.
> > > >> API changed again, now the init function populates the function pointer
> > > >> for transform.
> > > >> I decided that having a separate function would encourage bad usage (e.g.
> > > >> calling
> > > >> the function every time before doing a transform rather than storing the
> > > >> pointer) when
> > > >> we're trying to avoid the overhead of function calls.
> > > >> Also adjusted file names to match the API.
> > > >
> > > > As said, this patch is not ok as long as the copyright statements are
> > > > missing.
> > > >
> > >
> > > You are mislead. No statements are necessary.
> >
> > Why do you think so?
> >
> > The commit message states that a part of the code is coming
> > from a file with an extensive copyright statement: Why would
> > it be ok to remove it?
>
> The names at the top of the file are always going to be inaccurate,

Possibly.

> and as such meaningless,

(Impossible to parse)

> because everyone that changed the file in a
> meaningful way holds a copyright over those changes,

Of course!

> and not everyone is added to that list

True.

> (typically, no-one beyond the original author is present there).

Depending on the meaning of "typically" this may be true
(but is probably "meaningless").

> Since the list is not complete, and as such does not allow you to
> identify who actually holds all the copyright in such a file, its not
> legally relevant.

This is - to quote a Viennese scientist - not even wrong;-)

But seriously: We are of course not allowed to remove copyright
statements, no matter if we consider them relevant or not.

> Everyone of the authors still hold a copyright no matter if the name
> is present there, or not.

Yes, and this is the reason why we must not remove the names
of those present (they also hold a copyright, not just those whose
names are missing).

Carl Eugen
Hendrik Leppkes May 12, 2019, 9:10 p.m.
On Sun, May 12, 2019 at 11:05 PM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>
> But seriously: We are of course not allowed to remove copyright
> statements, no matter if we consider them relevant or not.
>

Please provide a source for such claims.

- Hendrik
Lynne May 12, 2019, 9:58 p.m.
May 12, 2019, 10:04 PM by ceffmpeg@gmail.com:

>
> Yes, and this is the reason why we must not remove the names
> of those present (they also hold a copyright, not just those whose
> names are missing).
>
>

Please go and discuss the project and the world's attitude and policy towards copyright
notices in another thread.

I need *technical* feedback about the API. I would also like to have someone take a quick
look at the implementation, although I'm confident its okay and as optimal as a generic
software implementation could be. Accuracy-wise its slightly better than libfftw3f, due to
the use of reordering for nptwo transform direction and the compound 15-point FFT
(less operations due to the lack of inter-stage twiddles).
Can provide a benchmarking test program on IRC.
Carl Eugen Hoyos May 12, 2019, 10:42 p.m.
Am So., 12. Mai 2019 um 23:58 Uhr schrieb Lynne <dev@lynne.ee>:
> I need *technical* feedback about the API.

I understand that.

Please understand that the patch cannot be committed as-is.

Carl Eugen
James Almer May 12, 2019, 10:47 p.m.
On 5/12/2019 7:42 PM, Carl Eugen Hoyos wrote:
> Am So., 12. Mai 2019 um 23:58 Uhr schrieb Lynne <dev@lynne.ee>:
>> I need *technical* feedback about the API.
> 
> I understand that.

Then, if you can't provide technical feedback, please stop replying to
this thread (After you provide the source Hendrik requested).
Carl Eugen Hoyos May 12, 2019, 11:01 p.m.
Am Mo., 13. Mai 2019 um 00:55 Uhr schrieb James Almer <jamrial@gmail.com>:
>
> On 5/12/2019 7:42 PM, Carl Eugen Hoyos wrote:
> > Am So., 12. Mai 2019 um 23:58 Uhr schrieb Lynne <dev@lynne.ee>:
> >> I need *technical* feedback about the API.
> >
> > I understand that.
>
> Then, if you can't provide technical feedback, please stop replying
> to this thread (After you provide the source Hendrik requested).

Could you please stop this?

It doesn't matter where you live, and it doesn't matter which license
you use, you are not allowed to remove a copyright statement that
was put on top of a source file.

It is arguably not always as clear as in this case, but since it was
explained where the code comes from, there is really no question
about this.

Carl Eugen
Pedro Arthur May 13, 2019, 2:54 a.m.
Em dom, 12 de mai de 2019 às 18:11, Hendrik Leppkes
<h.leppkes@gmail.com> escreveu:
>
> On Sun, May 12, 2019 at 11:05 PM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> >
> > But seriously: We are of course not allowed to remove copyright
> > statements, no matter if we consider them relevant or not.
> >
>
> Please provide a source for such claims.
The GPL license included in the header states that.

GPL2 [1] - "keep intact all the notices that refer to this License"
GPL3 [2]  - "Requiring preservation of specified reasonable legal
notices or author attributions in that material"
MIT [3] (for completeness) - "The above copyright notice and this
permission notice shall be included in all copies or substantial
portions of the Software."

[1] - https://www.gnu.org/licenses/old-licenses/gpl-2.0.html
[2] - http://www.gnu.org/licenses/gpl.html
[3] - https://opensource.org/licenses/MIT
Reimar Döffinger May 13, 2019, 6:52 a.m.
On 13.05.2019, at 04:54, Pedro Arthur <bygrandao@gmail.com> wrote:

> Em dom, 12 de mai de 2019 às 18:11, Hendrik Leppkes
> <h.leppkes@gmail.com> escreveu:
>> 
>> On Sun, May 12, 2019 at 11:05 PM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>>> 
>>> But seriously: We are of course not allowed to remove copyright
>>> statements, no matter if we consider them relevant or not.
>>> 
>> 
>> Please provide a source for such claims.
> The GPL license included in the header states that.
> 
> GPL2 [1] - "keep intact all the notices that refer to this License"
> GPL3 [2]  - "Requiring preservation of specified reasonable legal
> notices or author attributions in that material"
> MIT [3] (for completeness) - "The above copyright notice and this
> permission notice shall be included in all copies or substantial
> portions of the Software."
> 
> [1] - https://www.gnu.org/licenses/old-licenses/gpl-2.0.html
> [2] - http://www.gnu.org/licenses/gpl.html
> [3] - https://opensource.org/licenses/MI

Besides the direct legals and ethics of it, please note that it also creates a
serious (even if unlikely) risk to us and our users.
In any kind of legal case, whether to defend against some "copyright troll" or to
enforce the license it might become necessary to find the author of the code.
Not properly taking care of the license and copyright statement side exposes our
users to unnecessary risk and makes it harder to enforce our license.
Even if nothing happens, the work companies have to do to show compliance
(because their customers require it or to reduce their risk) becomes much harder.
Which is why the Linux kernel currently is working on cleaning up their license
header mess.

Best regards,
Reimar Döffinger
Hendrik Leppkes May 13, 2019, 8:40 a.m.
On Mon, May 13, 2019 at 8:57 AM Reimar Döffinger
<Reimar.Doeffinger@gmx.de> wrote:
>
> On 13.05.2019, at 04:54, Pedro Arthur <bygrandao@gmail.com> wrote:
>
> > Em dom, 12 de mai de 2019 às 18:11, Hendrik Leppkes
> > <h.leppkes@gmail.com> escreveu:
> >>
> >> On Sun, May 12, 2019 at 11:05 PM Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> >>>
> >>> But seriously: We are of course not allowed to remove copyright
> >>> statements, no matter if we consider them relevant or not.
> >>>
> >>
> >> Please provide a source for such claims.
> > The GPL license included in the header states that.
> >
> > GPL2 [1] - "keep intact all the notices that refer to this License"
> > GPL3 [2]  - "Requiring preservation of specified reasonable legal
> > notices or author attributions in that material"
> > MIT [3] (for completeness) - "The above copyright notice and this
> > permission notice shall be included in all copies or substantial
> > portions of the Software."
> >
> > [1] - https://www.gnu.org/licenses/old-licenses/gpl-2.0.html
> > [2] - http://www.gnu.org/licenses/gpl.html
> > [3] - https://opensource.org/licenses/MI
>
> Besides the direct legals and ethics of it, please note that it also creates a
> serious (even if unlikely) risk to us and our users.
> In any kind of legal case, whether to defend against some "copyright troll" or to
> enforce the license it might become necessary to find the author of the code.
> Not properly taking care of the license and copyright statement side exposes our
> users to unnecessary risk and makes it harder to enforce our license.
> Even if nothing happens, the work companies have to do to show compliance
> (because their customers require it or to reduce their risk) becomes much harder.
> Which is why the Linux kernel currently is working on cleaning up their license
> header mess.
>

This argument is not really applicable because the headers do not
actually let you identify who holds all the copyright over the code in
the file. You need to go dig through years of Git history.

- Hendrik
Paul B Mahol May 13, 2019, 8:53 a.m.
On 5/13/19, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> Am Mo., 13. Mai 2019 um 00:55 Uhr schrieb James Almer <jamrial@gmail.com>:
>>
>> On 5/12/2019 7:42 PM, Carl Eugen Hoyos wrote:
>> > Am So., 12. Mai 2019 um 23:58 Uhr schrieb Lynne <dev@lynne.ee>:
>> >> I need *technical* feedback about the API.
>> >
>> > I understand that.
>>
>> Then, if you can't provide technical feedback, please stop replying
>> to this thread (After you provide the source Hendrik requested).
>
> Could you please stop this?
>
> It doesn't matter where you live, and it doesn't matter which license
> you use, you are not allowed to remove a copyright statement that
> was put on top of a source file.
>
> It is arguably not always as clear as in this case, but since it was
> explained where the code comes from, there is really no question
> about this.

Do we need yet another voting for this one?
Lynne May 13, 2019, 11:23 a.m.
May 13, 2019, 9:53 AM by onemda@gmail.com:

> On 5/13/19, Carl Eugen Hoyos <> ceffmpeg@gmail.com <mailto:ceffmpeg@gmail.com>> > wrote:
>
>> Am Mo., 13. Mai 2019 um 00:55 Uhr schrieb James Almer <>> jamrial@gmail.com <mailto:jamrial@gmail.com>>> >:
>>
>>>
>>> On 5/12/2019 7:42 PM, Carl Eugen Hoyos wrote:
>>> > Am So., 12. Mai 2019 um 23:58 Uhr schrieb Lynne <>>> dev@lynne.ee <mailto:dev@lynne.ee>>>> >:
>>> >> I need *technical* feedback about the API.
>>> >
>>> > I understand that.
>>>
>>> Then, if you can't provide technical feedback, please stop replying
>>> to this thread (After you provide the source Hendrik requested).
>>>
>>
>> Could you please stop this?
>>
>> It doesn't matter where you live, and it doesn't matter which license
>> you use, you are not allowed to remove a copyright statement that
>> was put on top of a source file.
>>
>> It is arguably not always as clear as in this case, but since it was
>> explained where the code comes from, there is really no question
>> about this.
>>
>
> Do we need yet another voting for this one?
>

Please do, then this thread can be left alone and I can get some actual feedback.

I'll ignore Carl's messages for now as I agree with the others that authorship is always
preserved through git history. If he disagrees then it becomes a project-wide issue as
copyright headers have sometimes not been preserved through refactoring. I can give
examples _in_another_thread_.
Carl Eugen Hoyos May 13, 2019, 11:26 a.m.
Am Mo., 13. Mai 2019 um 13:24 Uhr schrieb Lynne <dev@lynne.ee>:

> I'll ignore Carl's messages for now as I agree with the others that authorship is always
> preserved through git history.

This is not the question here.

> If he disagrees then it becomes a project-wide issue as
> copyright headers have sometimes not been preserved through refactoring. I can give
> examples _in_another_thread_.

Please do, I would like to fix them.

Please understand that your current patch cannot be committed, the reason
is that the push would be a copyright violation.

Carl Eugen
Paul B Mahol May 13, 2019, 11:28 a.m.
On 5/13/19, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> Am Mo., 13. Mai 2019 um 13:24 Uhr schrieb Lynne <dev@lynne.ee>:
>
>> I'll ignore Carl's messages for now as I agree with the others that
>> authorship is always
>> preserved through git history.
>
> This is not the question here.
>
>> If he disagrees then it becomes a project-wide issue as
>> copyright headers have sometimes not been preserved through refactoring. I
>> can give
>> examples _in_another_thread_.
>
> Please do, I would like to fix them.
>
> Please understand that your current patch cannot be committed, the reason
> is that the push would be a copyright violation.
>

Please, stop writing nonsense.
Carl Eugen Hoyos May 13, 2019, 11:29 a.m.
Am Mo., 13. Mai 2019 um 13:24 Uhr schrieb Lynne <dev@lynne.ee>:

> I'll ignore Carl's messages for now as I agree with the others that authorship is always
> preserved through git history.

> If he disagrees then it becomes a project-wide issue as
> copyright headers have sometimes not been preserved through refactoring.

Sorry, I forgot: Mistakes can always happen.
In this case though, we all agree that a code that was committed with a
copyright notice gets moved.

> I can give examples _in_another_thread_.

Carl Eugen
Paul B Mahol May 13, 2019, 11:30 a.m.
On 5/13/19, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> Am Mo., 13. Mai 2019 um 13:24 Uhr schrieb Lynne <dev@lynne.ee>:
>
>> I'll ignore Carl's messages for now as I agree with the others that
>> authorship is always
>> preserved through git history.
>
>> If he disagrees then it becomes a project-wide issue as
>> copyright headers have sometimes not been preserved through refactoring.
>
> Sorry, I forgot: Mistakes can always happen.
> In this case though, we all agree that a code that was committed with a
> copyright notice gets moved.
>
>> I can give examples _in_another_thread_.
>

I have myriad examples.
Carl Eugen Hoyos May 13, 2019, 11:32 a.m.
Am Mo., 13. Mai 2019 um 13:31 Uhr schrieb Paul B Mahol <onemda@gmail.com>:
>
> On 5/13/19, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> > Am Mo., 13. Mai 2019 um 13:24 Uhr schrieb Lynne <dev@lynne.ee>:
> >
> >> I'll ignore Carl's messages for now as I agree with the others that
> >> authorship is always
> >> preserved through git history.
> >
> >> If he disagrees then it becomes a project-wide issue as
> >> copyright headers have sometimes not been preserved through refactoring.
> >
> > Sorry, I forgot: Mistakes can always happen.
> > In this case though, we all agree that a code that was committed with a
> > copyright notice gets moved.
> >
> >> I can give examples _in_another_thread_.
>
> I have myriad examples.

Please point us to them: It is important that we fix
such issues if we get informed.

Carl Eugen
Paul B Mahol May 13, 2019, 11:34 a.m.
On 5/13/19, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
> Am Mo., 13. Mai 2019 um 13:31 Uhr schrieb Paul B Mahol <onemda@gmail.com>:
>>
>> On 5/13/19, Carl Eugen Hoyos <ceffmpeg@gmail.com> wrote:
>> > Am Mo., 13. Mai 2019 um 13:24 Uhr schrieb Lynne <dev@lynne.ee>:
>> >
>> >> I'll ignore Carl's messages for now as I agree with the others that
>> >> authorship is always
>> >> preserved through git history.
>> >
>> >> If he disagrees then it becomes a project-wide issue as
>> >> copyright headers have sometimes not been preserved through
>> >> refactoring.
>> >
>> > Sorry, I forgot: Mistakes can always happen.
>> > In this case though, we all agree that a code that was committed with a
>> > copyright notice gets moved.
>> >
>> >> I can give examples _in_another_thread_.
>>
>> I have myriad examples.
>
> Please point us to them: It is important that we fix
> such issues if we get informed.
>

See all filters.
Paul B Mahol May 14, 2019, 4:55 p.m.
On 5/10/19, Lynne <dev@lynne.ee> wrote:
> Patch updated again.
> Made some more cleanups to the transforms, the tables and the main context.
> API changed again, now the init function populates the function pointer for
> transform.
> I decided that having a separate function would encourage bad usage (e.g.
> calling
> the function every time before doing a transform rather than storing the
> pointer) when
> we're trying to avoid the overhead of function calls.
> Also adjusted file names to match the API.
>
>

LGTM, going to apply soon.

Patch hide | download patch | download mbox

From 1bc43f2aa7d4dd08468e656c5796055ea064a3d7 Mon Sep 17 00:00:00 2001
From: Lynne <dev@lynne.ee>
Date: Thu, 2 May 2019 15:07:12 +0100
Subject: [PATCH v2] libavutil: add an FFT & MDCT implementation

This commit adds a new API to libavutil to allow for arbitrary transformations
on various types of data.
This is a partly new implementation, with the power of two transforms taken
from libavcodec/fft_template, the 5 and 15-point FFT taken from mdct15, while
the 3-point FFT was written from scratch.
The (i)mdct folding code is taken from mdct15 as well, as the mdct_template
code was somewhat old, messy and not easy to separate.

A notable feature of this implementation is that it allows for 3xM and 5xM
based transforms, where M is a power of two, e.g. 384, 640, 768, 1280, etc.
AC-4 uses 3xM transforms while Siren uses 5xM transforms, so the code will
allow for decoding of such streams.
A non-exaustive list of supported sizes:
4, 8, 12, 16, 20, 24, 32, 40, 48, 60, 64, 80, 96, 120, 128, 160, 192, 240,
256, 320, 384, 480, 512, 640, 768, 960, 1024, 1280, 1536, 1920, 2048, 2560...

The API was designed such that it allows for not only 1D transforms but also
2D transforms of certain block sizes. This was partly on accident as the stride
argument is required for Opus MDCTs, but can be used in the context of a 2D
transform as well.
Also, various data types would be implemented eventually as well, such as
"double" and "int32_t".

Some performance comparisons with libfftw3f (SIMD disabled for both):
120:
  22353 decicycles in     fftwf_execute,     1024 runs,      0 skips
  21836 decicycles in compound_fft_15x8,     1024 runs,      0 skips

128:
  22003 decicycles in       fftwf_execute,   1024 runs,      0 skips
  23132 decicycles in monolithic_fft_ptwo,   1024 runs,      0 skips

384:
  75939 decicycles in      fftwf_execute,    1024 runs,      0 skips
  73973 decicycles in compound_fft_3x128,    1024 runs,      0 skips

640:
 104354 decicycles in       fftwf_execute,   1024 runs,      0 skips
 149518 decicycles in compound_fft_5x128,    1024 runs,      0 skips

768:
 109323 decicycles in      fftwf_execute,    1024 runs,      0 skips
 164096 decicycles in compound_fft_3x256,    1024 runs,      0 skips

960:
 186210 decicycles in      fftwf_execute,    1024 runs,      0 skips
 215256 decicycles in compound_fft_15x64,    1024 runs,      0 skips

1024:
 163464 decicycles in       fftwf_execute,   1024 runs,      0 skips
 199686 decicycles in monolithic_fft_ptwo,   1024 runs,      0 skips

With SIMD we should be faster than fftw for 15xM transforms as our fft15 SIMD
is around 2x faster than theirs, even if our ptwo SIMD is slightly slower.

The goal is to remove the libavcodec/mdct15 code and deprecate the
libavcodec/avfft interface once aarch64 and x86 SIMD code has been ported.
It is unlikely that libavcodec/fft will be removed soon as there's much SIMD
written for exotic or old platforms there, but nevertheless new code
should use this new API throughout the project.

The implementation passes fate when used in Opus and AAC, and the output
is identical in ATRAC9 as well.
---
 libavutil/Makefile |   2 +
 libavutil/tx.c     | 781 +++++++++++++++++++++++++++++++++++++++++++++
 libavutil/tx.h     |  86 +++++
 3 files changed, 869 insertions(+)
 create mode 100644 libavutil/tx.c
 create mode 100644 libavutil/tx.h

diff --git a/libavutil/Makefile b/libavutil/Makefile
index 53208fc587..8a7a44e4b5 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -79,6 +79,7 @@  HEADERS = adler32.h                                                     \
           version.h                                                     \
           xtea.h                                                        \
           tea.h                                                         \
+          tx.h                                                          \
 
 HEADERS-$(CONFIG_LZO)                   += lzo.h
 
@@ -159,6 +160,7 @@  OBJS = adler32.o                                                        \
        xga_font_data.o                                                  \
        xtea.o                                                           \
        tea.o                                                            \
+       tx.o                                                             \
 
 OBJS-$(CONFIG_CUDA)                     += hwcontext_cuda.o
 OBJS-$(CONFIG_D3D11VA)                  += hwcontext_d3d11va.o
diff --git a/libavutil/tx.c b/libavutil/tx.c
new file mode 100644
index 0000000000..ee6e0b7ac4
--- /dev/null
+++ b/libavutil/tx.c
@@ -0,0 +1,781 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <stddef.h>
+#include "fft.h"
+#include "thread.h"
+#include "mem.h"
+#include "avassert.h"
+
+typedef float FFTSample;
+typedef AVComplexFloat FFTComplex;
+
+struct AVTXContext {
+    int n;              /* Nptwo part */
+    int m;              /* Ptwo part */
+
+    FFTComplex *exptab; /* MDCT exptab */
+    FFTComplex *tmp;    /* Temporary buffer needed for all compound transforms */
+    int        *pfatab; /* Input/Output mapping for compound transforms */
+    int        *revtab; /* Input mapping for power of two transforms */
+};
+
+#define FFT_NAME(x) x
+
+#define COSTABLE(size) \
+    static DECLARE_ALIGNED(32, FFTSample, FFT_NAME(ff_cos_##size))[size/2]
+
+static FFTSample * const FFT_NAME(ff_cos_tabs)[18];
+
+COSTABLE(16);
+COSTABLE(32);
+COSTABLE(64);
+COSTABLE(128);
+COSTABLE(256);
+COSTABLE(512);
+COSTABLE(1024);
+COSTABLE(2048);
+COSTABLE(4096);
+COSTABLE(8192);
+COSTABLE(16384);
+COSTABLE(32768);
+COSTABLE(65536);
+COSTABLE(131072);
+
+static av_cold void init_ff_cos_tabs(int index)
+{
+    int m = 1 << index;
+    double freq = 2*M_PI/m;
+    FFTSample *tab = FFT_NAME(ff_cos_tabs)[index];
+    for(int i = 0; i <= m/4; i++)
+        tab[i] = cos(i*freq);
+    for(int i = 1; i < m/4; i++)
+        tab[m/2 - i] = tab[i];
+}
+
+typedef struct CosTabsInitOnce {
+    void (*func)(void);
+    AVOnce control;
+} CosTabsInitOnce;
+
+#define INIT_FF_COS_TABS_FUNC(index, size)                                     \
+static av_cold void init_ff_cos_tabs_ ## size (void)                           \
+{                                                                              \
+    init_ff_cos_tabs(index);                                                   \
+}
+
+INIT_FF_COS_TABS_FUNC(4, 16)
+INIT_FF_COS_TABS_FUNC(5, 32)
+INIT_FF_COS_TABS_FUNC(6, 64)
+INIT_FF_COS_TABS_FUNC(7, 128)
+INIT_FF_COS_TABS_FUNC(8, 256)
+INIT_FF_COS_TABS_FUNC(9, 512)
+INIT_FF_COS_TABS_FUNC(10, 1024)
+INIT_FF_COS_TABS_FUNC(11, 2048)
+INIT_FF_COS_TABS_FUNC(12, 4096)
+INIT_FF_COS_TABS_FUNC(13, 8192)
+INIT_FF_COS_TABS_FUNC(14, 16384)
+INIT_FF_COS_TABS_FUNC(15, 32768)
+INIT_FF_COS_TABS_FUNC(16, 65536)
+INIT_FF_COS_TABS_FUNC(17, 131072)
+
+static CosTabsInitOnce cos_tabs_init_once[] = {
+    { NULL },
+    { NULL },
+    { NULL },
+    { NULL },
+    { init_ff_cos_tabs_16, AV_ONCE_INIT },
+    { init_ff_cos_tabs_32, AV_ONCE_INIT },
+    { init_ff_cos_tabs_64, AV_ONCE_INIT },
+    { init_ff_cos_tabs_128, AV_ONCE_INIT },
+    { init_ff_cos_tabs_256, AV_ONCE_INIT },
+    { init_ff_cos_tabs_512, AV_ONCE_INIT },
+    { init_ff_cos_tabs_1024, AV_ONCE_INIT },
+    { init_ff_cos_tabs_2048, AV_ONCE_INIT },
+    { init_ff_cos_tabs_4096, AV_ONCE_INIT },
+    { init_ff_cos_tabs_8192, AV_ONCE_INIT },
+    { init_ff_cos_tabs_16384, AV_ONCE_INIT },
+    { init_ff_cos_tabs_32768, AV_ONCE_INIT },
+    { init_ff_cos_tabs_65536, AV_ONCE_INIT },
+    { init_ff_cos_tabs_131072, AV_ONCE_INIT },
+};
+
+static FFTSample * const FFT_NAME(ff_cos_tabs)[] = {
+    NULL, NULL, NULL, NULL,
+    FFT_NAME(ff_cos_16),
+    FFT_NAME(ff_cos_32),
+    FFT_NAME(ff_cos_64),
+    FFT_NAME(ff_cos_128),
+    FFT_NAME(ff_cos_256),
+    FFT_NAME(ff_cos_512),
+    FFT_NAME(ff_cos_1024),
+    FFT_NAME(ff_cos_2048),
+    FFT_NAME(ff_cos_4096),
+    FFT_NAME(ff_cos_8192),
+    FFT_NAME(ff_cos_16384),
+    FFT_NAME(ff_cos_32768),
+    FFT_NAME(ff_cos_65536),
+    FFT_NAME(ff_cos_131072),
+};
+
+static av_cold void ff_init_ff_cos_tabs(int index)
+{
+    ff_thread_once(&cos_tabs_init_once[index].control,
+                    cos_tabs_init_once[index].func);
+}
+
+static AVOnce tabs_53_once = AV_ONCE_INIT;
+static DECLARE_ALIGNED(32, FFTComplex, FFT_NAME(ff_53_tabs))[3];
+
+static av_cold void ff_init_53_tabs(void)
+{
+    ff_53_tabs[0] = (FFTComplex){ cos(2 * M_PI /  5), sin(2 * M_PI /  5) };
+    ff_53_tabs[1] = (FFTComplex){ cos(2 * M_PI / 10), sin(2 * M_PI / 10) };
+    ff_53_tabs[2] = (FFTComplex){ cos(2 * M_PI / 12), cos(2 * M_PI / 12) };
+}
+
+#define BF(x, y, a, b) do {                                                    \
+        x = (a) - (b);                                                         \
+        y = (a) + (b);                                                         \
+    } while (0)
+
+#define CMUL(dre, dim, are, aim, bre, bim) do {                                \
+        (dre) = (are) * (bre) - (aim) * (bim);                                 \
+        (dim) = (are) * (bim) + (aim) * (bre);                                 \
+    } while (0)
+
+#define CMUL3(c, a, b) CMUL((c).re, (c).im, (a).re, (a).im, (b).re, (b).im)
+
+static av_always_inline void fft3(FFTComplex *out, FFTComplex *in,
+                                  ptrdiff_t stride)
+{
+    FFTComplex tmp[2];
+
+    tmp[0].re = in[1].im - in[2].im;
+    tmp[0].im = in[1].re - in[2].re;
+    tmp[1].re = in[1].re + in[2].re;
+    tmp[1].im = in[1].im + in[2].im;
+
+    out[0*stride].re = in[0].re + tmp[1].re;
+    out[0*stride].im = in[0].im + tmp[1].im;
+
+    tmp[0].re *= ff_53_tabs[2].re;
+    tmp[0].im *= ff_53_tabs[2].im;
+    tmp[1].re *= 0.5f;
+    tmp[1].im *= 0.5f;
+
+    out[1*stride].re = in[0].re - tmp[1].re + tmp[0].re;
+    out[1*stride].im = in[0].im - tmp[1].im - tmp[0].im;
+    out[2*stride].re = in[0].re - tmp[1].re - tmp[0].re;
+    out[2*stride].im = in[0].im - tmp[1].im + tmp[0].im;
+}
+
+#define DECL_FFT5(NAME, D0, D1, D2, D3, D4)                                    \
+static av_always_inline void NAME(FFTComplex *out, FFTComplex *in,             \
+                                  ptrdiff_t stride)                            \
+{                                                                              \
+    FFTComplex z0[4], t[6];                                                    \
+                                                                               \
+    t[0].re = in[1].re + in[4].re;                                             \
+    t[0].im = in[1].im + in[4].im;                                             \
+    t[1].im = in[1].re - in[4].re;                                             \
+    t[1].re = in[1].im - in[4].im;                                             \
+    t[2].re = in[2].re + in[3].re;                                             \
+    t[2].im = in[2].im + in[3].im;                                             \
+    t[3].im = in[2].re - in[3].re;                                             \
+    t[3].re = in[2].im - in[3].im;                                             \
+                                                                               \
+    out[D0*stride].re = in[0].re + in[1].re + in[2].re +                       \
+                        in[3].re + in[4].re;                                   \
+    out[D0*stride].im = in[0].im + in[1].im + in[2].im +                       \
+                        in[3].im + in[4].im;                                   \
+                                                                               \
+    t[4].re = ff_53_tabs[0].re * t[2].re - ff_53_tabs[1].re * t[0].re;         \
+    t[4].im = ff_53_tabs[0].re * t[2].im - ff_53_tabs[1].re * t[0].im;         \
+    t[0].re = ff_53_tabs[0].re * t[0].re - ff_53_tabs[1].re * t[2].re;         \
+    t[0].im = ff_53_tabs[0].re * t[0].im - ff_53_tabs[1].re * t[2].im;         \
+    t[5].re = ff_53_tabs[0].im * t[3].re - ff_53_tabs[1].im * t[1].re;         \
+    t[5].im = ff_53_tabs[0].im * t[3].im - ff_53_tabs[1].im * t[1].im;         \
+    t[1].re = ff_53_tabs[0].im * t[1].re + ff_53_tabs[1].im * t[3].re;         \
+    t[1].im = ff_53_tabs[0].im * t[1].im + ff_53_tabs[1].im * t[3].im;         \
+                                                                               \
+    z0[0].re = t[0].re - t[1].re;                                              \
+    z0[0].im = t[0].im - t[1].im;                                              \
+    z0[1].re = t[4].re + t[5].re;                                              \
+    z0[1].im = t[4].im + t[5].im;                                              \
+                                                                               \
+    z0[2].re = t[4].re - t[5].re;                                              \
+    z0[2].im = t[4].im - t[5].im;                                              \
+    z0[3].re = t[0].re + t[1].re;                                              \
+    z0[3].im = t[0].im + t[1].im;                                              \
+                                                                               \
+    out[D1*stride].re = in[0].re + z0[3].re;                                   \
+    out[D1*stride].im = in[0].im + z0[0].im;                                   \
+    out[D2*stride].re = in[0].re + z0[2].re;                                   \
+    out[D2*stride].im = in[0].im + z0[1].im;                                   \
+    out[D3*stride].re = in[0].re + z0[1].re;                                   \
+    out[D3*stride].im = in[0].im + z0[2].im;                                   \
+    out[D4*stride].re = in[0].re + z0[0].re;                                   \
+    out[D4*stride].im = in[0].im + z0[3].im;                                   \
+}
+
+DECL_FFT5(fft5,     0,  1,  2,  3,  4)
+DECL_FFT5(fft5_m1,  0,  6, 12,  3,  9)
+DECL_FFT5(fft5_m2, 10,  1,  7, 13,  4)
+DECL_FFT5(fft5_m3,  5, 11,  2,  8, 14)
+
+static av_always_inline void fft15(FFTComplex *out, FFTComplex *in,
+                                   ptrdiff_t stride)
+{
+    FFTComplex tmp[15];
+
+    for (int i = 0; i < 5; i++)
+        fft3(tmp + i, in + i*3, 5);
+
+    fft5_m1(out, tmp +  0, stride);
+    fft5_m2(out, tmp +  5, stride);
+    fft5_m3(out, tmp + 10, stride);
+}
+
+#define BUTTERFLIES(a0,a1,a2,a3) {\
+    BF(t3, t5, t5, t1);\
+    BF(a2.re, a0.re, a0.re, t5);\
+    BF(a3.im, a1.im, a1.im, t3);\
+    BF(t4, t6, t2, t6);\
+    BF(a3.re, a1.re, a1.re, t4);\
+    BF(a2.im, a0.im, a0.im, t6);\
+}
+
+// force loading all the inputs before storing any.
+// this is slightly slower for small data, but avoids store->load aliasing
+// for addresses separated by large powers of 2.
+#define BUTTERFLIES_BIG(a0,a1,a2,a3) {\
+    FFTSample r0=a0.re, i0=a0.im, r1=a1.re, i1=a1.im;\
+    BF(t3, t5, t5, t1);\
+    BF(a2.re, a0.re, r0, t5);\
+    BF(a3.im, a1.im, i1, t3);\
+    BF(t4, t6, t2, t6);\
+    BF(a3.re, a1.re, r1, t4);\
+    BF(a2.im, a0.im, i0, t6);\
+}
+
+#define TRANSFORM(a0,a1,a2,a3,wre,wim) {\
+    CMUL(t1, t2, a2.re, a2.im, wre, -wim);\
+    CMUL(t5, t6, a3.re, a3.im, wre,  wim);\
+    BUTTERFLIES(a0,a1,a2,a3)\
+}
+
+#define TRANSFORM_ZERO(a0,a1,a2,a3) {\
+    t1 = a2.re;\
+    t2 = a2.im;\
+    t5 = a3.re;\
+    t6 = a3.im;\
+    BUTTERFLIES(a0,a1,a2,a3)\
+}
+
+/* z[0...8n-1], w[1...2n-1] */
+#define PASS(name)\
+static void name(FFTComplex *z, const FFTSample *wre, unsigned int n)\
+{\
+    FFTSample t1, t2, t3, t4, t5, t6;\
+    int o1 = 2*n;\
+    int o2 = 4*n;\
+    int o3 = 6*n;\
+    const FFTSample *wim = wre+o1;\
+    n--;\
+\
+    TRANSFORM_ZERO(z[0],z[o1],z[o2],z[o3]);\
+    TRANSFORM(z[1],z[o1+1],z[o2+1],z[o3+1],wre[1],wim[-1]);\
+    do {\
+        z += 2;\
+        wre += 2;\
+        wim -= 2;\
+        TRANSFORM(z[0],z[o1],z[o2],z[o3],wre[0],wim[0]);\
+        TRANSFORM(z[1],z[o1+1],z[o2+1],z[o3+1],wre[1],wim[-1]);\
+    } while(--n);\
+}
+
+PASS(pass)
+#undef BUTTERFLIES
+#define BUTTERFLIES BUTTERFLIES_BIG
+PASS(pass_big)
+
+#define DECL_FFT(n,n2,n4)\
+static void fft##n(FFTComplex *z)\
+{\
+    fft##n2(z);\
+    fft##n4(z+n4*2);\
+    fft##n4(z+n4*3);\
+    pass(z,FFT_NAME(ff_cos_##n),n4/2);\
+}
+
+static void fft4(FFTComplex *z)
+{
+    FFTSample t1, t2, t3, t4, t5, t6, t7, t8;
+
+    BF(t3, t1, z[0].re, z[1].re);
+    BF(t8, t6, z[3].re, z[2].re);
+    BF(z[2].re, z[0].re, t1, t6);
+    BF(t4, t2, z[0].im, z[1].im);
+    BF(t7, t5, z[2].im, z[3].im);
+    BF(z[3].im, z[1].im, t4, t8);
+    BF(z[3].re, z[1].re, t3, t7);
+    BF(z[2].im, z[0].im, t2, t5);
+}
+
+static void fft8(FFTComplex *z)
+{
+    FFTSample t1, t2, t3, t4, t5, t6;
+
+    fft4(z);
+
+    BF(t1, z[5].re, z[4].re, -z[5].re);
+    BF(t2, z[5].im, z[4].im, -z[5].im);
+    BF(t5, z[7].re, z[6].re, -z[7].re);
+    BF(t6, z[7].im, z[6].im, -z[7].im);
+
+    BUTTERFLIES(z[0],z[2],z[4],z[6]);
+    TRANSFORM(z[1],z[3],z[5],z[7],M_SQRT1_2,M_SQRT1_2);
+}
+
+static void fft16(FFTComplex *z)
+{
+    FFTSample t1, t2, t3, t4, t5, t6;
+    FFTSample cos_16_1 = FFT_NAME(ff_cos_16)[1];
+    FFTSample cos_16_3 = FFT_NAME(ff_cos_16)[3];
+
+    fft8(z);
+    fft4(z+8);
+    fft4(z+12);
+
+    TRANSFORM_ZERO(z[0],z[4],z[8],z[12]);
+    TRANSFORM(z[2],z[6],z[10],z[14],M_SQRT1_2,M_SQRT1_2);
+    TRANSFORM(z[1],z[5],z[9],z[13],cos_16_1,cos_16_3);
+    TRANSFORM(z[3],z[7],z[11],z[15],cos_16_3,cos_16_1);
+}
+
+DECL_FFT(32,16,8)
+DECL_FFT(64,32,16)
+DECL_FFT(128,64,32)
+DECL_FFT(256,128,64)
+DECL_FFT(512,256,128)
+#define pass pass_big
+DECL_FFT(1024,512,256)
+DECL_FFT(2048,1024,512)
+DECL_FFT(4096,2048,1024)
+DECL_FFT(8192,4096,2048)
+DECL_FFT(16384,8192,4096)
+DECL_FFT(32768,16384,8192)
+DECL_FFT(65536,32768,16384)
+DECL_FFT(131072,65536,32768)
+
+static void (* const fft_dispatch[])(FFTComplex*) = {
+    fft4, fft8, fft16, fft32, fft64, fft128, fft256, fft512, fft1024,
+    fft2048, fft4096, fft8192, fft16384, fft32768, fft65536, fft131072
+};
+
+#define DECL_COMP_FFT(N)                                                       \
+static void compound_fft_##N##xM(AVTXContext *s, void *_out,                   \
+                                 void *_in, ptrdiff_t stride)                  \
+{                                                                              \
+    const int m = s->m, *in_map = s->pfatab, *out_map = in_map + N*m;          \
+    FFTComplex *in = _in;                                                      \
+    FFTComplex *out = _out;                                                    \
+    FFTComplex fft##N##in[N];                                                  \
+    void (*fftp)(FFTComplex *z) = fft_dispatch[av_log2(m) - 2];                \
+                                                                               \
+    for (int i = 0; i < m; i++) {                                              \
+        for (int j = 0; j < N; j++)                                            \
+            fft##N##in[j] = in[in_map[i*N + j]];                               \
+        fft##N(s->tmp + s->revtab[i], fft##N##in, m);                          \
+    }                                                                          \
+                                                                               \
+    for (int i = 0; i < N; i++)                                                \
+        fftp(s->tmp + m*i);                                                    \
+                                                                               \
+    for (int i = 0; i < N*m; i++)                                              \
+        out[i] = s->tmp[out_map[i]];                                           \
+}
+
+DECL_COMP_FFT(3)
+DECL_COMP_FFT(5)
+DECL_COMP_FFT(15)
+
+static void monolithic_fft(AVTXContext *s, void *_out, void *_in,
+                           ptrdiff_t stride)
+{
+    FFTComplex *in = _in;
+    FFTComplex *out = _out;
+    int m = s->m, mb = av_log2(m) - 2;
+    for (int i = 0; i < m; i++)
+        out[s->revtab[i]] = in[i];
+    fft_dispatch[mb](out);
+}
+
+#define DECL_COMP_IMDCT(N)                                                     \
+static void compound_imdct_##N##xM(AVTXContext *s, void *_dst, void *_src,     \
+                                   ptrdiff_t stride)                           \
+{                                                                              \
+    FFTComplex fft##N##in[N];                                                  \
+    FFTComplex *z = _dst, *exp = s->exptab;                                    \
+    const int m = s->m, len8 = N*m >> 1;                                       \
+    const int *in_map = s->pfatab, *out_map = in_map + N*m;                    \
+    const float *src = _src, *in1, *in2;                                       \
+    void (*fftp)(FFTComplex *) = fft_dispatch[av_log2(m) - 2];                 \
+                                                                               \
+    stride /= sizeof(*src); /* To convert it from bytes */                     \
+    in1 = src;                                                                 \
+    in2 = src + ((N*m*2) - 1) * stride;                                        \
+                                                                               \
+    for (int i = 0; i < m; i++) {                                              \
+        for (int j = 0; j < N; j++) {                                          \
+            const int k = in_map[i*N + j];                                     \
+            FFTComplex tmp = { in2[-k*stride], in1[k*stride] };                \
+            CMUL3(fft##N##in[j], tmp, exp[k >> 1]);                            \
+        }                                                                      \
+        fft##N(s->tmp + s->revtab[i], fft##N##in, m);                          \
+    }                                                                          \
+                                                                               \
+    for (int i = 0; i < N; i++)                                                \
+        fftp(s->tmp + m*i);                                                    \
+                                                                               \
+    for (int i = 0; i < len8; i++) {                                           \
+        const int i0 = len8 + i, i1 = len8 - i - 1;                            \
+        const int s0 = out_map[i0], s1 = out_map[i1];                          \
+        FFTComplex src1 = { s->tmp[s1].im, s->tmp[s1].re };                    \
+        FFTComplex src0 = { s->tmp[s0].im, s->tmp[s0].re };                    \
+                                                                               \
+        CMUL(z[i1].re, z[i0].im, src1.re, src1.im, exp[i1].im, exp[i1].re);    \
+        CMUL(z[i0].re, z[i1].im, src0.re, src0.im, exp[i0].im, exp[i0].re);    \
+    }                                                                          \
+}
+
+DECL_COMP_IMDCT(3)
+DECL_COMP_IMDCT(5)
+DECL_COMP_IMDCT(15)
+
+#define DECL_COMP_MDCT(N)                                                      \
+static void compound_mdct_##N##xM(AVTXContext *s, void *_dst, void *_src,      \
+                                  ptrdiff_t stride)                            \
+{                                                                              \
+    float *src = _src, *dst = _dst;                                            \
+    FFTComplex *exp = s->exptab, tmp, fft##N##in[N];                           \
+    const int m = s->m, len4 = N*m, len3 = len4 * 3, len8 = len4 >> 1;         \
+    const int *in_map = s->pfatab, *out_map = in_map + N*m;                    \
+    void (*fftp)(FFTComplex *) = fft_dispatch[av_log2(m) - 2];                 \
+                                                                               \
+    stride /= sizeof(*dst);                                                    \
+                                                                               \
+    for (int i = 0; i < m; i++) { /* Folding and pre-reindexing */             \
+        for (int j = 0; j < N; j++) {                                          \
+            const int k = in_map[i*N + j];                                     \
+            if (k < len4) {                                                    \
+                tmp.re = -src[ len4 + k] + src[1*len4 - 1 - k];                \
+                tmp.im = -src[ len3 + k] - src[1*len3 - 1 - k];                \
+            } else {                                                           \
+                tmp.re = -src[ len4 + k] - src[5*len4 - 1 - k];                \
+                tmp.im =  src[-len4 + k] - src[1*len3 - 1 - k];                \
+            }                                                                  \
+            CMUL(fft##N##in[j].im, fft##N##in[j].re, tmp.re, tmp.im,           \
+                 exp[k >> 1].re, exp[k >> 1].im);                              \
+        }                                                                      \
+        fft##N(s->tmp + s->revtab[i], fft##N##in, m);                          \
+    }                                                                          \
+                                                                               \
+    for (int i = 0; i < 15; i++)                                               \
+        fftp(s->tmp + m*i);                                                    \
+                                                                               \
+    for (int i = 0; i < len8; i++) {                                           \
+        const int i0 = len8 + i, i1 = len8 - i - 1;                            \
+        const int s0 = out_map[i0], s1 = out_map[i1];                          \
+        FFTComplex src1 = { s->tmp[s1].re, s->tmp[s1].im };                    \
+        FFTComplex src0 = { s->tmp[s0].re, s->tmp[s0].im };                    \
+                                                                               \
+        CMUL(dst[2*i1*stride + stride], dst[2*i0*stride], src0.re, src0.im,    \
+             exp[i0].im, exp[i0].re);                                          \
+        CMUL(dst[2*i0*stride + stride], dst[2*i1*stride], src1.re, src1.im,    \
+             exp[i1].im, exp[i1].re);                                          \
+    }                                                                          \
+}
+
+DECL_COMP_MDCT(3)
+DECL_COMP_MDCT(5)
+DECL_COMP_MDCT(15)
+
+static void monolithic_imdct(AVTXContext *s, void *_dst, void *_src,
+                             ptrdiff_t stride)
+{
+    FFTComplex *z = _dst, *exp = s->exptab;
+    const int m = s->m, len8 = m >> 1;
+    const float *src = _src, *in1, *in2;
+    void (*fftp)(FFTComplex *) = fft_dispatch[av_log2(m) - 2];
+
+    stride /= sizeof(*src);
+    in1 = src;
+    in2 = src + ((m*2) - 1) * stride;
+
+    for (int i = 0; i < m; i++) {
+        FFTComplex tmp = { in2[-2*i*stride], in1[2*i*stride] };
+        CMUL3(z[s->revtab[i]], tmp, exp[i]);
+    }
+
+    fftp(z);
+
+    for (int i = 0; i < len8; i++) {
+        const int i0 = len8 + i, i1 = len8 - i - 1;
+        FFTComplex src1 = { z[i1].im, z[i1].re };
+        FFTComplex src0 = { z[i0].im, z[i0].re };
+
+        CMUL(z[i1].re, z[i0].im, src1.re, src1.im, exp[i1].im, exp[i1].re);
+        CMUL(z[i0].re, z[i1].im, src0.re, src0.im, exp[i0].im, exp[i0].re);
+    }
+}
+
+static void monolithic_mdct(AVTXContext *s, void *_dst, void *_src,
+                            ptrdiff_t stride)
+{
+    float *src = _src, *dst = _dst;
+    FFTComplex *exp = s->exptab, tmp, *z = _dst;
+    const int m = s->m, len4 = m, len3 = len4 * 3, len8 = len4 >> 1;
+    void (*fftp)(FFTComplex *) = fft_dispatch[av_log2(m) - 2];
+
+    stride /= sizeof(*dst);
+
+    for (int i = 0; i < m; i++) { /* Folding and pre-reindexing */
+        const int k = 2*i;
+        if (k < len4) {
+            tmp.re = -src[ len4 + k] + src[1*len4 - 1 - k];
+            tmp.im = -src[ len3 + k] - src[1*len3 - 1 - k];
+        } else {
+            tmp.re = -src[ len4 + k] - src[5*len4 - 1 - k];
+            tmp.im =  src[-len4 + k] - src[1*len3 - 1 - k];
+        }
+        CMUL(z[s->revtab[i]].im, z[s->revtab[i]].re, tmp.re, tmp.im,
+             exp[i].re, exp[i].im);
+    }
+
+    fftp(z);
+
+    for (int i = 0; i < len8; i++) {
+        const int i0 = len8 + i, i1 = len8 - i - 1;
+        FFTComplex src1 = { z[i1].re, z[i1].im };
+        FFTComplex src0 = { z[i0].re, z[i0].im };
+
+        CMUL(dst[2*i1*stride + stride], dst[2*i0*stride], src0.re, src0.im,
+             exp[i0].im, exp[i0].re);
+        CMUL(dst[2*i0*stride + stride], dst[2*i1*stride], src1.re, src1.im,
+             exp[i1].im, exp[i1].re);
+    }
+}
+
+/* Calculates the modular multiplicative inverse, not fast, replace */
+static int mulinv(int n, int m)
+{
+    n = n % m;
+    for (int x = 1; x < m; x++)
+        if (((n * x) % m) == 1)
+            return x;
+    av_assert0(0); /* Never reached */
+}
+
+/* Guaranteed to work for any n, m where gcd(n, m) == 1 */
+static int gen_compound_mapping(AVTXContext *s, int n, int m, int inv,
+                                enum AVTXType type)
+{
+    int *in_map, *out_map;
+    const int len   = n*m;
+    const int m_inv = mulinv(m, n);
+    const int n_inv = mulinv(n, m);
+    const int mdct  = type == AV_TX_FLOAT_MDCT;
+
+    if (!(s->pfatab = av_malloc(2*len*sizeof(*s->pfatab))))
+        return AVERROR(ENOMEM);
+
+    in_map  = s->pfatab;
+    out_map = s->pfatab + n*m;
+
+    /* Ruritanian map for input, CRT map for output, can be swapped */
+    for (int j = 0; j < m; j++) {
+        for (int i = 0; i < n; i++) {
+            /* Shifted by 1 to simplify forward MDCTs */
+            in_map[j*n + i] = ((i*m + j*n) % len) << mdct;
+            out_map[(i*m*m_inv + j*n*n_inv) % len] = i*m + j;
+        }
+    }
+
+    /* Change transform direction by reversing all ACs */
+    if (inv) {
+        for (int i = 0; i < m; i++) {
+            int *in = &in_map[i*n + 1]; /* Skip the DC */
+            for (int j = 0; j < ((n - 1) >> 1); j++)
+                FFSWAP(int, in[j], in[n - j - 2]);
+        }
+    }
+
+    /* Our 15-point transform is also a compound one, so embed its input map */
+    if (n == 15) {
+        for (int k = 0; k < m; k++) {
+            int tmp[15];
+            memcpy(tmp, &in_map[k*15], 15*sizeof(*tmp));
+            for (int i = 0; i < 5; i++) {
+                for (int j = 0; j < 3; j++)
+                    in_map[k*15 + i*3 + j] = tmp[(i*3 + j*5) % 15];
+            }
+        }
+    }
+
+    return 0;
+}
+
+static int split_radix_permutation(int i, int n, int inverse)
+{
+    int m;
+    if (n <= 2)
+        return i & 1;
+    m = n >> 1;
+    if (!(i & m))
+        return split_radix_permutation(i, m, inverse)*2;
+    m >>= 1;
+    if (inverse == !(i & m))
+        return split_radix_permutation(i, m, inverse)*4 + 1;
+    else
+        return split_radix_permutation(i, m, inverse)*4 - 1;
+}
+
+static int get_ptwo_revtab(AVTXContext *s, int m, int inv)
+{
+    if (!(s->revtab = av_malloc(m*sizeof(*s->revtab))))
+        return AVERROR(ENOMEM);
+
+    /* Default */
+    for (int i = 0; i < m; i++) {
+        int k = -split_radix_permutation(i, m, inv) & (m - 1);
+        s->revtab[k] = i;
+    }
+
+    return 0;
+}
+
+static int gen_mdct_exptab(AVTXContext *s, int len4, double scale)
+{
+    const double theta = (scale < 0 ? len4 : 0) + 1.0/8.0;
+
+    if (!(s->exptab = av_malloc_array(len4, sizeof(*s->exptab))))
+        return AVERROR(ENOMEM);
+
+    scale = sqrt(fabs(scale));
+    for (int i = 0; i < len4; i++) {
+        const double alpha = M_PI_2 * (i + theta) / len4;
+        s->exptab[i].re = cos(alpha) * scale;
+        s->exptab[i].im = sin(alpha) * scale;
+    }
+
+    return 0;
+}
+
+av_cold void av_tx_uninit(AVTXContext **ctx)
+{
+    if (!ctx)
+        return;
+
+    av_free((*ctx)->pfatab);
+    av_free((*ctx)->exptab);
+    av_free((*ctx)->revtab);
+    av_free((*ctx)->tmp);
+
+    av_freep(ctx);
+}
+
+av_cold int av_tx_init(AVTXContext **ctx, av_tx_fn *tx, enum AVTXType type,
+                       int inv, int len, void *scale, uint64_t flags)
+{
+    int err, n = 1, m = 1, max_ptwo = 1 << (FF_ARRAY_ELEMS(fft_dispatch) + 1);
+    AVTXContext *s = av_mallocz(sizeof(*s));
+    if (!s)
+        return AVERROR(ENOMEM);
+
+    if (type >= AV_TX_NB)
+        return AVERROR(EINVAL);
+    else if (type == AV_TX_FLOAT_MDCT)
+        len >>= 1;
+
+#define CHECK_NPTWO_FACTOR(DST, FACTOR, SRC)                                   \
+    if (DST == 1 && !(SRC % FACTOR)) {                                         \
+        DST = FACTOR;                                                          \
+        SRC /= FACTOR;                                                         \
+    }
+    CHECK_NPTWO_FACTOR(n, 15, len)
+    CHECK_NPTWO_FACTOR(n,  5, len)
+    CHECK_NPTWO_FACTOR(n,  3, len)
+#undef CHECK_NPTWO_FACTOR
+
+    /* len must be a power of two now */
+    if (!(len & (len - 1)) && len >= 4 && len <= max_ptwo) {
+        m = len;
+        len = 1;
+    }
+
+    /* Filter out direct 3, 5 and 15 transforms, too niche */
+    if (len > 1 || m == 1) {
+        av_log(NULL, AV_LOG_ERROR, "Unsupported transform size: n = %i, "
+               "m = %i, residual = %i!\n", n, m, len);
+        err = AVERROR(EINVAL);
+        goto fail;
+    } else if (n > 1 && m > 1) { /* 2D transform case */
+        if ((err = gen_compound_mapping(s, n, m, inv, type)))
+            goto fail;
+        if (!(s->tmp = av_malloc(n*m*sizeof(*s->tmp)))) {
+            err = AVERROR(ENOMEM);
+            goto fail;
+        }
+        *tx = n == 3 ? compound_fft_3xM :
+              n == 5 ? compound_fft_5xM :
+                       compound_fft_15xM;
+        if (type == AV_TX_FLOAT_MDCT)
+            *tx = n == 3 ? inv ? compound_imdct_3xM  : compound_mdct_3xM :
+                  n == 5 ? inv ? compound_imdct_5xM  : compound_mdct_5xM :
+                           inv ? compound_imdct_15xM : compound_mdct_15xM;
+    } else { /* Direct transform case */
+        *tx = monolithic_fft;
+        if (type == AV_TX_FLOAT_MDCT)
+            *tx = inv ? monolithic_imdct : monolithic_mdct;
+    }
+
+    if (n != 1)
+        ff_thread_once(&tabs_53_once, ff_init_53_tabs);
+    if (m != 1) {
+        get_ptwo_revtab(s, m, inv);
+        for (int i = 4; i <= av_log2(m); i++)
+            ff_init_ff_cos_tabs(i);
+    }
+
+    if (type == AV_TX_FLOAT_MDCT)
+        if ((err = gen_mdct_exptab(s, n*m, *((float *)scale))))
+            goto fail;
+
+    s->n = n;
+    s->m = m;
+    *ctx = s;
+
+    return 0;
+
+fail:
+    av_tx_uninit(&s);
+    *tx = NULL;
+    return err;
+}
diff --git a/libavutil/tx.h b/libavutil/tx.h
new file mode 100644
index 0000000000..c142a8b54e
--- /dev/null
+++ b/libavutil/tx.h
@@ -0,0 +1,86 @@ 
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVUTIL_TX_H
+#define AVUTIL_TX_H
+
+#include <stdint.h>
+#include "attributes.h"
+
+typedef struct AVTXContext AVTXContext;
+
+typedef struct AVComplexFloat {
+    float re, im;
+} AVComplexFloat;
+
+enum AVTXType {
+    /**
+     * Standard complex to complex FFT with sample data type AVComplexFloat.
+     * Scaling currently unsupported
+     */
+    AV_TX_FLOAT_FFT = 0,
+    /**
+     * Standard MDCT with sample data type of float and a scale type of
+     * float
+     */
+    AV_TX_FLOAT_MDCT = 1,
+
+    /**
+     * Not part of the API
+     */
+    AV_TX_NB,
+};
+
+/**
+ * Function pointer to a function to perform the transform.
+ *
+ * @note Using a different context than the one allocated during av_tx_init()
+ * is not allowed.
+ *
+ * @param s the transform context
+ * @param out the output array
+ * @param in the input array
+ * @param stride the input or output stride (depending on transform direction)
+ * in bytes, currently implemented for all MDCT transforms
+ */
+typedef void (*av_tx_fn)(AVTXContext *s, void *out, void *in, ptrdiff_t stride);
+
+/**
+ * Initialize a transform context with the given configuration
+ * Currently power of two lengths from 4 to 131072 are supported, along with
+ * any length decomposable to a power of two and either 3, 5 or 15.
+ *
+ * @param ctx the context to allocate, will be NULL on error
+ * @param tx pointer to the transform function pointer to set
+ * @param type type the type of transform
+ * @param inv whether to do an inverse or a forward transform
+ * @param len the size of the transform in samples
+ * @param scale pointer to the value to scale the output by
+ * @param flags currently unused
+ *
+ * @return 0 on success, negative error code on failure
+ */
+int av_tx_init(AVTXContext **ctx, av_tx_fn *tx, enum AVTXType type,
+               int inv, int len, void *scale, uint64_t flags);
+
+/**
+ * Frees a context and sets ctx to NULL, does nothing when ctx == NULL
+ */
+void av_tx_uninit(AVTXContext **ctx);
+
+#endif /* AVUTIL_TX_H */
-- 
2.20.1