[FFmpeg-devel,01/13] avcodec/diracdec: Clear MMX state after dirac_decode_frame_internal()

Submitted by Michael Niedermayer on Oct. 22, 2016, 7:01 p.m.

Details

Message ID 20161022190211.16526-1-michael@niedermayer.cc
State New
Headers show

Commit Message

Michael Niedermayer Oct. 22, 2016, 7:01 p.m.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 libavcodec/diracdec.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Ronald S. Bultje Oct. 23, 2016, 2:07 a.m.
Hi,

On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer <michael@niedermayer.cc
> wrote:

> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  libavcodec/vp56.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libavcodec/vp56.c b/libavcodec/vp56.c
> index 6319248..2125000 100644
> --- a/libavcodec/vp56.c
> +++ b/libavcodec/vp56.c
> @@ -668,6 +668,7 @@ static int ff_vp56_decode_mbs(AVCodecContext *avctx,
> void *data,
>      }
>
>  next:
> +    emms_c();


Why?

This suggests that the frame decoding completion callback I advocated
earlier might have to be a broader thing that also exists for non-frame-mt
cases?

Ronald
Ronald S. Bultje Oct. 23, 2016, 2:10 a.m.
Hi,

general comment about all other dec patches.

On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer <michael@niedermayer.cc
> wrote:

> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  libavcodec/svq1dec.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/libavcodec/svq1dec.c b/libavcodec/svq1dec.c
> index 2b72e08..0fe222e 100644
> --- a/libavcodec/svq1dec.c
> +++ b/libavcodec/svq1dec.c
> @@ -744,6 +744,7 @@ static int svq1_decode_frame(AVCodecContext *avctx,
> void *data,
>              }
>          }
>      }
> +    emms_c();


This is hideous, you're sprinkling emms_c in various places to make some
stupid test pass. The test is morbidly stupid and there is no general
consensus on patterns to be followed as for where to place emms_c. Someone
who doesn't know any better will litter each new decoder with 10-20 calls
to emms_c just because he found that other decoders do it in undocumented,
unexplained and unclear locations also.

If you want this to be a "thing", you need to design and document carefully
where emms_c is necessary. Then come up with some system that makes this
work by itself. I've said from the beginning that I highly dislike
littering the code with emms_c in individual decoders, and that's exactly
what you're doing here. This is insane.

Ronald
Michael Niedermayer Oct. 23, 2016, 2:22 a.m.
On Sat, Oct 22, 2016 at 10:07:21PM -0400, Ronald S. Bultje wrote:
> Hi,
> 
> On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer <michael@niedermayer.cc
> > wrote:
> 
> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > ---
> >  libavcodec/vp56.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/libavcodec/vp56.c b/libavcodec/vp56.c
> > index 6319248..2125000 100644
> > --- a/libavcodec/vp56.c
> > +++ b/libavcodec/vp56.c
> > @@ -668,6 +668,7 @@ static int ff_vp56_decode_mbs(AVCodecContext *avctx,
> > void *data,
> >      }
> >
> >  next:
> > +    emms_c();
> 
> 
> Why?

theres SIMD code above and av_frame_(un)ref() below
the frame ref stuff uses memory allocation and freeing

[...]
Michael Niedermayer Oct. 23, 2016, 3:36 a.m.
On Sat, Oct 22, 2016 at 10:10:01PM -0400, Ronald S. Bultje wrote:
> Hi,
> 
> general comment about all other dec patches.
> 
> On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer <michael@niedermayer.cc
> > wrote:
> 
> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > ---
> >  libavcodec/svq1dec.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/libavcodec/svq1dec.c b/libavcodec/svq1dec.c
> > index 2b72e08..0fe222e 100644
> > --- a/libavcodec/svq1dec.c
> > +++ b/libavcodec/svq1dec.c
> > @@ -744,6 +744,7 @@ static int svq1_decode_frame(AVCodecContext *avctx,
> > void *data,
> >              }
> >          }
> >      }
> > +    emms_c();
> 
> 
> This is hideous, you're sprinkling emms_c in various places to make some
> stupid test pass. The test is morbidly stupid and there is no general
> consensus on patterns to be followed as for where to place emms_c. Someone
> who doesn't know any better will litter each new decoder with 10-20 calls
> to emms_c just because he found that other decoders do it in undocumented,
> unexplained and unclear locations also.
> 
> If you want this to be a "thing", you need to design and document carefully
> where emms_c is necessary. Then come up with some system that makes this
> work by itself.

Your mail sounds quite aggressive to me, did i say something to anger
you ? It was certainly not intended

About this patchset
FFmpeg is broken ATM (both in theory and practice with some libcs),
the way MMX code is used is not correct, emms_c()
calls are missing and required. The obvious way to fix that
(in practice) is adding emms_c() calls where they are missing.
I can document why each call is needed, if thats wanted?

I dont think hiding the calls by some trick would make it 
better. The calls must be carefully placed and knowing what the calls
do makes that easier to maintain than a frame_end() function which
would have to be ordered so it comes before any alloc/free/ref/unref/
... as well but for which thie requirement is less obvious
in fact even the emms in the thread progress is probably not that great
of an idea because of this.

[...]
Andreas Cadhalpun Oct. 23, 2016, 10:14 a.m.
Hi,

On 23.10.2016 04:10, Ronald S. Bultje wrote:
> This is hideous, you're sprinkling emms_c in various places to make some
> stupid test pass. The test is morbidly stupid

Can you elaborate why you think the test is bad?

> and there is no general
> consensus on patterns to be followed as for where to place emms_c. Someone
> who doesn't know any better will litter each new decoder with 10-20 calls
> to emms_c just because he found that other decoders do it in undocumented,
> unexplained and unclear locations also.
> 
> If you want this to be a "thing", you need to design and document carefully
> where emms_c is necessary. Then come up with some system that makes this
> work by itself. I've said from the beginning that I highly dislike
> littering the code with emms_c in individual decoders, and that's exactly
> what you're doing here. This is insane.

I also don't like adding emms_c in various places, but I don't see a realistic
alternative other than not fixing the problem. And I think that would be worse
than this patch set.

Best regards,
Andreas
Ronald S. Bultje Oct. 23, 2016, 5:02 p.m.
Hi,

On Sat, Oct 22, 2016 at 11:36 PM, Michael Niedermayer <
michael@niedermayer.cc> wrote:

> On Sat, Oct 22, 2016 at 10:10:01PM -0400, Ronald S. Bultje wrote:
> > Hi,
> >
> > general comment about all other dec patches.
> >
> > On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer
> <michael@niedermayer.cc
> > > wrote:
> >
> > > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > > ---
> > >  libavcodec/svq1dec.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/libavcodec/svq1dec.c b/libavcodec/svq1dec.c
> > > index 2b72e08..0fe222e 100644
> > > --- a/libavcodec/svq1dec.c
> > > +++ b/libavcodec/svq1dec.c
> > > @@ -744,6 +744,7 @@ static int svq1_decode_frame(AVCodecContext
> *avctx,
> > > void *data,
> > >              }
> > >          }
> > >      }
> > > +    emms_c();
> >
> >
> > This is hideous, you're sprinkling emms_c in various places to make some
> > stupid test pass. The test is morbidly stupid and there is no general
> > consensus on patterns to be followed as for where to place emms_c.
> Someone
> > who doesn't know any better will litter each new decoder with 10-20 calls
> > to emms_c just because he found that other decoders do it in
> undocumented,
> > unexplained and unclear locations also.
> >
> > If you want this to be a "thing", you need to design and document
> carefully
> > where emms_c is necessary. Then come up with some system that makes this
> > work by itself.
>
> Your mail sounds quite aggressive to me, did i say something to anger
> you ? It was certainly not intended
>
> About this patchset
> FFmpeg is broken ATM (both in theory and practice with some libcs),
> the way MMX code is used is not correct, emms_c()
> calls are missing and required. The obvious way to fix that
> (in practice) is adding emms_c() calls where they are missing.
> I can document why each call is needed, if thats wanted?


Your representation of facts is strange, to say the least. Let's explore
two related claims:

- (A) ffmpeg is broken in practice when calling musl malloc/free functions
after calling MMX SIMD functions on x86-32 (which crashes).
- (B) ffmpeg is broken in theory because we don't clear FPU state (as
required) at the end of MMX SIMD functions.

Which are you trying to fix? You make it sound like you're fixing (B) (you
use the plural "libcs" and often use the word "required" or "standard"),
but your patchset only addresses a small subset of (A). For example, you're
not clearing FPU state at the end of a SIMD function if it's followed by a
SIMD function that calls memcpy in the C implementation.

But more importantly, in doing so (regardless whether you're trying to fix
"A" or "B"), this patchset:
- has no design (or at least, I can't find it);
- it litters the code with unstructured calls to emms_c(), which seem
specifically placed to fix make fate with an assert added in av_malloc/free;
- it adds unnecessary (to fix "A") calls to emms_c() on x86-64;
- it doesn't address pretty much anything in "B";
- it does not address any scenario not tested by make fate (e.g. external
lib encoders/decoders/filters).

I'm mostly asking about design here. "Litter around emms_c() calls until
make fate passes with an assert added" is not a good design, as much as it
may decrease the number of av_assert2_fpu()s triggered by "make fate". Do
you think we can do better?

You're calling me aggressive, so to address that, let me try to be
substantive and give suggestions on how this could be done better. Please
note that this list is incomplete and is meant to start a discussion on
this subject, it's not like "if you address these specific issues, I'm OK
with the patchset". I'm happy to think more about this and give more
suggestions if you want to hear them. My suggestions:
- define whether we're trying to fix "A" or "B" from above, because it
affects the design. I'm assuming you're trying to fix A.
- #define emms_c() do { /* nothing */ } while (0) on x86-64;
- add a call to emms_c() in frame allocation routines (this should prevent
the emms_c() insertions in vp56.c and svq1dec.c);
- add a routine to signal that frame processing is complete (like the
INT_MAX check you're trying to add with frame-threading), and call that in
all decoders/encoders after block processing (which calls SIMD) is
complete, so we can context-switch (and emms_c()) regardless of the
presence of frame-mt or not. It also gets rid of the magic value INT_MAX;
- consider potential issues with external lib en/decoders which don't have
much of a presence in "make fate";
- think about libavfilter.

I'm fully aware that this is not a complete design that gets rid of calls
to emms_c() in all decoders. I'm also fully aware that some decoders
already call emms_c() because they signal when one line of blocks is
completed, which is a user-callback (which may need a cleared FPU state).
I'm not asking you to get rid of any of that, I'm just asking you to not
make it exponentially worse by designing this a little bit more carefully.

Thank you,
Ronald
Carl Eugen Hoyos Oct. 23, 2016, 6:17 p.m.
2016-10-23 19:02 GMT+02:00 Ronald S. Bultje <rsbultje@gmail.com>:

> - (A) ffmpeg is broken in practice when calling musl malloc/free
> functions after calling MMX SIMD functions on x86-32 (which
> crashes).
> - (B) ffmpeg is broken in theory because we don't clear FPU
> state (as required) at the end of MMX SIMD functions.

> - define whether we're trying to fix "A" or "B" from above, because
> it affects the design. I'm assuming you're trying to fix A.

The posted patches make the impression to me that Michael tries
to fix B.

Carl Eugen
Carl Eugen Hoyos Oct. 23, 2016, 6:18 p.m.
2016-10-23 12:14 GMT+02:00 Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>:

> I also don't like adding emms_c in various places, but I don't see a
> realistic alternative

(+1!)

> other than not fixing the problem. And I think that would be worse
> than this patch set.

I don't (strongly) disagree but could you elaborate on why this would
be such a bad alternative?
(As always assuming that development time is the only limiting
factor.)

Carl Eugen
Michael Niedermayer Oct. 23, 2016, 8:33 p.m.
On Sun, Oct 23, 2016 at 01:02:01PM -0400, Ronald S. Bultje wrote:
> Hi,
> 
> On Sat, Oct 22, 2016 at 11:36 PM, Michael Niedermayer <
> michael@niedermayer.cc> wrote:
> 
> > On Sat, Oct 22, 2016 at 10:10:01PM -0400, Ronald S. Bultje wrote:
> > > Hi,
> > >
> > > general comment about all other dec patches.
> > >
> > > On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer
> > <michael@niedermayer.cc
> > > > wrote:
> > >
> > > > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > > > ---
> > > >  libavcodec/svq1dec.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/libavcodec/svq1dec.c b/libavcodec/svq1dec.c
> > > > index 2b72e08..0fe222e 100644
> > > > --- a/libavcodec/svq1dec.c
> > > > +++ b/libavcodec/svq1dec.c
> > > > @@ -744,6 +744,7 @@ static int svq1_decode_frame(AVCodecContext
> > *avctx,
> > > > void *data,
> > > >              }
> > > >          }
> > > >      }
> > > > +    emms_c();
> > >
> > >
> > > This is hideous, you're sprinkling emms_c in various places to make some
> > > stupid test pass. The test is morbidly stupid and there is no general
> > > consensus on patterns to be followed as for where to place emms_c.
> > Someone
> > > who doesn't know any better will litter each new decoder with 10-20 calls
> > > to emms_c just because he found that other decoders do it in
> > undocumented,
> > > unexplained and unclear locations also.
> > >
> > > If you want this to be a "thing", you need to design and document
> > carefully
> > > where emms_c is necessary. Then come up with some system that makes this
> > > work by itself.
> >
> > Your mail sounds quite aggressive to me, did i say something to anger
> > you ? It was certainly not intended
> >
> > About this patchset
> > FFmpeg is broken ATM (both in theory and practice with some libcs),
> > the way MMX code is used is not correct, emms_c()
> > calls are missing and required. The obvious way to fix that
> > (in practice) is adding emms_c() calls where they are missing.
> > I can document why each call is needed, if thats wanted?
> 
> 
> Your representation of facts is strange, to say the least. Let's explore
> two related claims:
> 
> - (A) ffmpeg is broken in practice when calling musl malloc/free functions
> after calling MMX SIMD functions on x86-32 (which crashes).
> - (B) ffmpeg is broken in theory because we don't clear FPU state (as
> required) at the end of MMX SIMD functions.
> 
> Which are you trying to fix? You make it sound like you're fixing (B) (you
> use the plural "libcs" and often use the word "required" or "standard"),

All the crashes with musl are to the best of my knowledge FFmpeg
violating the C standard.
Iam interrested in fixing the cases where we violate the C standard,
with a priority toward the cases that cause practical issues.

musl really isnt involved in this, any C library can trigger these bugs
in FFmpeg hence plural.



> but your patchset only addresses a small subset of (A).

I just tried and fate fully passes on x86_32 with this patchset as
well as x86_64 both with the assert enabled
prior to this patchset there are 310 fate failures on x86_32.
The assert should check the case relevant for the musl case.
If something remains iam interrested in fixing it


> For example, you're
> not clearing FPU state at the end of a SIMD function if it's followed by a
> SIMD function that calls memcpy in the C implementation.

yes, my priority is at fixing the fpu state when calling memory
allocation functions from libc, because this class of issues is what
caused the crashes in practice


> 
> But more importantly, in doing so (regardless whether you're trying to fix
> "A" or "B"), this patchset:
> - has no design (or at least, I can't find it);
> - it litters the code with unstructured calls to emms_c(), which seem
> specifically placed to fix make fate with an assert added in av_malloc/free;

Isnt this the same as with any memleak fix adding a *free(), littering
the code with said *free() like emms_c() litters the code when it is
needed?


> - it adds unnecessary (to fix "A") calls to emms_c() on x86-64;

The C standard mandates this, so it is not unnecessary. Also this
class of issues caused crashes with musl in practice.
It may be that some patch is not needed for fixing a practical issue
and only theoretical standard compliance but i doubt that also
even if why does this matter?
We want to fix this, i have certainly better to do than to test if
undefined behavor actually triggers a crash
we have never left undefined behavor intentionally in the codebase
just because we dont have a actual testcase that fails. (or at least
i hope we never did)
Instead quite the opposite we use heaps of ugly macros to do
unaligned and aliassing violating memory accesses to avoid undefined
behavior



> - it doesn't address pretty much anything in "B";

> - it does not address any scenario not tested by make fate (e.g. external
> lib encoders/decoders/filters).
> 
> I'm mostly asking about design here. "Litter around emms_c() calls until
> make fate passes with an assert added" is not a good design, as much as it
> may decrease the number of av_assert2_fpu()s triggered by "make fate". Do
> you think we can do better?

I dont know if we can, but I know that I cannot


> 
> You're calling me aggressive, so to address that, let me try to be
> substantive and give suggestions on how this could be done better. Please
> note that this list is incomplete and is meant to start a discussion on
> this subject, it's not like "if you address these specific issues, I'm OK
> with the patchset". I'm happy to think more about this and give more
> suggestions if you want to hear them. My suggestions:
> - define whether we're trying to fix "A" or "B" from above, because it
> affects the design. I'm assuming you're trying to fix A.

> - #define emms_c() do { /* nothing */ } while (0) on x86-64;

emms_c is needed on x86-64 too


> - add a call to emms_c() in frame allocation routines (this should prevent
> the emms_c() insertions in vp56.c and svq1dec.c);

I tried this before submiting the patchset but it didnt fix much and
it has potential performance implications as this allocation is used
by audio codecs too. (many small frames) Also for this to work you must
do the same to teh deallocation / unref functions too which still fixes
only around something around 10-15 or so fate failures out of the 70
that remained when i tried this.
And with checks and special cases for audio this certainly has nothing
to do with being a clean solution.


> - add a routine to signal that frame processing is complete (like the
> INT_MAX check you're trying to add with frame-threading), and call that in
> all decoders/encoders after block processing (which calls SIMD) is
> complete, so we can context-switch (and emms_c()) regardless of the
> presence of frame-mt or not. It also gets rid of the magic value INT_MAX;
> - consider potential issues with external lib en/decoders which don't have
> much of a presence in "make fate";

> - think about libavfilter.

libavfilter is tested in make fate the same as anything else

Thanks

PS: i am happy to leave this to someone else if thats preferred.

[...]
Ronald S. Bultje Oct. 23, 2016, 9:44 p.m.
Hi,

On Sun, Oct 23, 2016 at 4:33 PM, Michael Niedermayer <michael@niedermayer.cc
> wrote:

> On Sun, Oct 23, 2016 at 01:02:01PM -0400, Ronald S. Bultje wrote:
> > Hi,
> >
> > On Sat, Oct 22, 2016 at 11:36 PM, Michael Niedermayer <
> > michael@niedermayer.cc> wrote:
> >
> > > On Sat, Oct 22, 2016 at 10:10:01PM -0400, Ronald S. Bultje wrote:
> > > > Hi,
> > > >
> > > > general comment about all other dec patches.
> > > >
> > > > On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer
> > > <michael@niedermayer.cc
> > > > > wrote:
> > > >
> > > > > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > > > > ---
> > > > >  libavcodec/svq1dec.c | 2 ++
> > > > >  1 file changed, 2 insertions(+)
> > > > >
> > > > > diff --git a/libavcodec/svq1dec.c b/libavcodec/svq1dec.c
> > > > > index 2b72e08..0fe222e 100644
> > > > > --- a/libavcodec/svq1dec.c
> > > > > +++ b/libavcodec/svq1dec.c
> > > > > @@ -744,6 +744,7 @@ static int svq1_decode_frame(AVCodecContext
> > > *avctx,
> > > > > void *data,
> > > > >              }
> > > > >          }
> > > > >      }
> > > > > +    emms_c();
> > > >
> > > >
> > > > This is hideous, you're sprinkling emms_c in various places to make
> some
> > > > stupid test pass. The test is morbidly stupid and there is no general
> > > > consensus on patterns to be followed as for where to place emms_c.
> > > Someone
> > > > who doesn't know any better will litter each new decoder with 10-20
> calls
> > > > to emms_c just because he found that other decoders do it in
> > > undocumented,
> > > > unexplained and unclear locations also.
> > > >
> > > > If you want this to be a "thing", you need to design and document
> > > carefully
> > > > where emms_c is necessary. Then come up with some system that makes
> this
> > > > work by itself.
> > >
> > > Your mail sounds quite aggressive to me, did i say something to anger
> > > you ? It was certainly not intended
> > >
> > > About this patchset
> > > FFmpeg is broken ATM (both in theory and practice with some libcs),
> > > the way MMX code is used is not correct, emms_c()
> > > calls are missing and required. The obvious way to fix that
> > > (in practice) is adding emms_c() calls where they are missing.
> > > I can document why each call is needed, if thats wanted?
> >
> >
> > Your representation of facts is strange, to say the least. Let's explore
> > two related claims:
> >
> > - (A) ffmpeg is broken in practice when calling musl malloc/free
> functions
> > after calling MMX SIMD functions on x86-32 (which crashes).
> > - (B) ffmpeg is broken in theory because we don't clear FPU state (as
> > required) at the end of MMX SIMD functions.
> >
> > Which are you trying to fix? You make it sound like you're fixing (B)
> (you
> > use the plural "libcs" and often use the word "required" or "standard"),
>
> All the crashes with musl are to the best of my knowledge FFmpeg
> violating the C standard.
> Iam interrested in fixing the cases where we violate the C standard,
> with a priority toward the cases that cause practical issues.
>
> musl really isnt involved in this, any C library can trigger these bugs
> in FFmpeg hence plural.


You're not answering the question. Are you fixing A or B? If you're trying
to say that your patch is aiming to fix B, then your patch is utterly and
totally inadequate and you know that.

Any SIMD function calling memcpy (a libc function) needs to have its FPU
state cleared when called, if you're claiming to fix B.

As such, your patch should be rejected on the same basis why the cabac
encoder should never have been accepted.

Ronald
wm4 Oct. 24, 2016, 7:36 a.m.
On Sun, 23 Oct 2016 13:02:01 -0400
"Ronald S. Bultje" <rsbultje@gmail.com> wrote:

> Hi,
> 
> On Sat, Oct 22, 2016 at 11:36 PM, Michael Niedermayer <
> michael@niedermayer.cc> wrote:  
> 
> > On Sat, Oct 22, 2016 at 10:10:01PM -0400, Ronald S. Bultje wrote:  
> > > Hi,
> > >
> > > general comment about all other dec patches.
> > >
> > > On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer  
> > <michael@niedermayer.cc  
> > > > wrote:  
> > >  
> > > > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > > > ---
> > > >  libavcodec/svq1dec.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/libavcodec/svq1dec.c b/libavcodec/svq1dec.c
> > > > index 2b72e08..0fe222e 100644
> > > > --- a/libavcodec/svq1dec.c
> > > > +++ b/libavcodec/svq1dec.c
> > > > @@ -744,6 +744,7 @@ static int svq1_decode_frame(AVCodecContext  
> > *avctx,  
> > > > void *data,
> > > >              }
> > > >          }
> > > >      }
> > > > +    emms_c();  
> > >
> > >
> > > This is hideous, you're sprinkling emms_c in various places to make some
> > > stupid test pass. The test is morbidly stupid and there is no general
> > > consensus on patterns to be followed as for where to place emms_c.  
> > Someone  
> > > who doesn't know any better will litter each new decoder with 10-20 calls
> > > to emms_c just because he found that other decoders do it in  
> > undocumented,  
> > > unexplained and unclear locations also.
> > >
> > > If you want this to be a "thing", you need to design and document  
> > carefully  
> > > where emms_c is necessary. Then come up with some system that makes this
> > > work by itself.  
> >
> > Your mail sounds quite aggressive to me, did i say something to anger
> > you ? It was certainly not intended
> >
> > About this patchset
> > FFmpeg is broken ATM (both in theory and practice with some libcs),
> > the way MMX code is used is not correct, emms_c()
> > calls are missing and required. The obvious way to fix that
> > (in practice) is adding emms_c() calls where they are missing.
> > I can document why each call is needed, if thats wanted?  
> 
> 
> Your representation of facts is strange, to say the least. Let's explore
> two related claims:
> 
> - (A) ffmpeg is broken in practice when calling musl malloc/free functions
> after calling MMX SIMD functions on x86-32 (which crashes).
> - (B) ffmpeg is broken in theory because we don't clear FPU state (as
> required) at the end of MMX SIMD functions.

I was under the impression that it is UB to have the FPU in MMX state
at any time while in C, not just while e.g. calling the stdlib. Maybe I
got that wrong (how would MMX intrinsics even work?) - can anyone shed
light on the exact requirements? (Possibly again, sorry.)
Ronald S. Bultje Oct. 24, 2016, 11:54 a.m.
Hi,

On Mon, Oct 24, 2016 at 3:36 AM, wm4 <nfxjfg@googlemail.com> wrote:

> On Sun, 23 Oct 2016 13:02:01 -0400
> "Ronald S. Bultje" <rsbultje@gmail.com> wrote:
>
> > Hi,
> >
> > On Sat, Oct 22, 2016 at 11:36 PM, Michael Niedermayer <
> > michael@niedermayer.cc> wrote:
> >
> > > On Sat, Oct 22, 2016 at 10:10:01PM -0400, Ronald S. Bultje wrote:
> > > > Hi,
> > > >
> > > > general comment about all other dec patches.
> > > >
> > > > On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer
> > > <michael@niedermayer.cc
> > > > > wrote:
> > > >
> > > > > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > > > > ---
> > > > >  libavcodec/svq1dec.c | 2 ++
> > > > >  1 file changed, 2 insertions(+)
> > > > >
> > > > > diff --git a/libavcodec/svq1dec.c b/libavcodec/svq1dec.c
> > > > > index 2b72e08..0fe222e 100644
> > > > > --- a/libavcodec/svq1dec.c
> > > > > +++ b/libavcodec/svq1dec.c
> > > > > @@ -744,6 +744,7 @@ static int svq1_decode_frame(AVCodecContext
> > > *avctx,
> > > > > void *data,
> > > > >              }
> > > > >          }
> > > > >      }
> > > > > +    emms_c();
> > > >
> > > >
> > > > This is hideous, you're sprinkling emms_c in various places to make
> some
> > > > stupid test pass. The test is morbidly stupid and there is no general
> > > > consensus on patterns to be followed as for where to place emms_c.
> > > Someone
> > > > who doesn't know any better will litter each new decoder with 10-20
> calls
> > > > to emms_c just because he found that other decoders do it in
> > > undocumented,
> > > > unexplained and unclear locations also.
> > > >
> > > > If you want this to be a "thing", you need to design and document
> > > carefully
> > > > where emms_c is necessary. Then come up with some system that makes
> this
> > > > work by itself.
> > >
> > > Your mail sounds quite aggressive to me, did i say something to anger
> > > you ? It was certainly not intended
> > >
> > > About this patchset
> > > FFmpeg is broken ATM (both in theory and practice with some libcs),
> > > the way MMX code is used is not correct, emms_c()
> > > calls are missing and required. The obvious way to fix that
> > > (in practice) is adding emms_c() calls where they are missing.
> > > I can document why each call is needed, if thats wanted?
> >
> >
> > Your representation of facts is strange, to say the least. Let's explore
> > two related claims:
> >
> > - (A) ffmpeg is broken in practice when calling musl malloc/free
> functions
> > after calling MMX SIMD functions on x86-32 (which crashes).
> > - (B) ffmpeg is broken in theory because we don't clear FPU state (as
> > required) at the end of MMX SIMD functions.
>
> I was under the impression that it is UB to have the FPU in MMX state
> at any time while in C, not just while e.g. calling the stdlib. Maybe I
> got that wrong (how would MMX intrinsics even work?) - can anyone shed
> light on the exact requirements? (Possibly again, sorry.)


I'm under the impression that it's part of the calling convention. That is,
any code anywhere (including mmx intrinsics, indeed) can - when called -
expect the state to be cleared by the caller, just like you'd expect
eax/edx to be caller-save (whereas esi/edi are callee-save).

However, if you never call external code (including intrinsics), you can
ignore this, just as you can ignore / create your own calling
convention (remember fastcall etc.?). However, when calling any external
code, this could (in theory) crash; it's just that right now it only
crashes with musl when calling malloc/free. So basically, ffmpeg has its
own calling convention, and manually calling emms_c() fixes "ffmpeg"
calling convention to be compatible with "standard" calling convention...?

Ronald
wm4 Oct. 24, 2016, 12:47 p.m.
On Mon, 24 Oct 2016 07:54:47 -0400
"Ronald S. Bultje" <rsbultje@gmail.com> wrote:

> Hi,
> 
> On Mon, Oct 24, 2016 at 3:36 AM, wm4 <nfxjfg@googlemail.com> wrote:
> 
> > On Sun, 23 Oct 2016 13:02:01 -0400
> > "Ronald S. Bultje" <rsbultje@gmail.com> wrote:
> >  
> > > Hi,
> > >
> > > On Sat, Oct 22, 2016 at 11:36 PM, Michael Niedermayer <  
> > > michael@niedermayer.cc> wrote:  
> > >  
> > > > On Sat, Oct 22, 2016 at 10:10:01PM -0400, Ronald S. Bultje wrote:  
> > > > > Hi,
> > > > >
> > > > > general comment about all other dec patches.
> > > > >
> > > > > On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer  
> > > > <michael@niedermayer.cc  
> > > > > > wrote:  
> > > > >  
> > > > > > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > > > > > ---
> > > > > >  libavcodec/svq1dec.c | 2 ++
> > > > > >  1 file changed, 2 insertions(+)
> > > > > >
> > > > > > diff --git a/libavcodec/svq1dec.c b/libavcodec/svq1dec.c
> > > > > > index 2b72e08..0fe222e 100644
> > > > > > --- a/libavcodec/svq1dec.c
> > > > > > +++ b/libavcodec/svq1dec.c
> > > > > > @@ -744,6 +744,7 @@ static int svq1_decode_frame(AVCodecContext  
> > > > *avctx,  
> > > > > > void *data,
> > > > > >              }
> > > > > >          }
> > > > > >      }
> > > > > > +    emms_c();  
> > > > >
> > > > >
> > > > > This is hideous, you're sprinkling emms_c in various places to make  
> > some  
> > > > > stupid test pass. The test is morbidly stupid and there is no general
> > > > > consensus on patterns to be followed as for where to place emms_c.  
> > > > Someone  
> > > > > who doesn't know any better will litter each new decoder with 10-20  
> > calls  
> > > > > to emms_c just because he found that other decoders do it in  
> > > > undocumented,  
> > > > > unexplained and unclear locations also.
> > > > >
> > > > > If you want this to be a "thing", you need to design and document  
> > > > carefully  
> > > > > where emms_c is necessary. Then come up with some system that makes  
> > this  
> > > > > work by itself.  
> > > >
> > > > Your mail sounds quite aggressive to me, did i say something to anger
> > > > you ? It was certainly not intended
> > > >
> > > > About this patchset
> > > > FFmpeg is broken ATM (both in theory and practice with some libcs),
> > > > the way MMX code is used is not correct, emms_c()
> > > > calls are missing and required. The obvious way to fix that
> > > > (in practice) is adding emms_c() calls where they are missing.
> > > > I can document why each call is needed, if thats wanted?  
> > >
> > >
> > > Your representation of facts is strange, to say the least. Let's explore
> > > two related claims:
> > >
> > > - (A) ffmpeg is broken in practice when calling musl malloc/free  
> > functions  
> > > after calling MMX SIMD functions on x86-32 (which crashes).
> > > - (B) ffmpeg is broken in theory because we don't clear FPU state (as
> > > required) at the end of MMX SIMD functions.  
> >
> > I was under the impression that it is UB to have the FPU in MMX state
> > at any time while in C, not just while e.g. calling the stdlib. Maybe I
> > got that wrong (how would MMX intrinsics even work?) - can anyone shed
> > light on the exact requirements? (Possibly again, sorry.)  
> 
> 
> I'm under the impression that it's part of the calling convention. That is,
> any code anywhere (including mmx intrinsics, indeed) can - when called -
> expect the state to be cleared by the caller, just like you'd expect
> eax/edx to be caller-save (whereas esi/edi are callee-save).
> 
> However, if you never call external code (including intrinsics), you can
> ignore this, just as you can ignore / create your own calling
> convention (remember fastcall etc.?). However, when calling any external
> code, this could (in theory) crash; it's just that right now it only
> crashes with musl when calling malloc/free. So basically, ffmpeg has its
> own calling convention, and manually calling emms_c() fixes "ffmpeg"
> calling convention to be compatible with "standard" calling convention...?

It can't really be a calling convention unless the compiler is aware of
it?

We're doing things behind the compiler's back here. The safest
assumption would be that leaving the FPU state invalid while in C is
not allowed, period.

The next safest assumption is that it's fine as long as we explicitly
don't use floating point or call external functions that aren't
MMX-aware. This would mean calling any libc functions or user callbacks
(including indirectly through e.g. av_log) is not fine.

Of course this is not as easy to verify as adding a hack-assert to
av_malloc/av_free.
Ronald S. Bultje Oct. 24, 2016, 2:14 p.m.
Hi,

On Mon, Oct 24, 2016 at 8:47 AM, wm4 <nfxjfg@googlemail.com> wrote:

> On Mon, 24 Oct 2016 07:54:47 -0400
> "Ronald S. Bultje" <rsbultje@gmail.com> wrote:
>
> > Hi,
> >
> > On Mon, Oct 24, 2016 at 3:36 AM, wm4 <nfxjfg@googlemail.com> wrote:
> >
> > > On Sun, 23 Oct 2016 13:02:01 -0400
> > > "Ronald S. Bultje" <rsbultje@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > On Sat, Oct 22, 2016 at 11:36 PM, Michael Niedermayer <
> > > > michael@niedermayer.cc> wrote:
> > > >
> > > > > On Sat, Oct 22, 2016 at 10:10:01PM -0400, Ronald S. Bultje wrote:
> > > > > > Hi,
> > > > > >
> > > > > > general comment about all other dec patches.
> > > > > >
> > > > > > On Sat, Oct 22, 2016 at 3:02 PM, Michael Niedermayer
> > > > > <michael@niedermayer.cc
> > > > > > > wrote:
> > > > > >
> > > > > > > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > > > > > > ---
> > > > > > >  libavcodec/svq1dec.c | 2 ++
> > > > > > >  1 file changed, 2 insertions(+)
> > > > > > >
> > > > > > > diff --git a/libavcodec/svq1dec.c b/libavcodec/svq1dec.c
> > > > > > > index 2b72e08..0fe222e 100644
> > > > > > > --- a/libavcodec/svq1dec.c
> > > > > > > +++ b/libavcodec/svq1dec.c
> > > > > > > @@ -744,6 +744,7 @@ static int svq1_decode_frame(AVCodecConte
> xt
> > > > > *avctx,
> > > > > > > void *data,
> > > > > > >              }
> > > > > > >          }
> > > > > > >      }
> > > > > > > +    emms_c();
> > > > > >
> > > > > >
> > > > > > This is hideous, you're sprinkling emms_c in various places to
> make
> > > some
> > > > > > stupid test pass. The test is morbidly stupid and there is no
> general
> > > > > > consensus on patterns to be followed as for where to place
> emms_c.
> > > > > Someone
> > > > > > who doesn't know any better will litter each new decoder with
> 10-20
> > > calls
> > > > > > to emms_c just because he found that other decoders do it in
> > > > > undocumented,
> > > > > > unexplained and unclear locations also.
> > > > > >
> > > > > > If you want this to be a "thing", you need to design and document
> > > > > carefully
> > > > > > where emms_c is necessary. Then come up with some system that
> makes
> > > this
> > > > > > work by itself.
> > > > >
> > > > > Your mail sounds quite aggressive to me, did i say something to
> anger
> > > > > you ? It was certainly not intended
> > > > >
> > > > > About this patchset
> > > > > FFmpeg is broken ATM (both in theory and practice with some libcs),
> > > > > the way MMX code is used is not correct, emms_c()
> > > > > calls are missing and required. The obvious way to fix that
> > > > > (in practice) is adding emms_c() calls where they are missing.
> > > > > I can document why each call is needed, if thats wanted?
> > > >
> > > >
> > > > Your representation of facts is strange, to say the least. Let's
> explore
> > > > two related claims:
> > > >
> > > > - (A) ffmpeg is broken in practice when calling musl malloc/free
> > > functions
> > > > after calling MMX SIMD functions on x86-32 (which crashes).
> > > > - (B) ffmpeg is broken in theory because we don't clear FPU state (as
> > > > required) at the end of MMX SIMD functions.
> > >
> > > I was under the impression that it is UB to have the FPU in MMX state
> > > at any time while in C, not just while e.g. calling the stdlib. Maybe I
> > > got that wrong (how would MMX intrinsics even work?) - can anyone shed
> > > light on the exact requirements? (Possibly again, sorry.)
> >
> >
> > I'm under the impression that it's part of the calling convention. That
> is,
> > any code anywhere (including mmx intrinsics, indeed) can - when called -
> > expect the state to be cleared by the caller, just like you'd expect
> > eax/edx to be caller-save (whereas esi/edi are callee-save).
> >
> > However, if you never call external code (including intrinsics), you can
> > ignore this, just as you can ignore / create your own calling
> > convention (remember fastcall etc.?). However, when calling any external
> > code, this could (in theory) crash; it's just that right now it only
> > crashes with musl when calling malloc/free. So basically, ffmpeg has its
> > own calling convention, and manually calling emms_c() fixes "ffmpeg"
> > calling convention to be compatible with "standard" calling
> convention...?
>
> It can't really be a calling convention unless the compiler is aware of
> it?
>
> We're doing things behind the compiler's back here. The safest
> assumption would be that leaving the FPU state invalid while in C is
> not allowed, period.
>
> The next safest assumption is that it's fine as long as we explicitly
> don't use floating point or call external functions that aren't
> MMX-aware. This would mean calling any libc functions or user callbacks
> (including indirectly through e.g. av_log) is not fine.
>

Yes.

And while I don't agree with it, I feel we should also mention Carl Eugen's
alternative proposal, which is to document that ffmpeg doesn't formally
adhere to this calling convention requirement [1].

Of course this is not as easy to verify as adding a hack-assert to
> av_malloc/av_free.


I agree.

(Which is why I insist that the design of this particular "fix" is so
important.)

Ronald

[1] https://patchwork.ffmpeg.org/patch/841/
Andreas Cadhalpun Oct. 24, 2016, 7:11 p.m.
On 23.10.2016 20:18, Carl Eugen Hoyos wrote:
> 2016-10-23 12:14 GMT+02:00 Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>:
> 
>> I also don't like adding emms_c in various places, but I don't see a
>> realistic alternative
> 
> (+1!)
> 
>> other than not fixing the problem. And I think that would be worse
>> than this patch set.
> 
> I don't (strongly) disagree but could you elaborate on why this would
> be such a bad alternative?

For one thing FFmpeg shouldn't rely on undefined behavior (a theoretical
issue) and for another thing it should work with musl libc (a practical
problem). I think not fixing the latter is bad for our users.

Best regards,
Andreas
Andreas Cadhalpun Oct. 24, 2016, 7:19 p.m.
On 24.10.2016 16:14, Ronald S. Bultje wrote:
> On Mon, Oct 24, 2016 at 8:47 AM, wm4 <nfxjfg@googlemail.com> wrote:
>> On Mon, 24 Oct 2016 07:54:47 -0400
>> "Ronald S. Bultje" <rsbultje@gmail.com> wrote:
>>> On Mon, Oct 24, 2016 at 3:36 AM, wm4 <nfxjfg@googlemail.com> wrote:
>>>> I was under the impression that it is UB to have the FPU in MMX state
>>>> at any time while in C, not just while e.g. calling the stdlib. Maybe I
>>>> got that wrong (how would MMX intrinsics even work?) - can anyone shed
>>>> light on the exact requirements? (Possibly again, sorry.)
>>>
>>> I'm under the impression that it's part of the calling convention. That
>> is,
>>> any code anywhere (including mmx intrinsics, indeed) can - when called -
>>> expect the state to be cleared by the caller, just like you'd expect
>>> eax/edx to be caller-save (whereas esi/edi are callee-save).
>>>
>>> However, if you never call external code (including intrinsics), you can
>>> ignore this, just as you can ignore / create your own calling
>>> convention (remember fastcall etc.?). However, when calling any external
>>> code, this could (in theory) crash; it's just that right now it only
>>> crashes with musl when calling malloc/free. So basically, ffmpeg has its
>>> own calling convention, and manually calling emms_c() fixes "ffmpeg"
>>> calling convention to be compatible with "standard" calling
>> convention...?
>>
>> It can't really be a calling convention unless the compiler is aware of
>> it?

It is defined as part of the System V Application Binary Interface [1]:
"The CPU shall be in x87 mode upon entry to a function. Therefore, every
function that uses the MMX registers is required to issue an emms or femms
instruction after using MMX registers, before returning or calling another
function."

>> We're doing things behind the compiler's back here. The safest
>> assumption would be that leaving the FPU state invalid while in C is
>> not allowed, period.

That's what the standard says.

>> The next safest assumption is that it's fine as long as we explicitly
>> don't use floating point or call external functions that aren't
>> MMX-aware. This would mean calling any libc functions or user callbacks
>> (including indirectly through e.g. av_log) is not fine.

This is probably OK in practice and likely has a significant performance
benefit.

Best regards,
Andreas


1: https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf
wm4 Oct. 24, 2016, 7:34 p.m.
On Mon, 24 Oct 2016 21:19:46 +0200
Andreas Cadhalpun <andreas.cadhalpun@googlemail.com> wrote:

> On 24.10.2016 16:14, Ronald S. Bultje wrote:
> > On Mon, Oct 24, 2016 at 8:47 AM, wm4 <nfxjfg@googlemail.com> wrote:  
> >> On Mon, 24 Oct 2016 07:54:47 -0400
> >> "Ronald S. Bultje" <rsbultje@gmail.com> wrote:  
> >>> On Mon, Oct 24, 2016 at 3:36 AM, wm4 <nfxjfg@googlemail.com> wrote:  
> >>>> I was under the impression that it is UB to have the FPU in MMX state
> >>>> at any time while in C, not just while e.g. calling the stdlib. Maybe I
> >>>> got that wrong (how would MMX intrinsics even work?) - can anyone shed
> >>>> light on the exact requirements? (Possibly again, sorry.)  
> >>>
> >>> I'm under the impression that it's part of the calling convention. That  
> >> is,  
> >>> any code anywhere (including mmx intrinsics, indeed) can - when called -
> >>> expect the state to be cleared by the caller, just like you'd expect
> >>> eax/edx to be caller-save (whereas esi/edi are callee-save).
> >>>
> >>> However, if you never call external code (including intrinsics), you can
> >>> ignore this, just as you can ignore / create your own calling
> >>> convention (remember fastcall etc.?). However, when calling any external
> >>> code, this could (in theory) crash; it's just that right now it only
> >>> crashes with musl when calling malloc/free. So basically, ffmpeg has its
> >>> own calling convention, and manually calling emms_c() fixes "ffmpeg"
> >>> calling convention to be compatible with "standard" calling  
> >> convention...?
> >>
> >> It can't really be a calling convention unless the compiler is aware of
> >> it?  
> 
> It is defined as part of the System V Application Binary Interface [1]:
> "The CPU shall be in x87 mode upon entry to a function. Therefore, every
> function that uses the MMX registers is required to issue an emms or femms
> instruction after using MMX registers, before returning or calling another
> function."

I mean FFmpeg can't make up its own calling convention without the
compiler's knowledge.

But thanks for reminding me about this but of the sysv ABI. The
paragraph you quoted is actually very clear about the requirements. It
means FFmpeg can barely do anything and remain standard compliant: a
ASM function must, according to the calling convention, reset the MMX
state when returning.

What FFmpeg does here was misdesigned from the very start.

> >> We're doing things behind the compiler's back here. The safest
> >> assumption would be that leaving the FPU state invalid while in C is
> >> not allowed, period.  
> 
> That's what the standard says.
> 
> >> The next safest assumption is that it's fine as long as we explicitly
> >> don't use floating point or call external functions that aren't
> >> MMX-aware. This would mean calling any libc functions or user callbacks
> >> (including indirectly through e.g. av_log) is not fine.  
> 
> This is probably OK in practice and likely has a significant performance
> benefit.

Seems like it.

The compiler still could generate code using the FPU state at any
point, though, unless maybe there is an inline asm block. No function
can put the FPU into MMX mode, because any MMX using function must have
called emms before returning. Consequently, only compiler-known
intrinsics or opaque inline asm block could clobber the FPU state. The
latter because the compiler doesn't inspect asm blocks.

> 
> Best regards,
> Andreas
> 
> 
> 1: https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf
Andreas Cadhalpun Oct. 24, 2016, 7:47 p.m.
On 24.10.2016 21:34, wm4 wrote:
> On Mon, 24 Oct 2016 21:19:46 +0200
> Andreas Cadhalpun <andreas.cadhalpun@googlemail.com> wrote:
>> On 24.10.2016 16:14, Ronald S. Bultje wrote:
>>> On Mon, Oct 24, 2016 at 8:47 AM, wm4 <nfxjfg@googlemail.com> wrote:  
>>>> The next safest assumption is that it's fine as long as we explicitly
>>>> don't use floating point or call external functions that aren't
>>>> MMX-aware. This would mean calling any libc functions or user callbacks
>>>> (including indirectly through e.g. av_log) is not fine.  
>>
>> This is probably OK in practice and likely has a significant performance
>> benefit.
> 
> Seems like it.
> 
> The compiler still could generate code using the FPU state at any
> point, though, unless maybe there is an inline asm block. No function
> can put the FPU into MMX mode, because any MMX using function must have
> called emms before returning. Consequently, only compiler-known
> intrinsics or opaque inline asm block could clobber the FPU state. The
> latter because the compiler doesn't inspect asm blocks.

I agree. However, I think that this is currently only a theoretical
issue. At least I'm not aware of this causing problems with a real compiler.

Best regards,
Andreas
Henrik Gramner Oct. 24, 2016, 7:53 p.m.
On Mon, Oct 24, 2016 at 9:34 PM, wm4 <nfxjfg@googlemail.com> wrote:
> a ASM function must, according to the calling convention, reset the
> MMX state when returning.
>
> What FFmpeg does here was misdesigned from the very start.

The decision to issue emms manually instead of after every MMX
function was a deliberate decision. I'd hardly call it "misdesigned"
to make SIMD code twice as fast at the cost of technically abusing the
ABI, which has worked flawlessly for years until very recently.
Ronald S. Bultje Oct. 24, 2016, 7:59 p.m.
Hi,

On Mon, Oct 24, 2016 at 3:34 PM, wm4 <nfxjfg@googlemail.com> wrote:

> On Mon, 24 Oct 2016 21:19:46 +0200
> Andreas Cadhalpun <andreas.cadhalpun@googlemail.com> wrote:
>
> > On 24.10.2016 16:14, Ronald S. Bultje wrote:
> > > On Mon, Oct 24, 2016 at 8:47 AM, wm4 <nfxjfg@googlemail.com> wrote:
> > >> On Mon, 24 Oct 2016 07:54:47 -0400
> > >> "Ronald S. Bultje" <rsbultje@gmail.com> wrote:
> > >>> On Mon, Oct 24, 2016 at 3:36 AM, wm4 <nfxjfg@googlemail.com> wrote:
> > >>>> I was under the impression that it is UB to have the FPU in MMX
> state
> > >>>> at any time while in C, not just while e.g. calling the stdlib.
> Maybe I
> > >>>> got that wrong (how would MMX intrinsics even work?) - can anyone
> shed
> > >>>> light on the exact requirements? (Possibly again, sorry.)
> > >>>
> > >>> I'm under the impression that it's part of the calling convention.
> That
> > >> is,
> > >>> any code anywhere (including mmx intrinsics, indeed) can - when
> called -
> > >>> expect the state to be cleared by the caller, just like you'd expect
> > >>> eax/edx to be caller-save (whereas esi/edi are callee-save).
> > >>>
> > >>> However, if you never call external code (including intrinsics), you
> can
> > >>> ignore this, just as you can ignore / create your own calling
> > >>> convention (remember fastcall etc.?). However, when calling any
> external
> > >>> code, this could (in theory) crash; it's just that right now it only
> > >>> crashes with musl when calling malloc/free. So basically, ffmpeg has
> its
> > >>> own calling convention, and manually calling emms_c() fixes "ffmpeg"
> > >>> calling convention to be compatible with "standard" calling
> > >> convention...?
> > >>
> > >> It can't really be a calling convention unless the compiler is aware
> of
> > >> it?
> >
> > It is defined as part of the System V Application Binary Interface [1]:
> > "The CPU shall be in x87 mode upon entry to a function. Therefore, every
> > function that uses the MMX registers is required to issue an emms or
> femms
> > instruction after using MMX registers, before returning or calling
> another
> > function."
>
> I mean FFmpeg can't make up its own calling convention without the
> compiler's knowledge.
>
> But thanks for reminding me about this but of the sysv ABI. The
> paragraph you quoted is actually very clear about the requirements. It
> means FFmpeg can barely do anything and remain standard compliant: a
> ASM function must, according to the calling convention, reset the MMX
> state when returning.
>
> What FFmpeg does here was misdesigned from the very start.


Good idea to reference Hendrik Gramner here, who keeps insisting we get rid
of all MMX code in ffmpeg (at least as an option) for future Intel CPUs in
which MMX will be deprecated.

Food for thought, indeed.

Ronald
wm4 Oct. 24, 2016, 8 p.m.
On Mon, 24 Oct 2016 21:53:22 +0200
Henrik Gramner <henrik@gramner.com> wrote:

> On Mon, Oct 24, 2016 at 9:34 PM, wm4 <nfxjfg@googlemail.com> wrote:
> > a ASM function must, according to the calling convention, reset the
> > MMX state when returning.
> >
> > What FFmpeg does here was misdesigned from the very start.  
> 
> The decision to issue emms manually instead of after every MMX
> function was a deliberate decision. I'd hardly call it "misdesigned"
> to make SIMD code twice as fast at the cost of technically abusing the
> ABI, which has worked flawlessly for years until very recently.

It's still clearly not a good idea because it violates the ABI directly.

OK let's get back to the "practical" argument that we have not much of
a choice but to violate the ABI.

Then at least the "weaker" assumption, that emms should be called before
calling (or returning to) functions that don't explicitly support MMX,
should be followed.
Henrik Gramner Oct. 24, 2016, 8:26 p.m.
On Mon, Oct 24, 2016 at 9:59 PM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
> Good idea to reference Hendrik Gramner here, who keeps insisting we get rid
> of all MMX code in ffmpeg (at least as an option) for future Intel CPUs in
> which MMX will be deprecated.

Replacing MMX with SSE2 is indeed the most "proper" fix in my opinion,
but it's a fair amount of work and not done in an evening.

The fact that a lot of assembly lacks unit tests is certainly not
helping in that regard.

Some MMX instructions are slower than the equivalent SSE2 code on
Skylake. Intel hasn't officially commented on (as far as I know at
least) if we should expect this trend to continue, but they certainly
seem to treat MMX as legacy.

I doubt they would completely remove support for it though, backwards
compatibility is a big selling-point for x86.
Ronald S. Bultje Oct. 24, 2016, 8:31 p.m.
Hi,

On Mon, Oct 24, 2016 at 4:26 PM, Henrik Gramner <henrik@gramner.com> wrote:

> On Mon, Oct 24, 2016 at 9:59 PM, Ronald S. Bultje <rsbultje@gmail.com>
> wrote:
> > Good idea to reference Hendrik Gramner here, who keeps insisting we get
> rid
> > of all MMX code in ffmpeg (at least as an option) for future Intel CPUs
> in
> > which MMX will be deprecated.
>
> Replacing MMX with SSE2 is indeed the most "proper" fix in my opinion,
> but it's a fair amount of work and not done in an evening.
>
> The fact that a lot of assembly lacks unit tests is certainly not
> helping in that regard.
>
> Some MMX instructions are slower than the equivalent SSE2 code on
> Skylake. Intel hasn't officially commented on (as far as I know at
> least) if we should expect this trend to continue, but they certainly
> seem to treat MMX as legacy.
>
> I doubt they would completely remove support for it though, backwards
> compatibility is a big selling-point for x86.


Well, it gives us another way of fixing this issue (on x86-64 only): have
sse2 implementations for all code that has a mmx (register) path right now.

Given the complexity of this issue, it's worth considering that fix along
with the other possibilities...

Ronald
Hendrik Leppkes Oct. 24, 2016, 10 p.m.
On Mon, Oct 24, 2016 at 10:31 PM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
> Hi,
>
> On Mon, Oct 24, 2016 at 4:26 PM, Henrik Gramner <henrik@gramner.com> wrote:
>
>> On Mon, Oct 24, 2016 at 9:59 PM, Ronald S. Bultje <rsbultje@gmail.com>
>> wrote:
>> > Good idea to reference Hendrik Gramner here, who keeps insisting we get
>> rid
>> > of all MMX code in ffmpeg (at least as an option) for future Intel CPUs
>> in
>> > which MMX will be deprecated.
>>
>> Replacing MMX with SSE2 is indeed the most "proper" fix in my opinion,
>> but it's a fair amount of work and not done in an evening.
>>
>> The fact that a lot of assembly lacks unit tests is certainly not
>> helping in that regard.
>>
>> Some MMX instructions are slower than the equivalent SSE2 code on
>> Skylake. Intel hasn't officially commented on (as far as I know at
>> least) if we should expect this trend to continue, but they certainly
>> seem to treat MMX as legacy.
>>
>> I doubt they would completely remove support for it though, backwards
>> compatibility is a big selling-point for x86.
>
>
> Well, it gives us another way of fixing this issue (on x86-64 only): have
> sse2 implementations for all code that has a mmx (register) path right now.
>

I don't think the argument for pre-sse2 CPUs is that strong on 32-bit
systems, either.
I wouldn't necessarily limit this to x86-64 at that, considering its
probably another half year at least, probably longer, until such a
conversion could be completed.

- Hendrik
u-9iep@aetey.se Oct. 25, 2016, 12:28 p.m.
On Mon, Oct 24, 2016 at 09:53:22PM +0200, Henrik Gramner wrote:
> The decision to issue emms manually instead of after every MMX
> function was a deliberate decision. I'd hardly call it "misdesigned"
> to make SIMD code twice as fast at the cost of technically abusing the

It would be nice to look at a benchmarking comparison, to be able to
see the actual practical performance gain of the decision not to follow
the ABI.

My expectation is among others that using MMX in a compliant way would
remain remarkably beneficial compared to no SIMD.

In a perfect world the user might be offered a build time choice:
either lose N% of the performance or take the consequences of not
following the ABI. Apparently, different users/packagers would make
different choices, but the crucial thing would be knowing the N.

Of course, offering to disable MMX _is_ such a choice, but having instead
an option of "somewhat slower, compliant MMX" would be much nicer.

my 2c
Rune
Henrik Gramner Oct. 25, 2016, 3:48 p.m.
On Tue, Oct 25, 2016 at 2:28 PM,  <u-9iep@aetey.se> wrote:
> It would be nice to look at a benchmarking comparison, to be able to
> see the actual practical performance gain of the decision not to follow
> the ABI.

Just a quick comparison from adding EMMS to a random MMX function
(from x264, because I happened to have the source for that but not
ffmpeg on this machine and I was too lazy to download it).

sad_4x4_c: 359
sad_4x4_mmx: 94
sad_4x4_mmx (emms): 186
u-9iep@aetey.se Oct. 25, 2016, 6:23 p.m.
On Tue, Oct 25, 2016 at 05:48:50PM +0200, Henrik Gramner wrote:
> On Tue, Oct 25, 2016 at 2:28 PM,  <u-9iep@aetey.se> wrote:
> > It would be nice to look at a benchmarking comparison, to be able to
> > see the actual practical performance gain of the decision not to follow
> > the ABI.
> 
> Just a quick comparison from adding EMMS to a random MMX function
> (from x264, because I happened to have the source for that but not
> ffmpeg on this machine and I was too lazy to download it).
> 
> sad_4x4_c: 359
> sad_4x4_mmx: 94
> sad_4x4_mmx (emms): 186

This is unfortunatly not exactly the numbers a user or a packager would
need to know. The relevant ones would be what time it takes to actually
compress or decompress a given file.

(if most of the time is spent calling this very function, then we can
extrapolate the numbers to the net result, otherwise we shouldn't)

If we nevertheless assume that the net performance is proportional
to the numbers above, we learn that omission of MMX acceleration
compatible with the standard ABI would lead to 50% performance loss
for the ABI-compliant users (pure C instead of compliant MMX).

This would be a pity, even if the circle of the users who depend
on strict ABI-compliance is limited, for the moment.

Regards,
Rune
Moritz Barsnick Oct. 25, 2016, 9:12 p.m.
On Tue, Oct 25, 2016 at 14:28:38 +0200, u-9iep@aetey.se wrote:
> In a perfect world the user might be offered a build time choice:
> either lose N% of the performance or take the consequences of not
> following the ABI.

That was exactly my thought too, assuming it doesn't pollute the code
even more:
--disable-fast-simd or --enable-compliant-mmx or something. "auto"
would detect musl and/or test for the necessity to use emms_c() or
other measures.

Moritz
Ronald S. Bultje Oct. 25, 2016, 9:40 p.m.
Hi,

On Tue, Oct 25, 2016 at 5:12 PM, Moritz Barsnick <barsnick@gmx.net> wrote:

> On Tue, Oct 25, 2016 at 14:28:38 +0200, u-9iep@aetey.se wrote:
> > In a perfect world the user might be offered a build time choice:
> > either lose N% of the performance or take the consequences of not
> > following the ABI.
>
> That was exactly my thought too, assuming it doesn't pollute the code
> even more:
> --disable-fast-simd or --enable-compliant-mmx or something. "auto"
> would detect musl and/or test for the necessity to use emms_c() or
> other measures.


If you don't care about performance, issue a emms upon RET in INIT_MMX
functions (just like we do for vzeroupper in INIT_YMM functions) using
x86inc.asm. Of course, this should be disabled for x86-64 or non-musl
builds.

(That doesn't yet deal with inline asm.)

Ronald
Michael Niedermayer Oct. 26, 2016, 1:54 p.m.
On Tue, Oct 25, 2016 at 12:00:01AM +0200, Hendrik Leppkes wrote:
> On Mon, Oct 24, 2016 at 10:31 PM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
> > Hi,
> >
> > On Mon, Oct 24, 2016 at 4:26 PM, Henrik Gramner <henrik@gramner.com> wrote:
> >
> >> On Mon, Oct 24, 2016 at 9:59 PM, Ronald S. Bultje <rsbultje@gmail.com>
> >> wrote:
> >> > Good idea to reference Hendrik Gramner here, who keeps insisting we get
> >> rid
> >> > of all MMX code in ffmpeg (at least as an option) for future Intel CPUs
> >> in
> >> > which MMX will be deprecated.
> >>
> >> Replacing MMX with SSE2 is indeed the most "proper" fix in my opinion,
> >> but it's a fair amount of work and not done in an evening.
> >>
> >> The fact that a lot of assembly lacks unit tests is certainly not
> >> helping in that regard.
> >>
> >> Some MMX instructions are slower than the equivalent SSE2 code on
> >> Skylake. Intel hasn't officially commented on (as far as I know at
> >> least) if we should expect this trend to continue, but they certainly
> >> seem to treat MMX as legacy.
> >>
> >> I doubt they would completely remove support for it though, backwards
> >> compatibility is a big selling-point for x86.
> >
> >
> > Well, it gives us another way of fixing this issue (on x86-64 only): have
> > sse2 implementations for all code that has a mmx (register) path right now.
> >
> 
> I don't think the argument for pre-sse2 CPUs is that strong on 32-bit
> systems, either.

SSE2 was initially not faster than MMX as CPUs implemented it as 2
MMX operations internally not having a full width SIMD unit for SSE*
so there would be a performace loss on some x86-32 CPUs if MMX was
replaced by "half-width SSE2" there

[...]
Hendrik Leppkes Oct. 26, 2016, 2:21 p.m.
On Wed, Oct 26, 2016 at 3:54 PM, Michael Niedermayer
<michael@niedermayer.cc> wrote:
> On Tue, Oct 25, 2016 at 12:00:01AM +0200, Hendrik Leppkes wrote:
>> On Mon, Oct 24, 2016 at 10:31 PM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
>> > Hi,
>> >
>> > On Mon, Oct 24, 2016 at 4:26 PM, Henrik Gramner <henrik@gramner.com> wrote:
>> >
>> >> On Mon, Oct 24, 2016 at 9:59 PM, Ronald S. Bultje <rsbultje@gmail.com>
>> >> wrote:
>> >> > Good idea to reference Hendrik Gramner here, who keeps insisting we get
>> >> rid
>> >> > of all MMX code in ffmpeg (at least as an option) for future Intel CPUs
>> >> in
>> >> > which MMX will be deprecated.
>> >>
>> >> Replacing MMX with SSE2 is indeed the most "proper" fix in my opinion,
>> >> but it's a fair amount of work and not done in an evening.
>> >>
>> >> The fact that a lot of assembly lacks unit tests is certainly not
>> >> helping in that regard.
>> >>
>> >> Some MMX instructions are slower than the equivalent SSE2 code on
>> >> Skylake. Intel hasn't officially commented on (as far as I know at
>> >> least) if we should expect this trend to continue, but they certainly
>> >> seem to treat MMX as legacy.
>> >>
>> >> I doubt they would completely remove support for it though, backwards
>> >> compatibility is a big selling-point for x86.
>> >
>> >
>> > Well, it gives us another way of fixing this issue (on x86-64 only): have
>> > sse2 implementations for all code that has a mmx (register) path right now.
>> >
>>
>> I don't think the argument for pre-sse2 CPUs is that strong on 32-bit
>> systems, either.
>
> SSE2 was initially not faster than MMX as CPUs implemented it as 2
> MMX operations internally not having a full width SIMD unit for SSE*
> so there would be a performace loss on some x86-32 CPUs if MMX was
> replaced by "half-width SSE2" there
>

You can add "not caring about first-gen sse2 CPUs" to the list as
well, if you want. Those are way old as well.
There is going to be a performance loss either way, except that emms
slows it down everywhere, while using sse2 is likely to be fine on
modern CPUs. So IMHO thats the better course to take, if any.

- Hendrik
u-9iep@aetey.se Oct. 26, 2016, 2:52 p.m.
On Wed, Oct 26, 2016 at 04:21:14PM +0200, Hendrik Leppkes wrote:
> You can add "not caring about first-gen sse2 CPUs" to the list as
> well, if you want. Those are way old as well.
> There is going to be a performance loss either way, except that emms
> slows it down everywhere, while using sse2 is likely to be fine on
> modern CPUs. So IMHO thats the better course to take, if any.

Note that making MMX-acceleration unavailable would hurt at most on
those slowest, non-sse systems, which otherwise e.g. often struggle for
real-time decoding of heavier streams.

One of the highly appreciated virtues of ffmpeg is its efficiency,
this makes hardware much more useful. Please do not cut off the low
end systems when possible.

Regards,
Rune
Hendrik Leppkes Oct. 26, 2016, 3:08 p.m.
On Wed, Oct 26, 2016 at 4:52 PM,  <u-9iep@aetey.se> wrote:
> On Wed, Oct 26, 2016 at 04:21:14PM +0200, Hendrik Leppkes wrote:
>> You can add "not caring about first-gen sse2 CPUs" to the list as
>> well, if you want. Those are way old as well.
>> There is going to be a performance loss either way, except that emms
>> slows it down everywhere, while using sse2 is likely to be fine on
>> modern CPUs. So IMHO thats the better course to take, if any.
>
> Note that making MMX-acceleration unavailable would hurt at most on
> those slowest, non-sse systems, which otherwise e.g. often struggle for
> real-time decoding of heavier streams.
>
> One of the highly appreciated virtues of ffmpeg is its efficiency,
> this makes hardware much more useful. Please do not cut off the low
> end systems when possible.
>

Its not about low-end, its about 15+ years old hardware. At some point
you just have to replace this stuff.
For $100 you can buy a system at least 10 times as fast then those.
Heck even a RPi might rival those for $20 or so.

If you want to convince developers to fix the MMX situation properly,
which would cost a lot of time to do, then the best course is to look
forward, which the SSE2 option imho is doing.
All other options are just band-aids.

- Hendrik
Michael Niedermayer Oct. 26, 2016, 3:42 p.m.
On Wed, Oct 26, 2016 at 04:21:14PM +0200, Hendrik Leppkes wrote:
> On Wed, Oct 26, 2016 at 3:54 PM, Michael Niedermayer
> <michael@niedermayer.cc> wrote:
> > On Tue, Oct 25, 2016 at 12:00:01AM +0200, Hendrik Leppkes wrote:
> >> On Mon, Oct 24, 2016 at 10:31 PM, Ronald S. Bultje <rsbultje@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > On Mon, Oct 24, 2016 at 4:26 PM, Henrik Gramner <henrik@gramner.com> wrote:
> >> >
> >> >> On Mon, Oct 24, 2016 at 9:59 PM, Ronald S. Bultje <rsbultje@gmail.com>
> >> >> wrote:
> >> >> > Good idea to reference Hendrik Gramner here, who keeps insisting we get
> >> >> rid
> >> >> > of all MMX code in ffmpeg (at least as an option) for future Intel CPUs
> >> >> in
> >> >> > which MMX will be deprecated.
> >> >>
> >> >> Replacing MMX with SSE2 is indeed the most "proper" fix in my opinion,
> >> >> but it's a fair amount of work and not done in an evening.
> >> >>
> >> >> The fact that a lot of assembly lacks unit tests is certainly not
> >> >> helping in that regard.
> >> >>
> >> >> Some MMX instructions are slower than the equivalent SSE2 code on
> >> >> Skylake. Intel hasn't officially commented on (as far as I know at
> >> >> least) if we should expect this trend to continue, but they certainly
> >> >> seem to treat MMX as legacy.
> >> >>
> >> >> I doubt they would completely remove support for it though, backwards
> >> >> compatibility is a big selling-point for x86.
> >> >
> >> >
> >> > Well, it gives us another way of fixing this issue (on x86-64 only): have
> >> > sse2 implementations for all code that has a mmx (register) path right now.
> >> >
> >>
> >> I don't think the argument for pre-sse2 CPUs is that strong on 32-bit
> >> systems, either.
> >
> > SSE2 was initially not faster than MMX as CPUs implemented it as 2
> > MMX operations internally not having a full width SIMD unit for SSE*
> > so there would be a performace loss on some x86-32 CPUs if MMX was
> > replaced by "half-width SSE2" there
> >
> 
> You can add "not caring about first-gen sse2 CPUs" to the list as

its more like 3 or 4 generations than 1 according to the instruction
tables from Agner Fog

core 2 (Merom) seems the first that has partial full width support
shift/pack/unpack/shuffle still are faster as MMX
PM, P4, P4E all seem half speed at SSE* than MMX


> well, if you want. Those are way old as well.


> There is going to be a performance loss either way, except that emms
> slows it down everywhere, while using sse2 is likely to be fine on

minor detail being that there is a factor of around
ten thousand in the speed loss between the 2 cases you compare
(0.001% vs maybe 50%)

Droping MMX will cause pre SSE2 CPUs to be alot slower, maybe half
speed overall or less, they loose all SIMD optimizations. On older
SSE2 cpus its still going to be a hefty hit too.
adding emms at a video frame or slice level which is what the patches
posted do pretty much has no real effect but dont belive me look at the
timings worst case i see in agners tables are 18 clock cycles that at
60fps and 1slice and a slow 100mhz cpu is 0.001%
even if there are 100 times more emms (due to slice level EMMS) it
still at the edge of being
hard to meassure. Doing EMMS per function call is of course not
prcatical.

theres an additional penalty for the first float instruction after emms
on some cpus, 58 clock cycles (on P4) but thats still just 0.003% in the
example above.

anyway, i wantd to stay out of this and ill do that, just wanted to
comment on the technical details

[...]
u-9iep@aetey.se Oct. 26, 2016, 4:04 p.m.
On Wed, Oct 26, 2016 at 05:08:56PM +0200, Hendrik Leppkes wrote:
> > One of the highly appreciated virtues of ffmpeg is its efficiency,
> > this makes hardware much more useful. Please do not cut off the low
> > end systems when possible.

> Its not about low-end, its about 15+ years old hardware. At some point

With all respect, I find such an argument hardly appropriate for two reasons:
it is formally incorrect (see below),
it shifts the attention from the functionality/usability to emotionally
charged aspects like age.

I am participating in the discussion as a concerned _packager_.
Nevertheless I am a ffmpeg "user" too.

As a matter of fact I am running several low power units which have mmx but no sse.
Several of them exist for handling video, either for playback or encoding or both.

They have been set up this way because they fit the bill very nicely,
among others thanks to ffmpeg.

The one I looked at recently is manufactured 10 years ago and is not
youngest in its generation. So we have to wait at least extra 5 years
to make it 15+.

> you just have to replace this stuff.
> For $100 you can buy a system at least 10 times as fast then those.
> Heck even a RPi might rival those for $20 or so.

Such arguments can not be passed on to the users without an offer of
a corresponding amount in cache. :)

More seriously, the act of replacement itself has its costs too, in
cases like this one such cost is much higher than the cost of hardware.

(regrettably, this cost or gain is not trivial to convert to a
compensation for the ffmpeg developers, unless video is one's main
business, sorry for that!)

Regards,
Rune
Ronald S. Bultje Oct. 26, 2016, 4:10 p.m.
Hi,

On Wed, Oct 26, 2016 at 12:04 PM, <u-9iep@aetey.se> wrote:

> On Wed, Oct 26, 2016 at 05:08:56PM +0200, Hendrik Leppkes wrote:
> > you just have to replace this stuff.
> > For $100 you can buy a system at least 10 times as fast then those.
> > Heck even a RPi might rival those for $20 or so.
>
> Such arguments can not be passed on to the users without an offer of
> a corresponding amount in cache. :)


Will you pay us for code maintenance?

Ronald
wm4 Oct. 26, 2016, 4:17 p.m.
On Wed, 26 Oct 2016 18:04:24 +0200
u-9iep@aetey.se wrote:

> On Wed, Oct 26, 2016 at 05:08:56PM +0200, Hendrik Leppkes wrote:
> > > One of the highly appreciated virtues of ffmpeg is its efficiency,
> > > this makes hardware much more useful. Please do not cut off the low
> > > end systems when possible.  
> 
> > Its not about low-end, its about 15+ years old hardware. At some point  
> 
> With all respect, I find such an argument hardly appropriate for two reasons:
> it is formally incorrect (see below),
> it shifts the attention from the functionality/usability to emotionally
> charged aspects like age.

It's not an "emotionally charged" aspect. Phasing out old hardware is
common sense. Of course this doesn't appeal to the "keep all the trash"
crowd, which for some reason seems to be going strong in FFmpeg.
u-9iep@aetey.se Oct. 26, 2016, 6:24 p.m.
On Wed, Oct 26, 2016 at 06:17:27PM +0200, wm4 wrote:
> > With all respect, I find such an argument hardly appropriate for two reasons:
> > it is formally incorrect (see below),
> > it shifts the attention from the functionality/usability to emotionally
> > charged aspects like age.
> 
> It's not an "emotionally charged" aspect. Phasing out old hardware is
> common sense. Of course this doesn't appeal to the "keep all the trash"

You do not seem to notice that you make the same mistake,
using emotionally charged expressions.

You are insulting the people who use older hardware, which is
_not_necessarily_ "trash" and who do this for their reasons - good or
bad is not in the realm of one's competence unless one is involved in
the actual case.

You the developers know everything between CPU, memory and I/O. The user
acts in her real world. For the developers it is unfounded to pretend
to be familiar with all the worlds out there.

If not otherwise, I mentioned the good reasons to run on the existing
hardware:

On Wed, Oct 26, 2016 at 06:04:24PM +0200, u-9iep@aetey.se wrote:
> Such arguments can not be passed on to the users without an offer of
> a corresponding amount in cache. :)
> 
> More seriously, the act of replacement itself has its costs too, in
> cases like this one such cost is much higher than the cost of hardware.

To make it clear why I do not offer the expected spared money to the
developers, let me point out the phrase which may have gone unnoticed:

> (regrettably, this cost or gain is not trivial to convert to a
> compensation for the ffmpeg developers, unless video is one's main
> business, sorry for that!)
=>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I did my best to contribute some small bits of code and of understanding.

To conclude, I would like to friendly suggest to every bright
programmer on this list (sincerely highly respecting your skills),
try to notice when you tread outside your competence area.

Good luck to the ffmpeg project, my humble effort here is over.

Regards,
Rune

Patch hide | download patch | download mbox

diff --git a/libavcodec/diracdec.c b/libavcodec/diracdec.c
index b183fad..e295b14 100644
--- a/libavcodec/diracdec.c
+++ b/libavcodec/diracdec.c
@@ -2184,6 +2184,7 @@  static int dirac_decode_data_unit(AVCodecContext *avctx, const uint8_t *buf, int
 
         /* [DIRAC_STD] 13.0 Transform data syntax. transform_data() */
         ret = dirac_decode_frame_internal(s);
+        emms_c();
         if (ret < 0)
             return ret;
     }