diff mbox

[FFmpeg-devel,5/6] x86/simple_idct: add explicit sse2 simple_idct_put/add versions.

Message ID 1491324498-26655-5-git-send-email-rsbultje@gmail.com
State Superseded
Headers show

Commit Message

Ronald S. Bultje April 4, 2017, 4:48 p.m. UTC
These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations.
This way we don't need to use the ff_put/add_pixels_clamped function
pointers.
---
 libavcodec/x86/idctdsp_init.c | 10 ++++++++++
 libavcodec/x86/simple_idct.c  | 15 +++++++++++++--
 libavcodec/x86/simple_idct.h  |  3 +++
 3 files changed, 26 insertions(+), 2 deletions(-)

Comments

Michael Niedermayer April 4, 2017, 9:52 p.m. UTC | #1
On Tue, Apr 04, 2017 at 12:48:17PM -0400, Ronald S. Bultje wrote:
> These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations.
> This way we don't need to use the ff_put/add_pixels_clamped function
> pointers.
> ---
>  libavcodec/x86/idctdsp_init.c | 10 ++++++++++
>  libavcodec/x86/simple_idct.c  | 15 +++++++++++++--
>  libavcodec/x86/simple_idct.h  |  3 +++
>  3 files changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
> index bcf7e5b..579c5e7 100644
> --- a/libavcodec/x86/idctdsp_init.c
> +++ b/libavcodec/x86/idctdsp_init.c
> @@ -87,6 +87,16 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
>      }
>  
>      if (ARCH_X86_64 && avctx->lowres == 0) {
> +        if (!high_bit_depth &&

> +            avctx->lowres == 0 &&

this looks redundant

otherwise should be ok

thx

[...]
Ronald S. Bultje April 4, 2017, 10:13 p.m. UTC | #2
Hi,

On Tue, Apr 4, 2017 at 5:52 PM, Michael Niedermayer <michael@niedermayer.cc>
wrote:

> On Tue, Apr 04, 2017 at 12:48:17PM -0400, Ronald S. Bultje wrote:
> > These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations.
> > This way we don't need to use the ff_put/add_pixels_clamped function
> > pointers.
> > ---
> >  libavcodec/x86/idctdsp_init.c | 10 ++++++++++
> >  libavcodec/x86/simple_idct.c  | 15 +++++++++++++--
> >  libavcodec/x86/simple_idct.h  |  3 +++
> >  3 files changed, 26 insertions(+), 2 deletions(-)
> >
> > diff --git a/libavcodec/x86/idctdsp_init.c
> b/libavcodec/x86/idctdsp_init.c
> > index bcf7e5b..579c5e7 100644
> > --- a/libavcodec/x86/idctdsp_init.c
> > +++ b/libavcodec/x86/idctdsp_init.c
> > @@ -87,6 +87,16 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c,
> AVCodecContext *avctx,
> >      }
> >
> >      if (ARCH_X86_64 && avctx->lowres == 0) {
> > +        if (!high_bit_depth &&
>
> > +            avctx->lowres == 0 &&
>
> this looks redundant
>
> otherwise should be ok
>
> thx


Thanks.

While looking, this patch may actually have a slightly theoretical issue.
Imagine that you're on a configuration where external asm is disabled but
inline asm is enabled. In our current system,
put/add_pixels_clamped_mmx/sse2 are yasm functions, but idct() is still
inline. So in this configuration, idct() is available, but idct_add/put are
not.

What do we set idct_permutation to?

(I know that in practice this configuration is basically a bug and so the
performance implications aren't that important, but I do believe it should
not crash; should we just move the assignment of idct under
HAVE_EXTERNAL_MMX - which is where idct_put/add would go - even though it
strictly speaking doesn't use external asm?)

Ronald
Michael Niedermayer April 5, 2017, 1:54 a.m. UTC | #3
On Tue, Apr 04, 2017 at 06:13:30PM -0400, Ronald S. Bultje wrote:
> Hi,
> 
> On Tue, Apr 4, 2017 at 5:52 PM, Michael Niedermayer <michael@niedermayer.cc>
> wrote:
> 
> > On Tue, Apr 04, 2017 at 12:48:17PM -0400, Ronald S. Bultje wrote:
> > > These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations.
> > > This way we don't need to use the ff_put/add_pixels_clamped function
> > > pointers.
> > > ---
> > >  libavcodec/x86/idctdsp_init.c | 10 ++++++++++
> > >  libavcodec/x86/simple_idct.c  | 15 +++++++++++++--
> > >  libavcodec/x86/simple_idct.h  |  3 +++
> > >  3 files changed, 26 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/libavcodec/x86/idctdsp_init.c
> > b/libavcodec/x86/idctdsp_init.c
> > > index bcf7e5b..579c5e7 100644
> > > --- a/libavcodec/x86/idctdsp_init.c
> > > +++ b/libavcodec/x86/idctdsp_init.c
> > > @@ -87,6 +87,16 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c,
> > AVCodecContext *avctx,
> > >      }
> > >
> > >      if (ARCH_X86_64 && avctx->lowres == 0) {
> > > +        if (!high_bit_depth &&
> >
> > > +            avctx->lowres == 0 &&
> >
> > this looks redundant
> >
> > otherwise should be ok
> >
> > thx
> 
> 
> Thanks.
> 
> While looking, this patch may actually have a slightly theoretical issue.
> Imagine that you're on a configuration where external asm is disabled but
> inline asm is enabled. In our current system,
> put/add_pixels_clamped_mmx/sse2 are yasm functions, but idct() is still
> inline. So in this configuration, idct() is available, but idct_add/put are
> not.
> 

> What do we set idct_permutation to?

idct_permutation must match the idcts, the idcts must match each
other permutation wise

for internal / extern, we could sidestep this by using a internal
clamped with internal idcts


> (I know that in practice this configuration is basically a bug and so the
> performance implications aren't that important, but I do believe it should
> not crash; should we just move the assignment of idct under
> HAVE_EXTERNAL_MMX - which is where idct_put/add would go - even though it
> strictly speaking doesn't use external asm?)
>

if the idct depends on external asm then it would need to be under
HAVE_EXTERNAL_*, you are correct. Also that is ugly


[...]
Ronald S. Bultje April 5, 2017, 10:25 a.m. UTC | #4
Hi,

On Tue, Apr 4, 2017 at 9:54 PM, Michael Niedermayer <michael@niedermayer.cc>
wrote:

> if the idct depends on external asm then it would need to be under
> HAVE_EXTERNAL_*, you are correct. Also that is ugly


It's temporary, I think James/Kieran were talking about converting the IDCT
to yasm and then having an SSE2 version (for 8bit), which would be quite
nice. Right now we have a SSE2 for 10/12 bit on 64bit only and the inline
asm on 32+64bit for 8bit.

Ronald
diff mbox

Patch

diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index bcf7e5b..579c5e7 100644
--- a/libavcodec/x86/idctdsp_init.c
+++ b/libavcodec/x86/idctdsp_init.c
@@ -87,6 +87,16 @@  av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
     }
 
     if (ARCH_X86_64 && avctx->lowres == 0) {
+        if (!high_bit_depth &&
+            avctx->lowres == 0 &&
+            (avctx->idct_algo == FF_IDCT_AUTO ||
+             avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
+             avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
+                c->idct_put  = ff_simple_idct_put_sse2;
+                c->idct_add  = ff_simple_idct_add_sse2;
+                c->perm_type = FF_IDCT_PERM_SIMPLE;
+        }
+
         if (avctx->bits_per_raw_sample == 10 &&
         (avctx->idct_algo == FF_IDCT_AUTO ||
          avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
diff --git a/libavcodec/x86/simple_idct.c b/libavcodec/x86/simple_idct.c
index d3a19fa..1155920 100644
--- a/libavcodec/x86/simple_idct.c
+++ b/libavcodec/x86/simple_idct.c
@@ -24,6 +24,7 @@ 
 #include "libavutil/x86/asm.h"
 
 #include "libavcodec/idctdsp.h"
+#include "libavcodec/x86/idctdsp.h"
 
 #include "idctdsp.h"
 #include "simple_idct.h"
@@ -907,12 +908,22 @@  void ff_simple_idct_mmx(int16_t *block)
 void ff_simple_idct_put_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block)
 {
     idct(block);
-    ff_put_pixels_clamped(block, dest, line_size);
+    ff_put_pixels_clamped_mmx(block, dest, line_size);
 }
 void ff_simple_idct_add_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block)
 {
     idct(block);
-    ff_add_pixels_clamped(block, dest, line_size);
+    ff_add_pixels_clamped_mmx(block, dest, line_size);
+}
+void ff_simple_idct_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block)
+{
+    idct(block);
+    ff_put_pixels_clamped_sse2(block, dest, line_size);
+}
+void ff_simple_idct_add_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block)
+{
+    idct(block);
+    ff_add_pixels_clamped_sse2(block, dest, line_size);
 }
 
 #endif /* HAVE_INLINE_ASM */
diff --git a/libavcodec/x86/simple_idct.h b/libavcodec/x86/simple_idct.h
index ad76baf..d17ef6a 100644
--- a/libavcodec/x86/simple_idct.h
+++ b/libavcodec/x86/simple_idct.h
@@ -26,6 +26,9 @@  void ff_simple_idct_mmx(int16_t *block);
 void ff_simple_idct_add_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 void ff_simple_idct_put_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 
+void ff_simple_idct_add_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
+void ff_simple_idct_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
+
 void ff_simple_idct10_sse2(int16_t *block);
 void ff_simple_idct10_avx(int16_t *block);