Message ID | 1491324498-26655-5-git-send-email-rsbultje@gmail.com |
---|---|
State | Superseded |
Headers | show |
On Tue, Apr 04, 2017 at 12:48:17PM -0400, Ronald S. Bultje wrote: > These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations. > This way we don't need to use the ff_put/add_pixels_clamped function > pointers. > --- > libavcodec/x86/idctdsp_init.c | 10 ++++++++++ > libavcodec/x86/simple_idct.c | 15 +++++++++++++-- > libavcodec/x86/simple_idct.h | 3 +++ > 3 files changed, 26 insertions(+), 2 deletions(-) > > diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c > index bcf7e5b..579c5e7 100644 > --- a/libavcodec/x86/idctdsp_init.c > +++ b/libavcodec/x86/idctdsp_init.c > @@ -87,6 +87,16 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx, > } > > if (ARCH_X86_64 && avctx->lowres == 0) { > + if (!high_bit_depth && > + avctx->lowres == 0 && this looks redundant otherwise should be ok thx [...]
Hi, On Tue, Apr 4, 2017 at 5:52 PM, Michael Niedermayer <michael@niedermayer.cc> wrote: > On Tue, Apr 04, 2017 at 12:48:17PM -0400, Ronald S. Bultje wrote: > > These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations. > > This way we don't need to use the ff_put/add_pixels_clamped function > > pointers. > > --- > > libavcodec/x86/idctdsp_init.c | 10 ++++++++++ > > libavcodec/x86/simple_idct.c | 15 +++++++++++++-- > > libavcodec/x86/simple_idct.h | 3 +++ > > 3 files changed, 26 insertions(+), 2 deletions(-) > > > > diff --git a/libavcodec/x86/idctdsp_init.c > b/libavcodec/x86/idctdsp_init.c > > index bcf7e5b..579c5e7 100644 > > --- a/libavcodec/x86/idctdsp_init.c > > +++ b/libavcodec/x86/idctdsp_init.c > > @@ -87,6 +87,16 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, > AVCodecContext *avctx, > > } > > > > if (ARCH_X86_64 && avctx->lowres == 0) { > > + if (!high_bit_depth && > > > + avctx->lowres == 0 && > > this looks redundant > > otherwise should be ok > > thx Thanks. While looking, this patch may actually have a slightly theoretical issue. Imagine that you're on a configuration where external asm is disabled but inline asm is enabled. In our current system, put/add_pixels_clamped_mmx/sse2 are yasm functions, but idct() is still inline. So in this configuration, idct() is available, but idct_add/put are not. What do we set idct_permutation to? (I know that in practice this configuration is basically a bug and so the performance implications aren't that important, but I do believe it should not crash; should we just move the assignment of idct under HAVE_EXTERNAL_MMX - which is where idct_put/add would go - even though it strictly speaking doesn't use external asm?) Ronald
On Tue, Apr 04, 2017 at 06:13:30PM -0400, Ronald S. Bultje wrote: > Hi, > > On Tue, Apr 4, 2017 at 5:52 PM, Michael Niedermayer <michael@niedermayer.cc> > wrote: > > > On Tue, Apr 04, 2017 at 12:48:17PM -0400, Ronald S. Bultje wrote: > > > These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations. > > > This way we don't need to use the ff_put/add_pixels_clamped function > > > pointers. > > > --- > > > libavcodec/x86/idctdsp_init.c | 10 ++++++++++ > > > libavcodec/x86/simple_idct.c | 15 +++++++++++++-- > > > libavcodec/x86/simple_idct.h | 3 +++ > > > 3 files changed, 26 insertions(+), 2 deletions(-) > > > > > > diff --git a/libavcodec/x86/idctdsp_init.c > > b/libavcodec/x86/idctdsp_init.c > > > index bcf7e5b..579c5e7 100644 > > > --- a/libavcodec/x86/idctdsp_init.c > > > +++ b/libavcodec/x86/idctdsp_init.c > > > @@ -87,6 +87,16 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, > > AVCodecContext *avctx, > > > } > > > > > > if (ARCH_X86_64 && avctx->lowres == 0) { > > > + if (!high_bit_depth && > > > > > + avctx->lowres == 0 && > > > > this looks redundant > > > > otherwise should be ok > > > > thx > > > Thanks. > > While looking, this patch may actually have a slightly theoretical issue. > Imagine that you're on a configuration where external asm is disabled but > inline asm is enabled. In our current system, > put/add_pixels_clamped_mmx/sse2 are yasm functions, but idct() is still > inline. So in this configuration, idct() is available, but idct_add/put are > not. > > What do we set idct_permutation to? idct_permutation must match the idcts, the idcts must match each other permutation wise for internal / extern, we could sidestep this by using a internal clamped with internal idcts > (I know that in practice this configuration is basically a bug and so the > performance implications aren't that important, but I do believe it should > not crash; should we just move the assignment of idct under > HAVE_EXTERNAL_MMX - which is where idct_put/add would go - even though it > strictly speaking doesn't use external asm?) > if the idct depends on external asm then it would need to be under HAVE_EXTERNAL_*, you are correct. Also that is ugly [...]
Hi, On Tue, Apr 4, 2017 at 9:54 PM, Michael Niedermayer <michael@niedermayer.cc> wrote: > if the idct depends on external asm then it would need to be under > HAVE_EXTERNAL_*, you are correct. Also that is ugly It's temporary, I think James/Kieran were talking about converting the IDCT to yasm and then having an SSE2 version (for 8bit), which would be quite nice. Right now we have a SSE2 for 10/12 bit on 64bit only and the inline asm on 32+64bit for 8bit. Ronald
diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c index bcf7e5b..579c5e7 100644 --- a/libavcodec/x86/idctdsp_init.c +++ b/libavcodec/x86/idctdsp_init.c @@ -87,6 +87,16 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx, } if (ARCH_X86_64 && avctx->lowres == 0) { + if (!high_bit_depth && + avctx->lowres == 0 && + (avctx->idct_algo == FF_IDCT_AUTO || + avctx->idct_algo == FF_IDCT_SIMPLEAUTO || + avctx->idct_algo == FF_IDCT_SIMPLEMMX)) { + c->idct_put = ff_simple_idct_put_sse2; + c->idct_add = ff_simple_idct_add_sse2; + c->perm_type = FF_IDCT_PERM_SIMPLE; + } + if (avctx->bits_per_raw_sample == 10 && (avctx->idct_algo == FF_IDCT_AUTO || avctx->idct_algo == FF_IDCT_SIMPLEAUTO || diff --git a/libavcodec/x86/simple_idct.c b/libavcodec/x86/simple_idct.c index d3a19fa..1155920 100644 --- a/libavcodec/x86/simple_idct.c +++ b/libavcodec/x86/simple_idct.c @@ -24,6 +24,7 @@ #include "libavutil/x86/asm.h" #include "libavcodec/idctdsp.h" +#include "libavcodec/x86/idctdsp.h" #include "idctdsp.h" #include "simple_idct.h" @@ -907,12 +908,22 @@ void ff_simple_idct_mmx(int16_t *block) void ff_simple_idct_put_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block) { idct(block); - ff_put_pixels_clamped(block, dest, line_size); + ff_put_pixels_clamped_mmx(block, dest, line_size); } void ff_simple_idct_add_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block) { idct(block); - ff_add_pixels_clamped(block, dest, line_size); + ff_add_pixels_clamped_mmx(block, dest, line_size); +} +void ff_simple_idct_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block) +{ + idct(block); + ff_put_pixels_clamped_sse2(block, dest, line_size); +} +void ff_simple_idct_add_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block) +{ + idct(block); + ff_add_pixels_clamped_sse2(block, dest, line_size); } #endif /* HAVE_INLINE_ASM */ diff --git a/libavcodec/x86/simple_idct.h b/libavcodec/x86/simple_idct.h index ad76baf..d17ef6a 100644 --- a/libavcodec/x86/simple_idct.h +++ b/libavcodec/x86/simple_idct.h @@ -26,6 +26,9 @@ void ff_simple_idct_mmx(int16_t *block); void ff_simple_idct_add_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block); void ff_simple_idct_put_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block); +void ff_simple_idct_add_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block); +void ff_simple_idct_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block); + void ff_simple_idct10_sse2(int16_t *block); void ff_simple_idct10_avx(int16_t *block);