[FFmpeg-devel,8/8] avcodec/v210enc: add AVX-512 10-bit line pack function

Message ID 20171030130835.28327-9-jdarnley@obe.tv
State Superseded

Commit Message

James Darnley Oct. 30, 2017, 1:08 p.m. UTC
---
 libavcodec/x86/v210enc.asm    | 5 +++++
 libavcodec/x86/v210enc_init.c | 7 +++++++
 2 files changed, 12 insertions(+)

Comments

Henrik Gramner Oct. 30, 2017, 4:14 p.m. UTC | #1
On Mon, Oct 30, 2017 at 2:08 PM, James Darnley <jdarnley@obe.tv> wrote:
> +INIT_YMM avx512

ymm?

James Darnley Oct. 30, 2017, 5:46 p.m. UTC | #2
On 2017-10-30 17:14, Henrik Gramner wrote:
> On Mon, Oct 30, 2017 at 2:08 PM, James Darnley <jdarnley@obe.tv> wrote:
>> +INIT_YMM avx512
> 
> ymm?

Yes.  I haven't written a correct one using zmm regs yet.  I will ask
some questions about it, possibly very soon.  If I don't get a longer
version to work, I could just use zmm but not use all the space.  (This
is supposed to be a test so I do not need to be concerned about speed.)

This does use some new instructions, at least.

Patch

diff --git a/libavcodec/x86/v210enc.asm b/libavcodec/x86/v210enc.asm
index 965f2bea3c..5068af27f8 100644
--- a/libavcodec/x86/v210enc.asm
+++ b/libavcodec/x86/v210enc.asm
@@ -103,6 +103,11 @@  INIT_YMM avx2
 v210_planar_pack_10
 %endif
 
+%if HAVE_AVX512_EXTERNAL
+INIT_YMM avx512
+v210_planar_pack_10
+%endif
+
 %macro v210_planar_pack_8 0
 
 ; v210_planar_pack_8(const uint8_t *y, const uint8_t *u, const uint8_t *v, uint8_t *dst, ptrdiff_t width)
diff --git a/libavcodec/x86/v210enc_init.c b/libavcodec/x86/v210enc_init.c
index e997b4b67a..e8aac373a0 100644
--- a/libavcodec/x86/v210enc_init.c
+++ b/libavcodec/x86/v210enc_init.c
@@ -32,6 +32,9 @@  void ff_v210_planar_pack_10_ssse3(const uint16_t *y, const uint16_t *u,
 void ff_v210_planar_pack_10_avx2(const uint16_t *y, const uint16_t *u,
                                  const uint16_t *v, uint8_t *dst,
                                  ptrdiff_t width);
+void ff_v210_planar_pack_10_avx512(const uint16_t *y, const uint16_t *u,
+                                   const uint16_t *v, uint8_t *dst,
+                                   ptrdiff_t width);
 
 av_cold void ff_v210enc_init_x86(V210EncContext *s)
 {
@@ -51,4 +54,8 @@  av_cold void ff_v210enc_init_x86(V210EncContext *s)
         s->sample_factor_10 = 2;
         s->pack_line_10     = ff_v210_planar_pack_10_avx2;
     }
+
+    if (EXTERNAL_AVX512(cpu_flags)) {
+        s->pack_line_10 = ff_v210_planar_pack_10_avx512;
+    }
 }
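
For readers unfamiliar with what a function with the pack_line_10 signature
computes, the following is a minimal scalar sketch, not the code from this
patch or from libavcodec. It assumes the usual v210 layout: six pixels go into
four little-endian 32-bit words, each holding three 10-bit samples in the
order U0 Y0 V0, Y1 U1 Y2, V1 Y3 U2, Y4 V2 Y5. The names pack_10_ref and
write_word are hypothetical, and the sketch masks samples to 10 bits where a
real encoder would clip them to the legal range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store three 10-bit samples in one little-endian
 * 32-bit word, with a in bits 0-9, b in bits 10-19 and c in bits 20-29. */
static void write_word(uint8_t *dst, uint16_t a, uint16_t b, uint16_t c)
{
    uint32_t val = (uint32_t)(a & 0x3FF)
                 | (uint32_t)(b & 0x3FF) << 10
                 | (uint32_t)(c & 0x3FF) << 20;

    dst[0] = (uint8_t)(val & 0xFF);
    dst[1] = (uint8_t)((val >> 8) & 0xFF);
    dst[2] = (uint8_t)((val >> 16) & 0xFF);
    dst[3] = (uint8_t)((val >> 24) & 0xFF);
}

/* Hypothetical scalar reference: packs whole groups of six pixels and
 * ignores any remainder. width is a pixel count (assumption). */
static void pack_10_ref(const uint16_t *y, const uint16_t *u,
                        const uint16_t *v, uint8_t *dst, ptrdiff_t width)
{
    for (ptrdiff_t i = 0; i + 5 < width; i += 6) {
        write_word(dst +  0, u[0], y[0], v[0]);
        write_word(dst +  4, y[1], u[1], y[2]);
        write_word(dst +  8, v[1], y[3], u[2]);
        write_word(dst + 12, y[4], v[2], y[5]);

        y   += 6;   /* six luma samples consumed */
        u   += 3;   /* three Cb samples consumed (4:2:2) */
        v   += 3;   /* three Cr samples consumed (4:2:2) */
        dst += 16;  /* four 32-bit words written */
    }
}
```

A SIMD version handles several such six-pixel groups per loop iteration, which
is why the init code above sets sample_factor_10 alongside the function
pointer for the wider variants.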