From patchwork Wed Feb 23 08:57:30 2022
X-Patchwork-Submitter: "Wu, Jianhua"
X-Patchwork-Id: 34465
From: jianhua.wu-at-intel.com@ffmpeg.org
To: ffmpeg-devel@ffmpeg.org
Cc: Wu Jianhua
Date: Wed, 23 Feb 2022 16:57:30 +0800
Message-Id: <20220223085735.70854-1-jianhua.wu@intel.com>
Subject: [FFmpeg-devel] [PATCH 1/6] avutil/cpu: add AVX512 Icelake flag

From: Wu Jianhua

Signed-off-by: Wu Jianhua
---
 configure                 | 13 +++++++---
 libavutil/cpu.c           |  1 +
 libavutil/cpu.h           |  1 +
 libavutil/x86/cpu.c       |  8 ++++--
 libavutil/x86/cpu.h       |  1 +
 libavutil/x86/x86inc.asm  | 53 ++++++++++++++++++++-------------------
 tests/checkasm/checkasm.c | 35 +++++++++++++-------------
 7 files changed, 63 insertions(+), 49 deletions(-)

diff --git a/configure b/configure
index 1535dc3c5b..d88c2ae979 100755
--- a/configure
+++ b/configure
@@ -444,6 +444,7 @@ Optimization options (experts only):
   --disable-fma4           disable FMA4 optimizations
   --disable-avx2           disable AVX2 optimizations
   --disable-avx512         disable AVX-512 optimizations
+  --disable-avx512icl      disable AVX-512ICL optimizations
   --disable-aesni          disable AESNI optimizations
   --disable-armv5te        disable armv5te optimizations
   --disable-armv6          disable armv6 optimizations
@@ -2098,6 +2099,7 @@ ARCH_EXT_LIST_X86_SIMD="
     avx
     avx2
     avx512
+    avx512icl
     fma3
     fma4
     mmx
@@ -2666,6 +2668,7 @@ fma3_deps="avx"
 fma4_deps="avx"
 avx2_deps="avx"
 avx512_deps="avx2"
+avx512icl_deps="avx512"
 
 mmx_external_deps="x86asm"
 mmx_inline_deps="inline_asm x86"
@@ -6128,10 +6131,11 @@ EOF
             elf*) enabled debug && append X86ASMFLAGS $x86asm_debug ;;
         esac
 
-        enabled avx512 && check_x86asm avx512_external "vmovdqa32 [eax]{k1}{z}, zmm0"
-        enabled avx2   && check_x86asm avx2_external   "vextracti128 xmm0, ymm0, 0"
-        enabled xop    && check_x86asm xop_external    "vpmacsdd xmm0, xmm1, xmm2, xmm3"
-        enabled fma4   && check_x86asm fma4_external   "vfmaddps ymm0, ymm1, ymm2, ymm3"
+        enabled avx512    && check_x86asm avx512_external    "vmovdqa32 [eax]{k1}{z}, zmm0"
+        enabled avx512icl && check_x86asm avx512icl_external "vpdpwssds zmm31{k1}{z}, zmm29, zmm28"
+        enabled avx2      && check_x86asm avx2_external      "vextracti128 xmm0, ymm0, 0"
+        enabled xop       && check_x86asm xop_external       "vpmacsdd xmm0, xmm1, xmm2, xmm3"
+        enabled fma4      && check_x86asm fma4_external      "vfmaddps ymm0, ymm1, ymm2, ymm3"
         check_x86asm cpunop          "CPU amdnop"
     fi
 
@@ -7471,6 +7475,7 @@ if enabled x86; then
     echo "AVX enabled               ${avx-no}"
     echo "AVX2 enabled              ${avx2-no}"
     echo "AVX-512 enabled           ${avx512-no}"
+    echo "AVX-512ICL enabled        ${avx512icl-no}"
     echo "XOP enabled               ${xop-no}"
     echo "FMA3 enabled              ${fma3-no}"
     echo "FMA4 enabled              ${fma4-no}"
diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index 1368502245..833c220192 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -137,6 +137,7 @@ int av_parse_cpu_caps(unsigned *flags, const char *s)
         { "cmov",     NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_CMOV     },    .unit = "flags" },
         { "aesni",    NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_AESNI    },    .unit = "flags" },
         { "avx512"  , NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_AVX512   },    .unit = "flags" },
+        { "avx512icl",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_AVX512ICL   }, .unit = "flags" },
         { "slowgather", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_SLOW_GATHER }, .unit = "flags" },
 
 #define CPU_FLAG_P2 AV_CPU_FLAG_CMOV | AV_CPU_FLAG_MMX
diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index ce9bf14bf7..9711e574c5 100644
--- a/libavutil/cpu.h
+++ b/libavutil/cpu.h
@@ -54,6 +54,7 @@
 #define AV_CPU_FLAG_BMI1         0x20000 ///< Bit Manipulation Instruction Set 1
 #define AV_CPU_FLAG_BMI2         0x40000 ///< Bit Manipulation Instruction Set 2
 #define AV_CPU_FLAG_AVX512      0x100000 ///< AVX-512 functions: requires OS support even if YMM/ZMM registers aren't used
+#define AV_CPU_FLAG_AVX512ICL   0x200000 ///< F/CD/BW/DQ/VL/VNNI/IFMA/VBMI/VBMI2/VPOPCNTDQ/BITALG/GFNI/VAES/VPCLMULQDQ
 #define AV_CPU_FLAG_SLOW_GATHER  0x2000000 ///< CPU has slow gathers.
 
 #define AV_CPU_FLAG_ALTIVEC      0x0001 ///< standard
diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
index 7b13fcae91..d6cd4fab9c 100644
--- a/libavutil/x86/cpu.c
+++ b/libavutil/x86/cpu.c
@@ -150,9 +150,13 @@ int ff_get_cpu_flags_x86(void)
                 rval |= AV_CPU_FLAG_AVX2;
 #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
                 if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
-                    if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000)
+                    if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000) {
                         rval |= AV_CPU_FLAG_AVX512;
-
+#if HAVE_AVX512ICL
+                        if ((ebx & 0xd0200000) == 0xd0200000 && (ecx & 0x5f42) == 0x5f42)
+                            rval |= AV_CPU_FLAG_AVX512ICL;
+#endif /* HAVE_AVX512ICL */
+                    }
                 }
 #endif /* HAVE_AVX512 */
 #endif /* HAVE_AVX2 */
diff --git a/libavutil/x86/cpu.h b/libavutil/x86/cpu.h
index 937c697fa0..40a1eef0ab 100644
--- a/libavutil/x86/cpu.h
+++ b/libavutil/x86/cpu.h
@@ -80,6 +80,7 @@
 #define EXTERNAL_AVX2_SLOW(flags)   CPUEXT_SUFFIX_SLOW2(flags, _EXTERNAL, AVX2, AVX)
 #define EXTERNAL_AESNI(flags)       CPUEXT_SUFFIX(flags, _EXTERNAL, AESNI)
 #define EXTERNAL_AVX512(flags)      CPUEXT_SUFFIX(flags, _EXTERNAL, AVX512)
+#define EXTERNAL_AVX512ICL(flags)   CPUEXT_SUFFIX(flags, _EXTERNAL, AVX512ICL)
 
 #define INLINE_AMD3DNOW(flags)      CPUEXT_SUFFIX(flags, _INLINE, AMD3DNOW)
 #define INLINE_AMD3DNOWEXT(flags)   CPUEXT_SUFFIX(flags, _INLINE, AMD3DNOWEXT)
diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 01c35e3a4b..251ee797de 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -817,32 +817,33 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
 
 ; cpuflags
 
-%assign cpuflags_mmx      (1<<0)
-%assign cpuflags_mmx2     (1<<1) | cpuflags_mmx
-%assign cpuflags_3dnow    (1<<2) | cpuflags_mmx
-%assign cpuflags_3dnowext (1<<3) | cpuflags_3dnow
-%assign cpuflags_sse      (1<<4) | cpuflags_mmx2
-%assign cpuflags_sse2     (1<<5) | cpuflags_sse
-%assign cpuflags_sse2slow (1<<6) | cpuflags_sse2
-%assign cpuflags_lzcnt    (1<<7) | cpuflags_sse2
-%assign cpuflags_sse3     (1<<8) | cpuflags_sse2
-%assign cpuflags_ssse3    (1<<9) | cpuflags_sse3
-%assign cpuflags_sse4     (1<<10)| cpuflags_ssse3
-%assign cpuflags_sse42    (1<<11)| cpuflags_sse4
-%assign cpuflags_aesni    (1<<12)| cpuflags_sse42
-%assign cpuflags_avx      (1<<13)| cpuflags_sse42
-%assign cpuflags_xop      (1<<14)| cpuflags_avx
-%assign cpuflags_fma4     (1<<15)| cpuflags_avx
-%assign cpuflags_fma3     (1<<16)| cpuflags_avx
-%assign cpuflags_bmi1     (1<<17)| cpuflags_avx|cpuflags_lzcnt
-%assign cpuflags_bmi2     (1<<18)| cpuflags_bmi1
-%assign cpuflags_avx2     (1<<19)| cpuflags_fma3|cpuflags_bmi2
-%assign cpuflags_avx512   (1<<20)| cpuflags_avx2 ; F, CD, BW, DQ, VL
-
-%assign cpuflags_cache32  (1<<21)
-%assign cpuflags_cache64  (1<<22)
-%assign cpuflags_aligned  (1<<23) ; not a cpu feature, but a function variant
-%assign cpuflags_atom     (1<<24)
+%assign cpuflags_mmx       (1<<0)
+%assign cpuflags_mmx2      (1<<1) | cpuflags_mmx
+%assign cpuflags_3dnow     (1<<2) | cpuflags_mmx
+%assign cpuflags_3dnowext  (1<<3) | cpuflags_3dnow
+%assign cpuflags_sse       (1<<4) | cpuflags_mmx2
+%assign cpuflags_sse2      (1<<5) | cpuflags_sse
+%assign cpuflags_sse2slow  (1<<6) | cpuflags_sse2
+%assign cpuflags_lzcnt     (1<<7) | cpuflags_sse2
+%assign cpuflags_sse3      (1<<8) | cpuflags_sse2
+%assign cpuflags_ssse3     (1<<9) | cpuflags_sse3
+%assign cpuflags_sse4      (1<<10)| cpuflags_ssse3
+%assign cpuflags_sse42     (1<<11)| cpuflags_sse4
+%assign cpuflags_aesni     (1<<12)| cpuflags_sse42
+%assign cpuflags_avx       (1<<13)| cpuflags_sse42
+%assign cpuflags_xop       (1<<14)| cpuflags_avx
+%assign cpuflags_fma4      (1<<15)| cpuflags_avx
+%assign cpuflags_fma3      (1<<16)| cpuflags_avx
+%assign cpuflags_bmi1      (1<<17)| cpuflags_avx|cpuflags_lzcnt
+%assign cpuflags_bmi2      (1<<18)| cpuflags_bmi1
+%assign cpuflags_avx2      (1<<19)| cpuflags_fma3|cpuflags_bmi2
+%assign cpuflags_avx512    (1<<20)| cpuflags_avx2 ; F, CD, BW, DQ, VL
+%assign cpuflags_avx512icl (1<<25)| cpuflags_avx512
+
+%assign cpuflags_cache32   (1<<21)
+%assign cpuflags_cache64   (1<<22)
+%assign cpuflags_aligned   (1<<23) ; not a cpu feature, but a function variant
+%assign cpuflags_atom      (1<<24)
 
 ; Returns a boolean value expressing whether or not the specified cpuflag is enabled.
 %define    cpuflag(x) (((((cpuflags & (cpuflags_ %+ x)) ^ (cpuflags_ %+ x)) - 1) >> 31) & 1)
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index f74125e810..e77b4ec20f 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -220,23 +220,24 @@ static const struct {
     { "MMI",      "mmi",      AV_CPU_FLAG_MMI },
     { "MSA",      "msa",      AV_CPU_FLAG_MSA },
 #elif ARCH_X86
-    { "MMX",      "mmx",      AV_CPU_FLAG_MMX|AV_CPU_FLAG_CMOV },
-    { "MMXEXT",   "mmxext",   AV_CPU_FLAG_MMXEXT },
-    { "3DNOW",    "3dnow",    AV_CPU_FLAG_3DNOW },
-    { "3DNOWEXT", "3dnowext", AV_CPU_FLAG_3DNOWEXT },
-    { "SSE",      "sse",      AV_CPU_FLAG_SSE },
-    { "SSE2",     "sse2",     AV_CPU_FLAG_SSE2|AV_CPU_FLAG_SSE2SLOW },
-    { "SSE3",     "sse3",     AV_CPU_FLAG_SSE3|AV_CPU_FLAG_SSE3SLOW },
-    { "SSSE3",    "ssse3",    AV_CPU_FLAG_SSSE3|AV_CPU_FLAG_ATOM },
-    { "SSE4.1",   "sse4",     AV_CPU_FLAG_SSE4 },
-    { "SSE4.2",   "sse42",    AV_CPU_FLAG_SSE42 },
-    { "AES-NI",   "aesni",    AV_CPU_FLAG_AESNI },
-    { "AVX",      "avx",      AV_CPU_FLAG_AVX },
-    { "XOP",      "xop",      AV_CPU_FLAG_XOP },
-    { "FMA3",     "fma3",     AV_CPU_FLAG_FMA3 },
-    { "FMA4",     "fma4",     AV_CPU_FLAG_FMA4 },
-    { "AVX2",     "avx2",     AV_CPU_FLAG_AVX2 },
-    { "AVX-512",  "avx512",   AV_CPU_FLAG_AVX512 },
+    { "MMX",        "mmx",       AV_CPU_FLAG_MMX|AV_CPU_FLAG_CMOV },
+    { "MMXEXT",     "mmxext",    AV_CPU_FLAG_MMXEXT },
+    { "3DNOW",      "3dnow",     AV_CPU_FLAG_3DNOW },
+    { "3DNOWEXT",   "3dnowext",  AV_CPU_FLAG_3DNOWEXT },
+    { "SSE",        "sse",       AV_CPU_FLAG_SSE },
+    { "SSE2",       "sse2",      AV_CPU_FLAG_SSE2|AV_CPU_FLAG_SSE2SLOW },
+    { "SSE3",       "sse3",      AV_CPU_FLAG_SSE3|AV_CPU_FLAG_SSE3SLOW },
+    { "SSSE3",      "ssse3",     AV_CPU_FLAG_SSSE3|AV_CPU_FLAG_ATOM },
+    { "SSE4.1",     "sse4",      AV_CPU_FLAG_SSE4 },
+    { "SSE4.2",     "sse42",     AV_CPU_FLAG_SSE42 },
+    { "AES-NI",     "aesni",     AV_CPU_FLAG_AESNI },
+    { "AVX",        "avx",       AV_CPU_FLAG_AVX },
+    { "XOP",        "xop",       AV_CPU_FLAG_XOP },
+    { "FMA3",       "fma3",      AV_CPU_FLAG_FMA3 },
+    { "FMA4",       "fma4",      AV_CPU_FLAG_FMA4 },
+    { "AVX2",       "avx2",      AV_CPU_FLAG_AVX2 },
+    { "AVX-512",    "avx512",    AV_CPU_FLAG_AVX512 },
+    { "AVX-512ICL", "avx512icl", AV_CPU_FLAG_AVX512ICL },
 #elif ARCH_LOONGARCH
     { "LSX",      "lsx",      AV_CPU_FLAG_LSX },
     { "LASX",     "lasx",     AV_CPU_FLAG_LASX },

From patchwork Wed Feb 23 08:57:31 2022
X-Patchwork-Submitter: "Wu, Jianhua"
X-Patchwork-Id: 34466
From: jianhua.wu-at-intel.com@ffmpeg.org
To: ffmpeg-devel@ffmpeg.org
Cc: Wu Jianhua
Date: Wed, 23 Feb 2022 16:57:31 +0800
Message-Id: <20220223085735.70854-2-jianhua.wu@intel.com>
In-Reply-To: <20220223085735.70854-1-jianhua.wu@intel.com>
References: <20220223085735.70854-1-jianhua.wu@intel.com>
Subject: [FFmpeg-devel] [PATCH 2/6] avcodec/x86/hevc_mc: add qpel_h8_8_avx512icl and qpel_hv8_8_avx512icl

From: Wu Jianhua

This commit uses the instruction `vpdpbusd` introduced by AVX512 VNNI to
calculate the horizontal filter.
ff_hevc_put_hevc_qpel_h8_8_sse4                    1039169
ff_hevc_put_hevc_qpel_h8_8_avx512icl                677153
ff_hevc_put_hevc_qpel_hv8_8_sse4                   3603511
ff_hevc_put_hevc_qpel_hv8_8_avx512icl              2995354

Signed-off-by: Wu Jianhua
---
 libavcodec/x86/hevc_mc.asm    | 139 ++++++++++++++++++++++++++++++++++
 libavcodec/x86/hevcdsp.h      |   3 +
 libavcodec/x86/hevcdsp_init.c |   4 +
 3 files changed, 146 insertions(+)

diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm
index ff6ed0711a..026b6b48ee 100644
--- a/libavcodec/x86/hevc_mc.asm
+++ b/libavcodec/x86/hevc_mc.asm
@@ -87,6 +87,26 @@ QPEL_TABLE 12, 4, w, sse4
 QPEL_TABLE  8,16, b, avx2
 QPEL_TABLE 10, 8, w, avx2
 
+QPEL_TABLE  8, 1, b, avx512icl_h
+QPEL_TABLE  8, 1, d, avx512icl_v
+
+pb_qpel_shuffle_index: db  0,  1,  2,  3
+                       db  1,  2,  3,  4
+                       db  2,  3,  4,  5
+                       db  3,  4,  5,  6
+                       db  4,  5,  6,  7
+                       db  5,  6,  7,  8
+                       db  6,  7,  8,  9
+                       db  7,  8,  9, 10
+                       db  4,  5,  6,  7
+                       db  5,  6,  7,  8
+                       db  6,  7,  8,  9
+                       db  7,  8,  9, 10
+                       db  8,  9, 10, 11
+                       db  9, 10, 11, 12
+                       db 10, 11, 12, 13
+                       db 11, 12, 13, 14
+
 SECTION .text
 
 %define MAX_PB_SIZE  64
@@ -1670,3 +1690,122 @@ HEVC_PUT_HEVC_QPEL_HV 16, 10
 
 %endif ;AVX2
 %endif ; ARCH_X86_64
+
+%macro QPEL_FILTER_H 5
+%define %%table hevc_qpel_filters_avx512icl_h_%1
+%assign %%offset 4
+    dec %2q
+    shl %2q, 3
+%ifdef PIC
+    lea %5q, [%%table]
+    %define FILTER %5q
+%else
+    %define FILTER %%table
+%endif
+    vpbroadcastd m%3, [FILTER + %2q + 0*%%offset]
+    vpbroadcastd m%4, [FILTER + %2q + 1*%%offset]
+%endmacro
+
+%macro QPEL_FILTER_V 5
+    vpbroadcastd m%3, [%5 + %2q + 4*%4]
+%endmacro
+
+%macro QPEL_LOAD_SHUF 2
+    movu m%1, [pb_qpel_shuffle_index +  0]
+    movu m%2, [pb_qpel_shuffle_index + 32]
+%endmacro
+
+; required: m0-m5
+; %1: dst register index
+; %2: name for src
+%macro QPEL_H_LOAD_COMPUTE 2
+    pxor     m%1, m%1
+    movu     xm4, [%2q - 3]
+    vpermb   m5, m2, m4
+    vpermb   m4, m3, m4
+    vpdpbusd m%1, m5, m0
+    vpdpbusd m%1, m4, m1
+%endmacro
+
+%macro HEVC_PUT_HEVC_QPEL_AVX512ICL 2
+cglobal hevc_put_hevc_qpel_h%1_%2, 5, 6, 8, dst, src, srcstride, height, mx, tmp
+    QPEL_FILTER_H %1, mx, 0, 1, tmp
+    QPEL_LOAD_SHUF 2, 3
+.loop:
+    QPEL_H_LOAD_COMPUTE 6, src
+    vpmovdw xm6, m6
+    movu [dstq], xm6
+    LOOP_END dst, src, srcstride
+    RET
+%endmacro
+
+%macro HEVC_PUT_HEVC_QPEL_HV_AVX512ICL 2
+cglobal hevc_put_hevc_qpel_hv%1_%2, 6, 7, 8, dst, src, srcstride, height, mx, my, tmp
+%assign %%shift 6
+%assign %%extra 7
+    QPEL_FILTER_H %1, mx, 0, 1, tmp
+    QPEL_LOAD_SHUF 2, 3
+    lea tmpq, [srcstrideq*3]
+    sub srcq, tmpq
+    sub myq, 1
+    shl myq, 5
+%ifdef PIC
+%define %%table hevc_qpel_filters_avx512icl_v_%1
+    lea tmpq, [%%table]
+    %define FILTER tmpq
+%else
+    %define FILTER %%table
+%endif
+%assign %%i 6
+%assign %%j 0
+%rep %1
+    QPEL_FILTER_V %1, my, %%i, %%j, FILTER
+    %assign %%i %%i+1
+    %assign %%j %%j+1
+%endrep
+%rep %%extra
+    QPEL_H_LOAD_COMPUTE %%i, src
+    add srcq, srcstrideq
+%assign %%i %%i+1
+%endrep
+.loop:
+    QPEL_H_LOAD_COMPUTE %%i, src
+    vpmulld m22, m14, m6
+    vpmulld m23, m15, m7
+    vpmulld m24, m16, m8
+    vpmulld m25, m17, m9
+    vpaddd  m26, m22, m23
+    vpaddd  m24, m25
+    vpaddd  m26, m24
+    vpmulld m22, m18, m10
+    vpmulld m23, m19, m11
+    vpmulld m24, m20, m12
+    vpmulld m25, m21, m13
+    vpaddd  m22, m22, m23
+    vpaddd  m24, m25
+    vpaddd  m26, m24
+    vpaddd  m22, m26
+    mova    m14, m15
+    mova    m15, m16
+    mova    m16, m17
+    mova    m17, m18
+    mova    m18, m19
+    mova    m19, m20
+    mova    m20, m21
+    vpsrad  m22, %%shift
+    vpmovdw xm22, m22
+    movu [dstq], xm22
+    LOOP_END dst, src, srcstride
+
+    RET
+%endmacro
+
+%if ARCH_X86_64
+%if HAVE_AVX512ICL_EXTERNAL
+
+INIT_YMM avx512icl
+HEVC_PUT_HEVC_QPEL_AVX512ICL 8, 8
+HEVC_PUT_HEVC_QPEL_HV_AVX512ICL 8, 8
+
+%endif
+%endif
diff --git a/libavcodec/x86/hevcdsp.h b/libavcodec/x86/hevcdsp.h
index 67be0a9059..5a495d2563 100644
--- a/libavcodec/x86/hevcdsp.h
+++ b/libavcodec/x86/hevcdsp.h
@@ -233,6 +233,9 @@ WEIGHTING_PROTOTYPES(8, sse4);
 WEIGHTING_PROTOTYPES(10, sse4);
 WEIGHTING_PROTOTYPES(12, sse4);
 
+void ff_hevc_put_hevc_qpel_h8_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_hevc_qpel_hv8_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width);
+
 ///////////////////////////////////////////////////////////////////////////////
 // TRANSFORM_ADD
 ///////////////////////////////////////////////////////////////////////////////
diff --git a/libavcodec/x86/hevcdsp_init.c b/libavcodec/x86/hevcdsp_init.c
index 8a3fa2744b..0341835944 100644
--- a/libavcodec/x86/hevcdsp_init.c
+++ b/libavcodec/x86/hevcdsp_init.c
@@ -878,6 +878,10 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
 
             c->add_residual[3] = ff_hevc_add_residual_32_8_avx2;
         }
+        if (EXTERNAL_AVX512ICL(cpu_flags)) {
+            c->put_hevc_qpel[3][0][1] = ff_hevc_put_hevc_qpel_h8_8_avx512icl;
+            c->put_hevc_qpel[3][1][1] = ff_hevc_put_hevc_qpel_hv8_8_avx512icl;
+        }
     } else if (bit_depth == 10) {
         if (EXTERNAL_MMXEXT(cpu_flags)) {
             c->add_residual[0] = ff_hevc_add_residual_4_10_mmxext;

From patchwork Wed Feb 23 08:57:32 2022
X-Patchwork-Submitter: "Wu, Jianhua"
X-Patchwork-Id: 34467
From: jianhua.wu-at-intel.com@ffmpeg.org
To: ffmpeg-devel@ffmpeg.org
Cc: Wu Jianhua
Date: Wed, 23 Feb 2022 16:57:32 +0800
Message-Id: <20220223085735.70854-3-jianhua.wu@intel.com>
In-Reply-To: <20220223085735.70854-1-jianhua.wu@intel.com>
References: <20220223085735.70854-1-jianhua.wu@intel.com>
Subject: [FFmpeg-devel] [PATCH 3/6] avcodec/x86/hevc_mc: add qpel_h16_8_avx512icl

From: Wu Jianhua

ff_hevc_put_hevc_qpel_h16_8_sse4                   3290870
ff_hevc_put_hevc_qpel_h16_8_avx512icl              1730033

Signed-off-by: Wu Jianhua
---
 libavcodec/x86/hevc_mc.asm    | 39 ++++++++++++++++++++++++++++++-----
 libavcodec/x86/hevcdsp.h      |  1 +
 libavcodec/x86/hevcdsp_init.c |  1 +
 3 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm
index 026b6b48ee..8c128f5202 100644
--- a/libavcodec/x86/hevc_mc.asm
+++ b/libavcodec/x86/hevc_mc.asm
@@ -89,6 +89,7 @@ QPEL_TABLE 10, 8, w, avx2
 
 QPEL_TABLE  8, 1, b, avx512icl_h
 QPEL_TABLE  8, 1, d, avx512icl_v
+QPEL_TABLE 16, 1, b, avx512icl_h
 
 pb_qpel_shuffle_index: db  0,  1,  2,  3
                        db  1,  2,  3,  4
@@ -98,6 +99,14 @@ pb_qpel_shuffle_index: db 0, 1, 2, 3
                        db  5,  6,  7,  8
                        db  6,  7,  8,  9
                        db  7,  8,  9, 10
+                       db  8,  9, 10, 11
+                       db  9, 10, 11, 12
+                       db 10, 11, 12, 13
+                       db 11, 12, 13, 14
+                       db 12, 13, 14, 15
+                       db 13, 14, 15, 16
+                       db 14, 15, 16, 17
+                       db 15, 16, 17, 18
                        db  4,  5,  6,  7
                        db  5,  6,  7,  8
                        db  6,  7,  8,  9
@@ -106,6 +115,14 @@ pb_qpel_shuffle_index: db 0, 1, 2, 3
                        db  9, 10, 11, 12
                        db 10, 11, 12, 13
                        db 11, 12, 13, 14
+                       db 12, 13, 14, 15
+                       db 13, 14, 15, 16
+                       db 14, 15, 16, 17
+                       db 15, 16, 17, 18
+                       db 16, 17, 18, 19
+                       db 17, 18, 19, 20
+                       db 18, 19, 20, 21
+                       db 19, 20, 21, 22
 
 SECTION .text
 
@@ -1712,7 +1729,7 @@ HEVC_PUT_HEVC_QPEL_HV 16, 10
 
 %macro QPEL_LOAD_SHUF 2
     movu m%1, [pb_qpel_shuffle_index +  0]
-    movu m%2, [pb_qpel_shuffle_index + 32]
+    movu m%2, [pb_qpel_shuffle_index + 64]
 %endmacro
 
 ; required: m0-m5
@@ -1720,7 +1737,11 @@ HEVC_PUT_HEVC_QPEL_HV 16, 10
 ; %2: name for src
 %macro QPEL_H_LOAD_COMPUTE 2
     pxor     m%1, m%1
-    movu     xm4, [%2q - 3]
+%if mmsize == 64
+    movu     ym4, [%2]
+%else
+    movu     xm4, [%2]
+%endif
     vpermb   m5, m2, m4
     vpermb   m4, m3, m4
     vpdpbusd m%1, m5, m0
@@ -1732,9 +1753,14 @@ cglobal hevc_put_hevc_qpel_h%1_%2, 5, 6, 8, dst, src, srcstride, height, mx, tmp
     QPEL_FILTER_H %1, mx, 0, 1, tmp
     QPEL_LOAD_SHUF 2, 3
 .loop:
-    QPEL_H_LOAD_COMPUTE 6, src
+    QPEL_H_LOAD_COMPUTE 6, srcq - 3
+%if %1 == 8
     vpmovdw xm6, m6
     movu [dstq], xm6
+%else
+    vpmovdw ym6, m6
+    movu [dstq], ym6
+%endif
     LOOP_END dst, src, srcstride
     RET
 %endmacro
@@ -1764,12 +1790,12 @@ cglobal hevc_put_hevc_qpel_hv%1_%2, 6, 7, 8, dst, src, srcstride, height, mx, my
     %assign %%j %%j+1
 %endrep
 %rep %%extra
-    QPEL_H_LOAD_COMPUTE %%i, src
+    QPEL_H_LOAD_COMPUTE %%i, srcq - 3
     add srcq, srcstrideq
 %assign %%i %%i+1
 %endrep
 .loop:
-    QPEL_H_LOAD_COMPUTE %%i, src
+    QPEL_H_LOAD_COMPUTE %%i, srcq - 3
     vpmulld m22, m14, m6
     vpmulld m23, m15, m7
     vpmulld m24, m16, m8
@@ -1807,5 +1833,8 @@ INIT_YMM avx512icl
 HEVC_PUT_HEVC_QPEL_AVX512ICL 8, 8
 HEVC_PUT_HEVC_QPEL_HV_AVX512ICL 8, 8
 
+INIT_ZMM avx512icl
+HEVC_PUT_HEVC_QPEL_AVX512ICL 16, 8
+
 %endif
 %endif
diff --git a/libavcodec/x86/hevcdsp.h b/libavcodec/x86/hevcdsp.h
index 5a495d2563..6e3fc01ad0 100644
--- a/libavcodec/x86/hevcdsp.h
+++ b/libavcodec/x86/hevcdsp.h
@@ -234,6 +234,7 @@ WEIGHTING_PROTOTYPES(10, sse4);
 WEIGHTING_PROTOTYPES(12, sse4);
 
 void ff_hevc_put_hevc_qpel_h8_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_hevc_qpel_h16_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width);
 void ff_hevc_put_hevc_qpel_hv8_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width);
 
 ///////////////////////////////////////////////////////////////////////////////
diff --git a/libavcodec/x86/hevcdsp_init.c b/libavcodec/x86/hevcdsp_init.c
index 0341835944..4023faa654 100644
--- a/libavcodec/x86/hevcdsp_init.c
+++ b/libavcodec/x86/hevcdsp_init.c
@@ -880,6 +880,7 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
         }
         if (EXTERNAL_AVX512ICL(cpu_flags)) {
             c->put_hevc_qpel[3][0][1] = ff_hevc_put_hevc_qpel_h8_8_avx512icl;
+            c->put_hevc_qpel[5][0][1] = ff_hevc_put_hevc_qpel_h16_8_avx512icl;
             c->put_hevc_qpel[3][1][1] = ff_hevc_put_hevc_qpel_hv8_8_avx512icl;
         }
     } else if (bit_depth == 10) {

From patchwork Wed Feb 23 08:57:33 2022
X-Patchwork-Submitter: "Wu, Jianhua"
X-Patchwork-Id: 34468
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
FMSMGA003.fm.intel.com with ESMTP; 23 Feb 2022 00:57:47 -0800 From: jianhua.wu-at-intel.com@ffmpeg.org To: ffmpeg-devel@ffmpeg.org Date: Wed, 23 Feb 2022 16:57:33 +0800 Message-Id: <20220223085735.70854-4-jianhua.wu@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220223085735.70854-1-jianhua.wu@intel.com> References: <20220223085735.70854-1-jianhua.wu@intel.com> Subject: [FFmpeg-devel] [PATCH 4/6] avcodec/x86/hevc_mc: add qpel_h4_8_avx512icl X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Wu Jianhua MIME-Version: 1.0 Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: WSj/WaVaP1HC From: Wu Jianhua ff_hevc_put_hevc_qpel_h4_8_sse4 993694 ff_hevc_put_hevc_qpel_h4_8_avx512icl 686647 Signed-off-by: Wu Jianhua --- libavcodec/x86/hevc_mc.asm | 12 ++++++++++-- libavcodec/x86/hevcdsp.h | 1 + libavcodec/x86/hevcdsp_init.c | 1 + 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm index 8c128f5202..25880b8858 100644 --- a/libavcodec/x86/hevc_mc.asm +++ b/libavcodec/x86/hevc_mc.asm @@ -87,6 +87,7 @@ QPEL_TABLE 12, 4, w, sse4 QPEL_TABLE 8,16, b, avx2 QPEL_TABLE 10, 8, w, avx2 +QPEL_TABLE 4, 1, b, avx512icl_h QPEL_TABLE 8, 1, b, avx512icl_h QPEL_TABLE 8, 1, d, avx512icl_v QPEL_TABLE 16, 1, b, avx512icl_h @@ -1734,7 +1735,7 @@ HEVC_PUT_HEVC_QPEL_HV 16, 10 ; required: m0-m5 ; %1: dst register index -; %2: name for src +; %2: src %macro QPEL_H_LOAD_COMPUTE 2 pxor m%1, m%1 %if mmsize == 64 @@ -1754,9 +1755,13 @@ cglobal hevc_put_hevc_qpel_h%1_%2, 5, 6, 8, dst, src, srcstride, height, mx, tmp QPEL_LOAD_SHUF 2, 3 .loop: QPEL_H_LOAD_COMPUTE 6, srcq - 3 -%if %1 == 8 +%if %1 < 16 vpmovdw xm6, m6 +%if %1 == 4 + movq [dstq], xm6 +%else movu [dstq], xm6 +%endif %else vpmovdw ym6, m6 
movu [dstq], ym6 @@ -1829,6 +1834,9 @@ cglobal hevc_put_hevc_qpel_hv%1_%2, 6, 7, 8, dst, src, srcstride, height, mx, my %if ARCH_X86_64 %if HAVE_AVX512ICL_EXTERNAL +INIT_XMM avx512icl +HEVC_PUT_HEVC_QPEL_AVX512ICL 4, 8 + INIT_YMM avx512icl HEVC_PUT_HEVC_QPEL_AVX512ICL 8, 8 HEVC_PUT_HEVC_QPEL_HV_AVX512ICL 8, 8 diff --git a/libavcodec/x86/hevcdsp.h b/libavcodec/x86/hevcdsp.h index 6e3fc01ad0..51ffdc9628 100644 --- a/libavcodec/x86/hevcdsp.h +++ b/libavcodec/x86/hevcdsp.h @@ -233,6 +233,7 @@ WEIGHTING_PROTOTYPES(8, sse4); WEIGHTING_PROTOTYPES(10, sse4); WEIGHTING_PROTOTYPES(12, sse4); +void ff_hevc_put_hevc_qpel_h4_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width); void ff_hevc_put_hevc_qpel_h8_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width); void ff_hevc_put_hevc_qpel_h16_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width); void ff_hevc_put_hevc_qpel_hv8_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width); diff --git a/libavcodec/x86/hevcdsp_init.c b/libavcodec/x86/hevcdsp_init.c index 4023faa654..be1484d06e 100644 --- a/libavcodec/x86/hevcdsp_init.c +++ b/libavcodec/x86/hevcdsp_init.c @@ -879,6 +879,7 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth) c->add_residual[3] = ff_hevc_add_residual_32_8_avx2; } if (EXTERNAL_AVX512ICL(cpu_flags)) { + c->put_hevc_qpel[1][0][1] = ff_hevc_put_hevc_qpel_h4_8_avx512icl; c->put_hevc_qpel[3][0][1] = ff_hevc_put_hevc_qpel_h8_8_avx512icl; c->put_hevc_qpel[5][0][1] = ff_hevc_put_hevc_qpel_h16_8_avx512icl; c->put_hevc_qpel[3][1][1] = ff_hevc_put_hevc_qpel_hv8_8_avx512icl; From patchwork Wed Feb 23 08:57:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wu, Jianhua" X-Patchwork-Id: 34469 Delivered-To: 
by FMSMGA003.fm.intel.com with ESMTP; 23 Feb 2022 00:57:48 -0800 From: jianhua.wu-at-intel.com@ffmpeg.org To: ffmpeg-devel@ffmpeg.org Date: Wed, 23 Feb 2022 16:57:34 +0800 Message-Id: <20220223085735.70854-5-jianhua.wu@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220223085735.70854-1-jianhua.wu@intel.com> References: <20220223085735.70854-1-jianhua.wu@intel.com> Subject: [FFmpeg-devel] [PATCH 5/6] avcodec/x86/hevc_mc: add qpel_h32_8_avx512icl X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Wu Jianhua MIME-Version: 1.0 Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: qkjEm4lLSRx5 From: Wu Jianhua ff_hevc_put_hevc_qpel_h32_8_sse4 14122151 ff_hevc_put_hevc_qpel_h32_8_avx2 9337675 ff_hevc_put_hevc_qpel_h32_8_avx512icl 6424654 Signed-off-by: Wu Jianhua --- libavcodec/x86/hevc_mc.asm | 7 +++++++ libavcodec/x86/hevcdsp.h | 1 + libavcodec/x86/hevcdsp_init.c | 1 + 3 files changed, 9 insertions(+) diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm index 25880b8858..4cf5dcd338 100644 --- a/libavcodec/x86/hevc_mc.asm +++ b/libavcodec/x86/hevc_mc.asm @@ -91,6 +91,7 @@ QPEL_TABLE 4, 1, b, avx512icl_h QPEL_TABLE 8, 1, b, avx512icl_h QPEL_TABLE 8, 1, d, avx512icl_v QPEL_TABLE 16, 1, b, avx512icl_h +QPEL_TABLE 32, 1, b, avx512icl_h pb_qpel_shuffle_index: db 0, 1, 2, 3 db 1, 2, 3, 4 @@ -1765,6 +1766,11 @@ cglobal hevc_put_hevc_qpel_h%1_%2, 5, 6, 8, dst, src, srcstride, height, mx, tmp %else vpmovdw ym6, m6 movu [dstq], ym6 +%endif +%if %1 == 32 + QPEL_H_LOAD_COMPUTE 7, srcq + 16 - 3 + vpmovdw ym7, m7 + movu [dstq + 32], ym7 %endif LOOP_END dst, src, srcstride RET @@ -1843,6 +1849,7 @@ HEVC_PUT_HEVC_QPEL_HV_AVX512ICL 8, 8 INIT_ZMM avx512icl HEVC_PUT_HEVC_QPEL_AVX512ICL 16, 8 +HEVC_PUT_HEVC_QPEL_AVX512ICL 32, 8 %endif 
%endif diff --git a/libavcodec/x86/hevcdsp.h b/libavcodec/x86/hevcdsp.h index 51ffdc9628..8d3c3cc75f 100644 --- a/libavcodec/x86/hevcdsp.h +++ b/libavcodec/x86/hevcdsp.h @@ -236,6 +236,7 @@ WEIGHTING_PROTOTYPES(12, sse4); void ff_hevc_put_hevc_qpel_h4_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width); void ff_hevc_put_hevc_qpel_h8_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width); void ff_hevc_put_hevc_qpel_h16_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width); +void ff_hevc_put_hevc_qpel_h32_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width); void ff_hevc_put_hevc_qpel_hv8_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width); /////////////////////////////////////////////////////////////////////////////// diff --git a/libavcodec/x86/hevcdsp_init.c b/libavcodec/x86/hevcdsp_init.c index be1484d06e..e9002c8b15 100644 --- a/libavcodec/x86/hevcdsp_init.c +++ b/libavcodec/x86/hevcdsp_init.c @@ -882,6 +882,7 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth) c->put_hevc_qpel[1][0][1] = ff_hevc_put_hevc_qpel_h4_8_avx512icl; c->put_hevc_qpel[3][0][1] = ff_hevc_put_hevc_qpel_h8_8_avx512icl; c->put_hevc_qpel[5][0][1] = ff_hevc_put_hevc_qpel_h16_8_avx512icl; + c->put_hevc_qpel[7][0][1] = ff_hevc_put_hevc_qpel_h32_8_avx512icl; c->put_hevc_qpel[3][1][1] = ff_hevc_put_hevc_qpel_hv8_8_avx512icl; } } else if (bit_depth == 10) { From patchwork Wed Feb 23 08:57:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wu, Jianhua" X-Patchwork-Id: 34470 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6838:d078:0:0:0:0 with SMTP id x24csp633276nkx; Wed, 23 Feb 2022 00:58:52 -0800 (PST) 
by FMSMGA003.fm.intel.com with ESMTP; 23 Feb 2022 00:57:50 -0800 From: jianhua.wu-at-intel.com@ffmpeg.org To: ffmpeg-devel@ffmpeg.org Date: Wed, 23 Feb 2022 16:57:35 +0800 Message-Id: <20220223085735.70854-6-jianhua.wu@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220223085735.70854-1-jianhua.wu@intel.com> References: <20220223085735.70854-1-jianhua.wu@intel.com> Subject: [FFmpeg-devel] [PATCH 6/6] avcodec/x86/hevc_mc: add qpel_h64_8_avx512icl X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Wu Jianhua MIME-Version: 1.0 Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: luB2lf3J0I8x From: Wu Jianhua ff_hevc_put_hevc_qpel_h64_8_sse4 56782981 ff_hevc_put_hevc_qpel_h64_8_avx2 40097816 ff_hevc_put_hevc_qpel_h64_8_avx512icl 25488576 Signed-off-by: Wu Jianhua --- libavcodec/x86/hevc_mc.asm | 12 +++++++++++- libavcodec/x86/hevcdsp.h | 1 + libavcodec/x86/hevcdsp_init.c | 1 + 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm index 4cf5dcd338..37264962af 100644 --- a/libavcodec/x86/hevc_mc.asm +++ b/libavcodec/x86/hevc_mc.asm @@ -92,6 +92,7 @@ QPEL_TABLE 8, 1, b, avx512icl_h QPEL_TABLE 8, 1, d, avx512icl_v QPEL_TABLE 16, 1, b, avx512icl_h QPEL_TABLE 32, 1, b, avx512icl_h +QPEL_TABLE 64, 1, b, avx512icl_h pb_qpel_shuffle_index: db 0, 1, 2, 3 db 1, 2, 3, 4 @@ -1767,10 +1768,18 @@ cglobal hevc_put_hevc_qpel_h%1_%2, 5, 6, 8, dst, src, srcstride, height, mx, tmp vpmovdw ym6, m6 movu [dstq], ym6 %endif -%if %1 == 32 +%if %1 > 16 QPEL_H_LOAD_COMPUTE 7, srcq + 16 - 3 vpmovdw ym7, m7 movu [dstq + 32], ym7 +%endif +%if %1 > 32 + QPEL_H_LOAD_COMPUTE 6, srcq + 32 - 3 + QPEL_H_LOAD_COMPUTE 7, srcq + 48 - 3 + vpmovdw ym6, m6 + vpmovdw ym7, m7 + movu [dstq + 64], ym6 + movu 
[dstq + 96], ym7 %endif LOOP_END dst, src, srcstride RET @@ -1850,6 +1859,7 @@ HEVC_PUT_HEVC_QPEL_HV_AVX512ICL 8, 8 INIT_ZMM avx512icl HEVC_PUT_HEVC_QPEL_AVX512ICL 16, 8 HEVC_PUT_HEVC_QPEL_AVX512ICL 32, 8 +HEVC_PUT_HEVC_QPEL_AVX512ICL 64, 8 %endif %endif diff --git a/libavcodec/x86/hevcdsp.h b/libavcodec/x86/hevcdsp.h index 8d3c3cc75f..24e35bc032 100644 --- a/libavcodec/x86/hevcdsp.h +++ b/libavcodec/x86/hevcdsp.h @@ -237,6 +237,7 @@ void ff_hevc_put_hevc_qpel_h4_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t void ff_hevc_put_hevc_qpel_h8_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width); void ff_hevc_put_hevc_qpel_h16_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width); void ff_hevc_put_hevc_qpel_h32_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width); +void ff_hevc_put_hevc_qpel_h64_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width); void ff_hevc_put_hevc_qpel_hv8_8_avx512icl(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t my, int width); /////////////////////////////////////////////////////////////////////////////// diff --git a/libavcodec/x86/hevcdsp_init.c b/libavcodec/x86/hevcdsp_init.c index e9002c8b15..64fa5bc1f8 100644 --- a/libavcodec/x86/hevcdsp_init.c +++ b/libavcodec/x86/hevcdsp_init.c @@ -883,6 +883,7 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth) c->put_hevc_qpel[3][0][1] = ff_hevc_put_hevc_qpel_h8_8_avx512icl; c->put_hevc_qpel[5][0][1] = ff_hevc_put_hevc_qpel_h16_8_avx512icl; c->put_hevc_qpel[7][0][1] = ff_hevc_put_hevc_qpel_h32_8_avx512icl; + c->put_hevc_qpel[9][0][1] = ff_hevc_put_hevc_qpel_h64_8_avx512icl; c->put_hevc_qpel[3][1][1] = ff_hevc_put_hevc_qpel_hv8_8_avx512icl; } } else if (bit_depth == 10) {
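Reviewer note, not part of the patches: a scalar C sketch of what the qpel_h kernels in this series appear to compute for 8-bit input, which may help when checking the shuffle table and the "srcq - 3" bias. Everything named here (qpel_h_ref, dot4_u8_s8, qpel_taps) is hypothetical and does not exist in FFmpeg. The tap values are the HEVC luma quarter-pel coefficients, and MAX_PB_SIZE = 64 is assumed to be the dst row pitch in int16_t elements. The split into two four-byte dot products mirrors my reading of the two vpermb + vpdpbusd pairs in QPEL_H_LOAD_COMPUTE; treat it as an assumption rather than a statement of the author's intent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed dst row pitch in int16_t elements (not stated in the patches). */
#define MAX_PB_SIZE 64

/* HEVC luma quarter-pel taps for mx = 1, 2, 3. */
static const int8_t qpel_taps[3][8] = {
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

/* Models one 32-bit lane of vpdpbusd: four unsigned bytes times four
 * signed bytes, summed into a 32-bit accumulator. */
static int32_t dot4_u8_s8(const uint8_t *u, const int8_t *s)
{
    return u[0] * s[0] + u[1] * s[1] + u[2] * s[2] + u[3] * s[3];
}

/* Hypothetical scalar reference for the horizontal 8-tap filter, 8-bit input. */
static void qpel_h_ref(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride,
                       int height, int mx, int width)
{
    assert(mx >= 1 && mx <= 3);
    const int8_t *f = qpel_taps[mx - 1];

    for (int y = 0; y < height; y++) {
        const uint8_t *row = src - 3;   /* same bias as "srcq - 3" in the asm */
        for (int x = 0; x < width; x++) {
            /* First shuffle half (table offset 0) selects bytes x..x+3. */
            int32_t acc = dot4_u8_s8(row + x, f);
            /* Second shuffle half (table offset 64) selects bytes x+4..x+7. */
            acc += dot4_u8_s8(row + x + 4, f + 4);
            /* vpmovdw keeps the low 16 bits; 8-bit input cannot overflow int16_t. */
            dst[x] = (int16_t)acc;
        }
        src += srcstride;
        dst += MAX_PB_SIZE;
    }
}
```

Each set of taps sums to 64, so a flat 8-bit input of value v should come out as 64 * v in every output sample, which is a quick sanity check when comparing against the asm output.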