From patchwork Fri Feb 10 13:06:57 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: James Darnley
X-Patchwork-Id: 40347
From: James Darnley
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 10 Feb 2023 14:06:57 +0100
Message-Id: <20230210130657.455866-3-jdarnley@obe.tv>
X-Mailer: git-send-email 2.39.1
In-Reply-To: <20230210130657.455866-1-jdarnley@obe.tv>
References: <20230210130657.455866-1-jdarnley@obe.tv>
Subject: [FFmpeg-devel] [PATCH 3/3] avfilter/yadif: add avx2 filter_line function
List-Id: FFmpeg development discussions and patches

Zen 2 (Ryzen 7 3700X): 1.73x faster than ssse3
(3603±586.3 vs. 2082±317.1 decicycles).

With an SD y4m file, speed increases from ~3600 fps to ~4700 fps.
---
 libavfilter/x86/vf_yadif.asm    | 83 +++++++++++++++++++++++----------
 libavfilter/x86/vf_yadif_init.c |  4 ++
 2 files changed, 62 insertions(+), 25 deletions(-)

diff --git a/libavfilter/x86/vf_yadif.asm b/libavfilter/x86/vf_yadif.asm
index 809cebdd3f..571febfca3 100644
--- a/libavfilter/x86/vf_yadif.asm
+++ b/libavfilter/x86/vf_yadif.asm
@@ -25,11 +25,30 @@
 
 SECTION_RODATA
 
-pb_1: times 16 db 1
-pw_1: times 8 dw 1
+pb_1: times 32 db 1
+pw_1: times 16 dw 1
 
 SECTION .text
 
+%unmacro RSHIFT 2
+
+%macro RSHIFT 2
+%if mmsize == 32
+    vextracti128 xm7, %1, 1
+    palignr xmm %+ %1, xm7, xmm %+ %1, 2
+%else
+    psrldq %1, %2
+%endif
+%endmacro
+
+%macro UNPACK 1
+%if mmsize == 32
+    pmovzxbw %1, xmm %+ %1
+%else
+    punpcklbw %1, m7
+%endif
+%endmacro
+
 %macro CHECK 2
     movu m2, [curq+t1+%1]
     movu m3, [curq+t0+%2]
@@ -40,7 +59,7 @@ SECTION .text
     pand m4, [pb_1]
     psubusb m5, m4
     RSHIFT m5, 1
-    punpcklbw m5, m7
+    UNPACK m5
     mova m4, m2
     psubusb m2, m3
     psubusb m3, m4
@@ -49,9 +68,9 @@ SECTION .text
     mova m4, m2
     RSHIFT m3, 1
     RSHIFT m4, 2
-    punpcklbw m2, m7
-    punpcklbw m3, m7
-    punpcklbw m4, m7
+    UNPACK m2
+    UNPACK m3
+    UNPACK m4
     paddw m2, m3
     paddw m2, m4
 %endmacro
@@ -81,13 +100,19 @@ SECTION .text
 %endmacro
 
 %macro LOAD 2
-    movh %1, %2
-    punpcklbw %1, m7
+    %if mmsize == 32
+        pmovzxbw %1, %2
+    %else
+        movh %1, %2
+        punpcklbw %1, m7
+    %endif
 %endmacro
 
 %macro FILTER 3
 .loop%1:
-    pxor m7, m7
+    %if mmsize != 32
+        pxor m7, m7
+    %endif
     LOAD m0, [curq+t1]
     LOAD m1, [curq+t0]
     LOAD m2, [%2]
@@ -95,9 +120,9 @@ SECTION .text
     mova m4, m3
     paddw m3, m2
     psraw m3, 1
-    mova [rsp+ 0], m0
-    mova [rsp+16], m3
-    mova [rsp+32], m1
+    mova [rsp+0*mmsize], m0
+    mova [rsp+1*mmsize], m3
+    mova [rsp+2*mmsize], m1
     psubw m2, m4
     ABS1 m2, m4
     LOAD m3, [prevq+t1]
@@ -119,7 +144,7 @@ SECTION .text
     paddw m3, m4
     psrlw m3, 1
     pmaxsw m2, m3
-    mova [rsp+48], m2
+    mova [rsp+3*mmsize], m2
 
     paddw m1, m0
     paddw m0, m0
@@ -134,9 +159,9 @@ SECTION .text
     psubusb m3, m4
     pmaxub m2, m3
     mova m3, m2
-    psrldq m3, 2
-    punpcklbw m2, m7
-    punpcklbw m3, m7
+    RSHIFT m3, 2
+    UNPACK m2
+    UNPACK m3
     paddw m0, m2
     paddw m0, m3
     psubw m0, [pw_1]
@@ -150,7 +175,7 @@ SECTION .text
     CHECK 1, -3
     CHECK2
 
-    mova m6, [rsp+48]
+    mova m6, [rsp+3*mmsize]
     cmp DWORD r8m, 2
     jge .end%1
     LOAD m2, [%2+t1*2]
@@ -161,9 +186,9 @@ SECTION .text
     paddw m3, m5
     psrlw m2, 1
     psrlw m3, 1
-    mova m4, [rsp+ 0]
-    mova m5, [rsp+16]
-    mova m7, [rsp+32]
+    mova m4, [rsp+0*mmsize]
+    mova m5, [rsp+1*mmsize]
+    mova m7, [rsp+2*mmsize]
     psubw m2, m4
     psubw m3, m7
     mova m0, m5
@@ -182,15 +207,21 @@ SECTION .text
     pmaxsw m6, m4
 
 .end%1:
-    mova m2, [rsp+16]
+    mova m2, [rsp+1*mmsize]
     mova m3, m2
     psubw m2, m6
     paddw m3, m6
     pmaxsw m1, m2
     pminsw m1, m3
-    packuswb m1, m1
 
-    movh [dstq], m1
+    %if mmsize == 32
+        vextracti128 xm4, ym1, 1
+        packuswb xm1, xm4
+        movu [dstq], xm1
+    %else
+        packuswb m1, m1
+        movh [dstq], m1
+    %endif
     add dstq, mmsize/2
     add prevq, mmsize/2
     add curq, mmsize/2
@@ -201,10 +232,10 @@ SECTION .text
 
 %macro YADIF 0
 %if ARCH_X86_32
-cglobal yadif_filter_line, 4, 6, 8, 80, dst, prev, cur, next, w, prefs, \
+cglobal yadif_filter_line, 4, 6, 8, 4*mmsize, dst, prev, cur, next, w, prefs, \
                                         mrefs, parity, mode
 %else
-cglobal yadif_filter_line, 4, 7, 8, 80, dst, prev, cur, next, w, prefs, \
+cglobal yadif_filter_line, 4, 7, 8, 4*mmsize, dst, prev, cur, next, w, prefs, \
                                         mrefs, parity, mode
 %endif
 %if ARCH_X86_32
@@ -233,3 +264,5 @@ INIT_XMM ssse3
 YADIF
 INIT_XMM sse2
 YADIF
+INIT_YMM avx2
+YADIF
diff --git a/libavfilter/x86/vf_yadif_init.c b/libavfilter/x86/vf_yadif_init.c
index d648f0f835..48858dc295 100644
--- a/libavfilter/x86/vf_yadif_init.c
+++ b/libavfilter/x86/vf_yadif_init.c
@@ -29,6 +29,8 @@ void ff_yadif_filter_line_sse2(void *dst, void *prev, void *cur,
 void ff_yadif_filter_line_ssse3(void *dst, void *prev, void *cur,
                                 void *next, int w, int prefs,
                                 int mrefs, int parity, int mode);
+void ff_yadif_filter_line_avx2(void *dst, void *prev, void *cur, void *next,
+                               int w, int prefs, int mrefs, int parity, int mode);
 
 void ff_yadif_filter_line_16bit_sse2(void *dst, void *prev, void *cur,
                                      void *next, int w, int prefs,
@@ -68,5 +70,7 @@ av_cold void ff_yadif_init_x86(YADIFContext *yadif, int bit_depth)
             yadif->filter_line = ff_yadif_filter_line_sse2;
         if (EXTERNAL_SSSE3(cpu_flags))
             yadif->filter_line = ff_yadif_filter_line_ssse3;
+        if (EXTERNAL_AVX2(cpu_flags))
+            yadif->filter_line = ff_yadif_filter_line_avx2;
     }
 }