From patchwork Thu May 23 12:27:12 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 49175
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 23 May 2024 09:27:12 -0300
Message-ID: <20240523122716.2158-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.45.1
Subject: [FFmpeg-devel] [PATCH 1/5] avcodec/vvc_mc: split the SAD dsp prototype into one function per blocksize width
List-Id: FFmpeg development discussions and patches

Signed-off-by: James Almer
---
 libavcodec/vvc/dsp.h             |  2 +-
 libavcodec/vvc/inter.c           |  6 ++++--
 libavcodec/vvc/inter_template.c  |  6 +++++-
 libavcodec/x86/vvc/vvc_sad.asm   | 32 ++++++++++++++++++++++++++------
 libavcodec/x86/vvc/vvcdsp_init.c | 22 +++++++++++++++++-----
 tests/checkasm/vvc_mc.c          |  3 ++-
 6 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/libavcodec/vvc/dsp.h b/libavcodec/vvc/dsp.h
index 1f14096c41..55c4c81f53 100644
--- a/libavcodec/vvc/dsp.h
+++ b/libavcodec/vvc/dsp.h
@@ -99,7 +99,7 @@ typedef struct VVCInterDSPContext {
 
     void (*apply_bdof)(uint8_t *dst, ptrdiff_t dst_stride, int16_t *src0, int16_t *src1, int block_w, int block_h);
 
-    int (*sad)(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
+    int (*sad[5])(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
     void (*dmvr[2][2])(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, int height,
         intptr_t mx, intptr_t my, int width);
 } VVCInterDSPContext;
diff --git a/libavcodec/vvc/inter.c b/libavcodec/vvc/inter.c
index e1011b4fa1..0214e46634 100644
--- a/libavcodec/vvc/inter.c
+++ b/libavcodec/vvc/inter.c
@@ -740,6 +740,8 @@ static void dmvr_mv_refine(VVCLocalContext *lc, MvField *mvf, MvField *orig_mv,
     const AVFrame *ref0, const AVFrame *ref1, const int x_off, const int y_off, const int block_w, const int block_h)
 {
     const VVCFrameContext *fc = lc->fc;
+    static const uint8_t sad_tab[16] = { 0, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4 };
+    const int tab = sad_tab[(FFALIGN(block_w, 8) >> 3) - 1];
     const int sr_range = 2;
     const AVFrame *ref[] = { ref0, ref1 };
     int16_t *tmp[] = { lc->tmp, lc->tmp1 };
@@ -763,7 +765,7 @@ static void dmvr_mv_refine(VVCLocalContext *lc, MvField *mvf, MvField *orig_mv,
         fc->vvcdsp.inter.dmvr[!!my][!!mx](tmp[i], src, src_stride, pred_h, mx, my, pred_w);
     }
 
-    min_sad = fc->vvcdsp.inter.sad(tmp[L0], tmp[L1], dx, dy, block_w, block_h);
+    min_sad = fc->vvcdsp.inter.sad[tab](tmp[L0], tmp[L1], dx, dy, block_w, block_h);
     min_sad -= min_sad >> 2;
     sad[dy][dx] = min_sad;
 
@@ -773,7 +775,7 @@ static void dmvr_mv_refine(VVCLocalContext *lc, MvField *mvf, MvField *orig_mv,
     for (dy = 0; dy < SAD_ARRAY_SIZE; dy++) {
         for (dx = 0; dx < SAD_ARRAY_SIZE; dx++) {
             if (dx != sr_range || dy != sr_range) {
-                sad[dy][dx] = fc->vvcdsp.inter.sad(lc->tmp, lc->tmp1, dx, dy, block_w, block_h);
+                sad[dy][dx] = fc->vvcdsp.inter.sad[tab](lc->tmp, lc->tmp1, dx, dy, block_w, block_h);
                 if (sad[dy][dx] < min_sad) {
                     min_sad = sad[dy][dx];
                     min_dx = dx;
diff --git a/libavcodec/vvc/inter_template.c b/libavcodec/vvc/inter_template.c
index a8068f4ba8..34485321d3 100644
--- a/libavcodec/vvc/inter_template.c
+++ b/libavcodec/vvc/inter_template.c
@@ -626,7 +626,11 @@ static void FUNC(ff_vvc_inter_dsp_init)(VVCInterDSPContext *const inter)
     inter->apply_prof_uni_w = FUNC(apply_prof_uni_w);
     inter->apply_bdof = FUNC(apply_bdof);
     inter->prof_grad_filter = FUNC(prof_grad_filter);
-    inter->sad = vvc_sad;
+    inter->sad[0] =
+    inter->sad[1] =
+    inter->sad[2] =
+    inter->sad[3] =
+    inter->sad[4] = vvc_sad;
 }
 
 #undef FUNCS
diff --git a/libavcodec/x86/vvc/vvc_sad.asm b/libavcodec/x86/vvc/vvc_sad.asm
index b468d89ac2..a20818530f 100644
--- a/libavcodec/x86/vvc/vvc_sad.asm
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -51,7 +51,7 @@ SECTION .text
 
 INIT_YMM avx2
 
-cglobal vvc_sad, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_idx
+cglobal vvc_sad_8, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_idx
     movsxdifnidn dxq, dxd
     movsxdifnidn dyq, dyd
 
@@ -76,10 +76,6 @@ cglobal vvc_sad, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_
     pxor m3, m3
     vpbroadcastd m4, [pw_1]
 
-    cmp block_wd, 16
-    jge vvc_sad_16_128
-
-    vvc_sad_8:
     .loop_height:
         movu xm0, [src1q]
         vinserti128 m0, m0, [src1q + MAX_PB_SIZE * ROWS * 2], 1
@@ -100,7 +96,31 @@ cglobal vvc_sad, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_
     movd eax, xm0
     RET
 
-    vvc_sad_16_128:
+cglobal vvc_sad_16, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_idx
+    movsxdifnidn dxq, dxd
+    movsxdifnidn dyq, dyd
+
+    sub dxq, 2
+    sub dyq, 2
+
+    mov off1q, 2
+    mov off2q, 2
+
+    add off1q, dyq
+    sub off2q, dyq
+
+    shl off1q, 7
+    shl off2q, 7
+
+    add off1q, dxq
+    sub off2q, dxq
+
+    lea src1q, [src1q + off1q * 2 + 2 * 2]
+    lea src2q, [src2q + off2q * 2 + 2 * 2]
+
+    pxor m3, m3
+    vpbroadcastd m4, [pw_1]
+
     sar block_wd, 4
     .loop_height:
         mov off1q, src1q
diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c
index 4b4a2aa937..bd60963432 100644
--- a/libavcodec/x86/vvc/vvcdsp_init.c
+++ b/libavcodec/x86/vvc/vvcdsp_init.c
@@ -312,8 +312,20 @@ ALF_FUNCS(16, 12, avx2)
     c->alf.classify = ff_vvc_alf_classify_##bd##_avx2; \
 } while (0)
 
-int ff_vvc_sad_avx2(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
-#define SAD_INIT() c->inter.sad = ff_vvc_sad_avx2
+#define SAD_PROTOTYPE(w, opt) \
+int bf(ff_vvc_sad, w, opt)(const int16_t *src0, const int16_t *src1, \
+                           int dx, int dy, int block_w, int block_h) \
+
+SAD_PROTOTYPE(8, avx2);
+SAD_PROTOTYPE(16, avx2);
+
+#define SAD_INIT(opt) do { \
+    c->inter.sad[0] = ff_vvc_sad_8_##opt; \
+    c->inter.sad[1] = \
+    c->inter.sad[2] = \
+    c->inter.sad[3] = \
+    c->inter.sad[4] = ff_vvc_sad_16_##opt; \
+} while (0)
 #endif
 
 void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
@@ -330,7 +342,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             ALF_INIT(8);
             AVG_INIT(8, avx2);
             MC_LINKS_AVX2(8);
-            SAD_INIT();
+            SAD_INIT(avx2);
         }
         break;
     case 10:
@@ -342,7 +354,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(10, avx2);
             MC_LINKS_AVX2(10);
             MC_LINKS_16BPC_AVX2(10);
-            SAD_INIT();
+            SAD_INIT(avx2);
         }
         break;
     case 12:
@@ -354,7 +366,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(12, avx2);
             MC_LINKS_AVX2(12);
             MC_LINKS_16BPC_AVX2(12);
-            SAD_INIT();
+            SAD_INIT(avx2);
         }
         break;
     default:
diff --git a/tests/checkasm/vvc_mc.c b/tests/checkasm/vvc_mc.c
index 1e889e2cff..deae1014d2 100644
--- a/tests/checkasm/vvc_mc.c
+++ b/tests/checkasm/vvc_mc.c
@@ -327,6 +327,7 @@ static void check_avg(void)
 static void check_vvc_sad(void)
 {
     const int bit_depth = 10;
+    static const uint8_t sad_tab[16] = { 0, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4 };
     VVCDSPContext c;
     LOCAL_ALIGNED_32(uint16_t, src0, [MAX_CTU_SIZE * MAX_CTU_SIZE * 4]);
     LOCAL_ALIGNED_32(uint16_t, src1, [MAX_CTU_SIZE * MAX_CTU_SIZE * 4]);
@@ -341,7 +342,7 @@ static void check_vvc_sad(void)
         for (int w = 8; w <= MAX_CTU_SIZE; w *= 2) {
             for(int offy = 0; offy <= 4; offy++) {
                 for(int offx = 0; offx <= 4; offx++) {
-                    if(check_func(c.inter.sad, "sad_%dx%d", w, h)) {
+                    if(check_func(c.inter.sad[sad_tab[(w >> 3) - 1]], "sad_%dx%d", w, h)) {
                         int result0;
                         int result1;
 

From patchwork Thu May 23 12:27:13 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 49176
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 23 May 2024 09:27:13 -0300
Message-ID: <20240523122716.2158-2-jamrial@gmail.com>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240523122716.2158-1-jamrial@gmail.com>
References: <20240523122716.2158-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/5] x86/vvc_sad: optimize vvc_sad_16
List-Id: FFmpeg development discussions and patches

Signed-off-by: James Almer
---
 libavcodec/x86/vvc/vvc_sad.asm | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/libavcodec/x86/vvc/vvc_sad.asm b/libavcodec/x86/vvc/vvc_sad.asm
index a20818530f..829dbce489 100644
--- a/libavcodec/x86/vvc/vvc_sad.asm
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -96,7 +96,7 @@ cglobal vvc_sad_8, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, ro
     movd eax, xm0
     RET
 
-cglobal vvc_sad_16, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_idx
+cglobal vvc_sad_16, 6, 8, 5, src1, src2, dx, dy, block_w, block_h, off1, off2
     movsxdifnidn dxq, dxd
     movsxdifnidn dyq, dyd
 
@@ -121,26 +121,27 @@ cglobal vvc_sad_16, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, r
     pxor m3, m3
     vpbroadcastd m4, [pw_1]
 
-    sar block_wd, 4
+    shl block_wd, 1
+    add src1q, block_wq
+    add src2q, block_wq
+    neg block_wq
+
+DEFINE_ARGS src1, src2, dx, dy, block_w, block_h, row_idx
     .loop_height:
-        mov off1q, src1q
-        mov off2q, src2q
-        mov row_idxd, block_wd
+        mov row_idxq, block_wq
 
     .loop_width:
-        movu m0, [src1q]
-        movu m1, [src2q]
+        movu m0, [src1q+row_idxq]
+        movu m1, [src2q+row_idxq]
         MIN_MAX_SAD m1, m0, m2
         pmaddwd m1, m4
         paddd m3, m1
 
-        add src1q, 32
-        add src2q, 32
-        dec row_idxd
-        jg .loop_width
+        add row_idxq, mmsize
+        jl .loop_width
 
-        lea src1q, [off1q + ROWS * MAX_PB_SIZE * 2]
-        lea src2q, [off2q + ROWS * MAX_PB_SIZE * 2]
+        add src1q, ROWS * MAX_PB_SIZE * 2
+        add src2q, ROWS * MAX_PB_SIZE * 2
 
         sub block_hd, 2
         jg .loop_height

From patchwork Thu May 23 12:27:14 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 49177
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 23 May 2024 09:27:14 -0300
Message-ID: <20240523122716.2158-3-jamrial@gmail.com>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240523122716.2158-1-jamrial@gmail.com>
References: <20240523122716.2158-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 3/5] x86/vvc_sad: add sse4 versions of all functions
List-Id: FFmpeg development discussions and patches

And remove sad_8x8_avx2, as it's not faster than sad_8x8_sse4.

sad_8x8_c: 54.8
sad_8x8_sse4: 14.3
sad_16x16_c: 200.8
sad_16x16_sse4: 34.8
sad_16x16_avx2: 29.8
sad_32x32_c: 826.3
sad_32x32_sse4: 113.8
sad_32x32_avx2: 69.3
sad_64x64_c: 3679.8
sad_64x64_sse4: 392.8
sad_64x64_avx2: 257.3
sad_128x128_c: 12581.3
sad_128x128_sse4: 1560.8
sad_128x128_avx2: 1151.8

Signed-off-by: James Almer
---
 libavcodec/x86/vvc/vvc_sad.asm   | 53 +++++++++++++++++++++-----------
 libavcodec/x86/vvc/vvcdsp_init.c | 42 +++++++++++++++++--------
 2 files changed, 65 insertions(+), 30 deletions(-)

diff --git a/libavcodec/x86/vvc/vvc_sad.asm b/libavcodec/x86/vvc/vvc_sad.asm
index 829dbce489..26df25ec66 100644
--- a/libavcodec/x86/vvc/vvc_sad.asm
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -26,7 +26,7 @@
 
 SECTION_RODATA
 
-pw_1: times 2 dw 1
+cextern pw_1
 
 ; DMVR SAD is only calculated on even rows to reduce complexity
 SECTION .text
@@ -38,20 +38,21 @@ SECTION .text
 %endmacro
 
 %macro HORIZ_ADD 3 ; xm0, xm1, m1
+%if mmsize == 32
     vextracti128 %1, %3, q0001 ; 3 2 1 0
-    paddd %1, %2 ; xm0 (7 + 3) (6 + 2) (5 + 1) (4 + 0)
-    pshufd %2, %1, q0032 ; xm1 - - (7 + 3) (6 + 2)
+    paddd %2, %1 ; xm1 (7 + 3) (6 + 2) (5 + 1) (4 + 0)
+%endif
+    pshufd %1, %2, q0032 ; xm0 - - (7 + 3) (6 + 2)
     paddd %1, %1, %2 ; xm0 _ _ (5 1 7 3) (4 0 6 2)
     pshufd %2, %1, q0001 ; xm1 _ _ (5 1 7 3) (5 1 7 3)
     paddd %1, %1, %2 ; (01234567)
 %endmacro
 
-%if ARCH_X86_64
-%if HAVE_AVX2_EXTERNAL
-
-INIT_YMM avx2
-
-cglobal vvc_sad_8, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_idx
+%macro VVC_SAD 1
+cglobal vvc_sad_%1, 4, 7, 5, src1, src2, dx, dy, off1, block_h, off2
+%if UNIX64 == 0
+    mov block_hd, dword r5m
+%endif
     movsxdifnidn dxq, dxd
     movsxdifnidn dyq, dyd
 
@@ -74,29 +75,32 @@ cglobal vvc_sad_8, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, ro
     lea src2q, [src2q + off2q * 2 + 2 * 2]
 
     pxor m3, m3
+%if mmsize == 32
     vpbroadcastd m4, [pw_1]
+%else
+    mova m4, [pw_1]
+%endif
 
     .loop_height:
-        movu xm0, [src1q]
-        vinserti128 m0, m0, [src1q + MAX_PB_SIZE * ROWS * 2], 1
-        movu xm1, [src2q]
-        vinserti128 m1, m1, [src2q + MAX_PB_SIZE * ROWS * 2], 1
-
+        movu m0, [src1q]
+        movu m1, [src2q]
         MIN_MAX_SAD m1, m0, m2
         pmaddwd m1, m4
         paddd m3, m1
 
-        add src1q, 2 * MAX_PB_SIZE * ROWS * 2
-        add src2q, 2 * MAX_PB_SIZE * ROWS * 2
+        add src1q, ROWS * MAX_PB_SIZE * 2
+        add src2q, ROWS * MAX_PB_SIZE * 2
 
-        sub block_hd, 4
+        sub block_hd, 2
         jg .loop_height
 
     HORIZ_ADD xm0, xm3, m3
     movd eax, xm0
     RET
+%endmacro
 
-cglobal vvc_sad_16, 6, 8, 5, src1, src2, dx, dy, block_w, block_h, off1, off2
+%macro VVC_SAD_LOOP 1
+cglobal vvc_sad_%1, 6, 8, 5, src1, src2, dx, dy, block_w, block_h, off1, off2
     movsxdifnidn dxq, dxd
     movsxdifnidn dyq, dyd
 
@@ -119,7 +123,11 @@ cglobal vvc_sad_16, 6, 8, 5, src1, src2, dx, dy, block_w, block_h, off1, off2
     lea src2q, [src2q + off2q * 2 + 2 * 2]
 
     pxor m3, m3
+%if mmsize == 32
     vpbroadcastd m4, [pw_1]
+%else
+    mova m4, [pw_1]
+%endif
 
     shl block_wd, 1
     add src1q, block_wq
@@ -149,6 +157,15 @@ DEFINE_ARGS src1, src2, dx, dy, block_w, block_h, row_idx
     HORIZ_ADD xm0, xm3, m3
     movd eax, xm0
     RET
+%endmacro
 
+%if ARCH_X86_64
+INIT_XMM sse4
+VVC_SAD 8
+VVC_SAD_LOOP 16
+%if HAVE_AVX2_EXTERNAL
+INIT_YMM avx2
+VVC_SAD 16
+VVC_SAD_LOOP 32
 %endif
 %endif
diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c
index bd60963432..cdf0e36b62 100644
--- a/libavcodec/x86/vvc/vvcdsp_init.c
+++ b/libavcodec/x86/vvc/vvcdsp_init.c
@@ -316,16 +316,10 @@ ALF_FUNCS(16, 12, avx2)
 int bf(ff_vvc_sad, w, opt)(const int16_t *src0, const int16_t *src1, \
                            int dx, int dy, int block_w, int block_h) \
 
-SAD_PROTOTYPE(8, avx2);
+SAD_PROTOTYPE(8, sse4);
+SAD_PROTOTYPE(16, sse4);
 SAD_PROTOTYPE(16, avx2);
-
-#define SAD_INIT(opt) do { \
-    c->inter.sad[0] = ff_vvc_sad_8_##opt; \
-    c->inter.sad[1] = \
-    c->inter.sad[2] = \
-    c->inter.sad[3] = \
-    c->inter.sad[4] = ff_vvc_sad_16_##opt; \
-} while (0)
+SAD_PROTOTYPE(32, avx2);
 #endif
 
 void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
@@ -337,36 +331,60 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
     case 8:
         if (EXTERNAL_SSE4(cpu_flags)) {
             MC_LINK_SSE4(8);
+            c->inter.sad[0] = ff_vvc_sad_8_sse4;
+            c->inter.sad[1] =
+            c->inter.sad[2] =
+            c->inter.sad[3] =
+            c->inter.sad[4] = ff_vvc_sad_16_sse4;
         }
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             ALF_INIT(8);
             AVG_INIT(8, avx2);
             MC_LINKS_AVX2(8);
-            SAD_INIT(avx2);
+            c->inter.sad[1] = ff_vvc_sad_16_avx2;
+            c->inter.sad[2] =
+            c->inter.sad[3] =
+            c->inter.sad[4] = ff_vvc_sad_32_avx2;
         }
         break;
     case 10:
         if (EXTERNAL_SSE4(cpu_flags)) {
             MC_LINK_SSE4(10);
+            c->inter.sad[0] = ff_vvc_sad_8_sse4;
+            c->inter.sad[1] =
+            c->inter.sad[2] =
+            c->inter.sad[3] =
+            c->inter.sad[4] = ff_vvc_sad_16_sse4;
         }
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             ALF_INIT(10);
             AVG_INIT(10, avx2);
             MC_LINKS_AVX2(10);
             MC_LINKS_16BPC_AVX2(10);
-            SAD_INIT(avx2);
+            c->inter.sad[1] = ff_vvc_sad_16_avx2;
+            c->inter.sad[2] =
+            c->inter.sad[3] =
+            c->inter.sad[4] = ff_vvc_sad_32_avx2;
         }
         break;
     case 12:
         if (EXTERNAL_SSE4(cpu_flags)) {
             MC_LINK_SSE4(12);
+            c->inter.sad[0] = ff_vvc_sad_8_sse4;
+            c->inter.sad[1] =
+            c->inter.sad[2] =
+            c->inter.sad[3] =
+            c->inter.sad[4] = ff_vvc_sad_16_sse4;
         }
         if (EXTERNAL_AVX2_FAST(cpu_flags)) {
             ALF_INIT(12);
             AVG_INIT(12, avx2);
             MC_LINKS_AVX2(12);
             MC_LINKS_16BPC_AVX2(12);
-            SAD_INIT(avx2);
+            c->inter.sad[1] = ff_vvc_sad_16_avx2;
+            c->inter.sad[2] =
+            c->inter.sad[3] =
+            c->inter.sad[4] = ff_vvc_sad_32_avx2;
         }
         break;
     default:

From patchwork Thu May 23 12:27:15 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 49178
Delivered-To: ffmpegpatchwork2@gmail.com
Received: by 2002:a59:542:0:b0:460:55fa:d5ed with SMTP id 63csp1005228vqf; Thu, 23 May 2024 05:28:28 -0700 (PDT)
X-Forwarded-Encrypted: i=2; AJvYcCUo2/c3MH9brSLRoZg0qf3MfwfLco8B7wCXgbHI70z1AMQeGqQnxh8mDzaoYPBTsIO8OrOxMoKVQs89HLdwfmpE/mmL1HSEx76XgA==
X-Google-Smtp-Source: AGHT+IEKZKO80whx+9BOHYYBHuqtjWKWW7/AKO2sboKwh/gQHF8MvqpH3fAXbxJuAh3Mz8Df3tTg
X-Received: by 2002:a05:6512:404:b0:523:94a3:26b5 with
SMTP id 2adb3069b0e04-526c0d4951cmr2497074e87.5.1716467308585; Thu, 23 May 2024 05:28:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1716467308; cv=none; d=google.com; s=arc-20160816; b=C6uxMb8P8blhhhZGd8pOsMDhzn8fsPU8S2PE0Ebzdd29f23vQzDhB3bFWlASPXzYtR CiZMZiTiranqiJ5Z9+8lW6VdIe/F8sJXpRUPJGfxSlSccmBrLnF4IE7W1hn92jFLvvTo irqt6Ir+mbHiXsrzHyIC7vYMrPV8EaUZ2DCHMiYMOQJdjWwJXV6rrl94KIgRvKmqgU6i trrmMYIG5kDCZ+fX3sZtBUKBqNlU1fdSavanfFqw3Ph/2eaaCfQk7ACsSq273FtlvlJB 4CoCc6nhus7yWaKgRL08aPt/ryibOk347YUHbFjxOGATUeoYh+Q7G4Ra1cJcSLxPOMAy FtlQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id :date:to:from:dkim-signature:delivered-to; bh=nEs5k41+ejBdzMKu05V7MEk6Ui3TAqfhEHWAf948wpo=; fh=YOA8vD9MJZuwZ71F/05pj6KdCjf6jQRmzLS+CATXUQk=; b=DVLdtyxcnnMP5J86EHGwCiHqD58V5ZzQxDdyvfnt8JP1mDmChUjxc1uim44q4wBg/l ybhhndwJ9YUyYbRDgE8uJlc6sv/GsH7oKy55u7sWuS5NvhFcoIxqq8VuyH4BQBh5yiwu hnOoUarGUfNYcRCfupox8oMX6iMmKEacyDXfvfY7kzKEH412kR3mxK8T6/eAIxLogg+e 95M+V126xX8C5awqJQU01c9hhO8XTncPMVpvalySKqQumvCAtFdo30j9bxHyTZPb+wkM uOveD9h5dV6Ssn9uw7UvH3q3pgE+VntLoXCptZFtmI/1ZBS2Knnue3h94Mlp8KwiXzGK C0pw==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b="e/2n6wBL"; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id a640c23a62f3a-a5a17be8bd5si1656261466b.854.2024.05.23.05.28.28; Thu, 23 May 2024 05:28:28 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20230601 header.b="e/2n6wBL"; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id BFBD268D4BE; Thu, 23 May 2024 15:28:03 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail-pl1-f176.google.com (mail-pl1-f176.google.com [209.85.214.176]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id D9A6068D359 for ; Thu, 23 May 2024 15:27:54 +0300 (EEST) Received: by mail-pl1-f176.google.com with SMTP id d9443c01a7336-1f082d92864so122033005ad.1 for ; Thu, 23 May 2024 05:27:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1716467272; x=1717072072; darn=ffmpeg.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=R3r6NE177Cm/Y6zumu+K7fY8Spz32BAjPndW0/1HGH0=; b=e/2n6wBLYIzwfuBB+IzF8gSlYqt69ewshKpc6DCHehdf9BAAYJ6C/PtJlao5pLJRRs ejgHYSrG/STZ5rwHCX9+o/lBPgf6XSo2gevSs3CJOtXemqMm2xas7+SRS8WaxxFtKing cFf3dreBESytS+THnKp/j2jpLsSZ7hL8gM+WGLNP9eCb0fjCHD91CoqXRac+bLGpO5tm YqWDND0vrHrWqC7zt/jvWc0Sr5UJQtMQSgUOJ8JNlz69F07heAY17rGYJSkyQU+yRSVU A5bIcjXXmWTTgCGANPE3Wylxy26PtqeFxZtEn5ZvoKEmYpy7l9elD/nAD7tmbBBTae/v udRA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716467272; x=1717072072; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=R3r6NE177Cm/Y6zumu+K7fY8Spz32BAjPndW0/1HGH0=; b=ZksbQ3P0+WBaIOKkckfBnnN5dz+/VrTmSbfzivkpYwNeish/HQkhfJmUewu9U9PT3L 4rwvAiUh6d4D99oHsbb2eR1hjst9MmRwEdCMqLF7slNv1ceSAyBaV/wrjcZ0kEoX1jMI 2lq/vJtex8AIrD9zfEQzzWDwbLEAU4ie2KAstXzXFMMp+h8NuigHYuIGWeUaUi7+GchT tFtDCfn2WckvLHXORyPW6UcSQLewIR0Df6WQZkbVb5pvYas4d65B9vKzPmHyR4CaKLia Cnvt2Gt1Arz30CbXwfyPYDb/rxgWE+Nl+FgTc/9Ydx9vZQ5skFr162OYDsPaJzTk+YT5 z7OA== X-Gm-Message-State: AOJu0Yw3GCu+LQximkb8HjlvYdO5IRhzI71PSNGrRFEOjdzpuHasuFYk 2y4MSRI62zDOoKd/mZedGPUi/Otwp+ry6cI2vCsr0qNQ7kFQfFfg9HXsug== X-Received: by 2002:a17:902:c405:b0:1f3:a6f:9de2 with SMTP id d9443c01a7336-1f31c9d027dmr57381065ad.50.1716467272066; Thu, 23 May 2024 05:27:52 -0700 (PDT) Received: from localhost.localdomain ([190.194.167.233]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-1f335da4b87sm16158405ad.100.2024.05.23.05.27.50 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 May 2024 05:27:51 -0700 (PDT) From: James Almer To: ffmpeg-devel@ffmpeg.org Date: Thu, 23 May 2024 09:27:15 -0300 Message-ID: <20240523122716.2158-4-jamrial@gmail.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: <20240523122716.2158-1-jamrial@gmail.com> References: <20240523122716.2158-1-jamrial@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 4/5] x86/vvc_sad: reduce gpr usage in all loop functions X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: RtVY3gdb1a/C This way they can be assembled on x86_32 targets. 
Signed-off-by: James Almer --- libavcodec/x86/vvc/vvc_sad.asm | 22 ++++++++++------------ libavcodec/x86/vvc/vvcdsp_init.c | 16 +++++++++++++--- 2 files changed, 23 insertions(+), 15 deletions(-) diff --git a/libavcodec/x86/vvc/vvc_sad.asm b/libavcodec/x86/vvc/vvc_sad.asm index 26df25ec66..9881b1180d 100644 --- a/libavcodec/x86/vvc/vvc_sad.asm +++ b/libavcodec/x86/vvc/vvc_sad.asm @@ -49,7 +49,7 @@ SECTION .text %endmacro %macro VVC_SAD 1 -cglobal vvc_sad_%1, 4, 7, 5, src1, src2, dx, dy, off1, block_h, off2 +cglobal vvc_sad_%1, 4, 6, 5, src1, src2, dx, dy, off, block_h %if UNIX64 == 0 mov block_hd, dword r5m %endif @@ -59,12 +59,12 @@ cglobal vvc_sad_%1, 4, 7, 5, src1, src2, dx, dy, off1, block_h, off2 sub dxq, 2 sub dyq, 2 - mov off1q, 2 - mov off2q, 2 + mov offq, 2 - add off1q, dyq - sub off2q, dyq + sub offq, dyq + add dyq, 2 +DEFINE_ARGS src1, src2, dx, off1, off2, block_h shl off1q, 7 shl off2q, 7 @@ -100,19 +100,19 @@ cglobal vvc_sad_%1, 4, 7, 5, src1, src2, dx, dy, off1, block_h, off2 %endmacro %macro VVC_SAD_LOOP 1 -cglobal vvc_sad_%1, 6, 8, 5, src1, src2, dx, dy, block_w, block_h, off1, off2 +cglobal vvc_sad_%1, 6, 7, 5, src1, src2, dx, dy, block_w, block_h, off movsxdifnidn dxq, dxd movsxdifnidn dyq, dyd sub dxq, 2 sub dyq, 2 - mov off1q, 2 - mov off2q, 2 + mov offq, 2 - add off1q, dyq - sub off2q, dyq + sub offq, dyq + add dyq, 2 +DEFINE_ARGS src1, src2, dx, off1, block_w, block_h, off2 shl off1q, 7 shl off2q, 7 @@ -159,7 +159,6 @@ DEFINE_ARGS src1, src2, dx, dy, block_w, block_h, row_idx RET %endmacro -%if ARCH_X86_64 INIT_XMM sse4 VVC_SAD 8 VVC_SAD_LOOP 16 @@ -168,4 +167,3 @@ INIT_YMM avx2 VVC_SAD 16 VVC_SAD_LOOP 32 %endif -%endif diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c index cdf0e36b62..c0bd145191 100644 --- a/libavcodec/x86/vvc/vvcdsp_init.c +++ b/libavcodec/x86/vvc/vvcdsp_init.c @@ -311,6 +311,7 @@ ALF_FUNCS(16, 12, avx2) c->alf.filter[CHROMA] = ff_vvc_alf_filter_chroma_##bd##_avx2; \ c->alf.classify = 
ff_vvc_alf_classify_##bd##_avx2; \ } while (0) +#endif #define SAD_PROTOTYPE(w, opt) \ int bf(ff_vvc_sad, w, opt)(const int16_t *src0, const int16_t *src1, \ @@ -320,17 +321,17 @@ SAD_PROTOTYPE(8, sse4); SAD_PROTOTYPE(16, sse4); SAD_PROTOTYPE(16, avx2); SAD_PROTOTYPE(32, avx2); -#endif void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd) { -#if ARCH_X86_64 const int cpu_flags = av_get_cpu_flags(); switch (bd) { case 8: if (EXTERNAL_SSE4(cpu_flags)) { +#if ARCH_X86_64 MC_LINK_SSE4(8); +#endif c->inter.sad[0] = ff_vvc_sad_8_sse4; c->inter.sad[1] = c->inter.sad[2] = @@ -338,9 +339,11 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd) c->inter.sad[4] = ff_vvc_sad_16_sse4; } if (EXTERNAL_AVX2_FAST(cpu_flags)) { +#if ARCH_X86_64 ALF_INIT(8); AVG_INIT(8, avx2); MC_LINKS_AVX2(8); +#endif c->inter.sad[1] = ff_vvc_sad_16_avx2; c->inter.sad[2] = c->inter.sad[3] = @@ -349,7 +352,9 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd) break; case 10: if (EXTERNAL_SSE4(cpu_flags)) { +#if ARCH_X86_64 MC_LINK_SSE4(10); +#endif c->inter.sad[0] = ff_vvc_sad_8_sse4; c->inter.sad[1] = c->inter.sad[2] = @@ -357,10 +362,12 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd) c->inter.sad[4] = ff_vvc_sad_16_sse4; } if (EXTERNAL_AVX2_FAST(cpu_flags)) { +#if ARCH_X86_64 ALF_INIT(10); AVG_INIT(10, avx2); MC_LINKS_AVX2(10); MC_LINKS_16BPC_AVX2(10); +#endif c->inter.sad[1] = ff_vvc_sad_16_avx2; c->inter.sad[2] = c->inter.sad[3] = @@ -369,7 +376,9 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd) break; case 12: if (EXTERNAL_SSE4(cpu_flags)) { +#if ARCH_X86_64 MC_LINK_SSE4(12); +#endif c->inter.sad[0] = ff_vvc_sad_8_sse4; c->inter.sad[1] = c->inter.sad[2] = @@ -377,10 +386,12 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd) c->inter.sad[4] = ff_vvc_sad_16_sse4; } if (EXTERNAL_AVX2_FAST(cpu_flags)) { +#if ARCH_X86_64 ALF_INIT(12); AVG_INIT(12, avx2); MC_LINKS_AVX2(12); MC_LINKS_16BPC_AVX2(12); +#endif 
 c->inter.sad[1] = ff_vvc_sad_16_avx2; c->inter.sad[2] = c->inter.sad[3] = @@ -390,5 +401,4 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd) default: break; } -#endif }

From patchwork Thu May 23 12:27:16 2024
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 49179
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 23 May 2024 09:27:16 -0300
Message-ID: <20240523122716.2158-5-jamrial@gmail.com>
In-Reply-To: <20240523122716.2158-1-jamrial@gmail.com>
References: <20240523122716.2158-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 5/5] x86/vvc_sad: reindent after the previous changes

Signed-off-by: James Almer --- libavcodec/x86/vvc/vvc_sad.asm | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/libavcodec/x86/vvc/vvc_sad.asm b/libavcodec/x86/vvc/vvc_sad.asm index 9881b1180d..14f7ce230e 100644 --- a/libavcodec/x86/vvc/vvc_sad.asm +++ b/libavcodec/x86/vvc/vvc_sad.asm @@ -81,7 +81,7 @@ DEFINE_ARGS src1, src2, dx, off1, off2, block_h mova m4, [pw_1] %endif - .loop_height: + .loop_height: movu m0, [src1q] movu m1, [src2q] MIN_MAX_SAD m1, m0, m2 @@ -94,8 +94,8 @@ DEFINE_ARGS src1, src2, dx, off1, off2, block_h sub block_hd, 2 jg .loop_height - HORIZ_ADD xm0, xm3, m3 - movd eax, xm0 + HORIZ_ADD xm0, xm3, m3 + movd eax, xm0 RET %endmacro @@ -129,13 +129,13 @@ DEFINE_ARGS src1, src2, dx, off1, block_w, block_h, off2 mova m4, [pw_1] %endif - shl block_wd, 1 - add src1q, block_wq - add src2q, block_wq - neg block_wq + shl block_wd, 1 + add src1q, block_wq + add src2q, block_wq + neg block_wq DEFINE_ARGS src1, src2, dx, dy, block_w, block_h, row_idx - .loop_height: + .loop_height: mov row_idxq, block_wq .loop_width: @@ -154,8 +154,8 @@ DEFINE_ARGS src1, src2, dx, dy, block_w, block_h, row_idx sub block_hd, 2 jg .loop_height - HORIZ_ADD xm0, xm3, m3 - movd eax, xm0 + HORIZ_ADD xm0, xm3, m3 + movd eax, xm0 RET %endmacro
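For readers following the register change in patch 4/5, here is a rough C model of the offset arithmetic before and after that patch. It is only a sketch: the names SadOffsets, offsets_before, offsets_after and check_offsets are hypothetical and are not part of FFmpeg, and the model covers just the instructions that precede the two "shl ..., 7" lines in the diff. It suggests that reusing dy as off1 (via DEFINE_ARGS) yields the same two offsets with one fewer general-purpose register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, not part of FFmpeg: the two values the asm
 * holds in off1 and off2 just before the "shl off1q, 7 / shl off2q, 7". */
typedef struct SadOffsets {
    int64_t off1;
    int64_t off2;
} SadOffsets;

/* Models the sequence removed by patch 4/5: dy plus two offset registers. */
static SadOffsets offsets_before(int64_t dy)
{
    int64_t off1, off2;
    dy   -= 2;   /* sub dyq, 2     */
    off1  = 2;   /* mov off1q, 2   */
    off2  = 2;   /* mov off2q, 2   */
    off1 += dy;  /* add off1q, dyq */
    off2 -= dy;  /* sub off2q, dyq */
    return (SadOffsets){ off1, off2 };
}

/* Models the sequence added by patch 4/5: a single extra register (off).
 * After "add dyq, 2", DEFINE_ARGS renames dy to off1 and off to off2. */
static SadOffsets offsets_after(int64_t dy)
{
    int64_t off;
    dy  -= 2;    /* sub dyq, 2     */
    off  = 2;    /* mov offq, 2    */
    off -= dy;   /* sub offq, dyq  */
    dy  += 2;    /* add dyq, 2     */
    return (SadOffsets){ dy, off };
}

/* Asserts that both sequences agree for every dy in [lo, hi]. */
static void check_offsets(int64_t lo, int64_t hi)
{
    for (int64_t dy = lo; dy <= hi; dy++) {
        SadOffsets a = offsets_before(dy);
        SadOffsets b = offsets_after(dy);
        assert(a.off1 == b.off1);
        assert(a.off2 == b.off2);
    }
}
```

Calling check_offsets(-16, 16) from a test harness exercises both sequences over a small range. In both versions off1 ends up equal to the incoming dy and off2 equal to 4 - dy, which is why the second cglobal register count can drop by one in each macro.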