From patchwork Thu May 23 12:27:12 2024
X-Patchwork-Submitter: James Almer
X-Patchwork-Id: 49175
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 23 May 2024 09:27:12 -0300
Message-ID: <20240523122716.2158-1-jamrial@gmail.com>
Subject: [FFmpeg-devel] [PATCH 1/5] avcodec/vvc_mc: split the SAD dsp prototype into one function per blocksize width

Signed-off-by: James Almer
---
 libavcodec/vvc/dsp.h             |  2 +-
 libavcodec/vvc/inter.c           |  6 ++++--
 libavcodec/vvc/inter_template.c  |  6 +++++-
 libavcodec/x86/vvc/vvc_sad.asm   | 32 ++++++++++++++++++++++++++------
 libavcodec/x86/vvc/vvcdsp_init.c | 22 +++++++++++++++++-----
 tests/checkasm/vvc_mc.c          |  3 ++-
 6 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/libavcodec/vvc/dsp.h b/libavcodec/vvc/dsp.h
index 1f14096c41..55c4c81f53 100644
--- a/libavcodec/vvc/dsp.h
+++ b/libavcodec/vvc/dsp.h
@@ -99,7 +99,7 @@ typedef struct VVCInterDSPContext {
 
     void (*apply_bdof)(uint8_t *dst, ptrdiff_t dst_stride, int16_t *src0, int16_t *src1, int block_w, int block_h);
 
-    int (*sad)(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
+    int (*sad[5])(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
     void (*dmvr[2][2])(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, int height,
         intptr_t mx, intptr_t my, int width);
 } VVCInterDSPContext;
diff --git a/libavcodec/vvc/inter.c b/libavcodec/vvc/inter.c
index e1011b4fa1..0214e46634 100644
--- a/libavcodec/vvc/inter.c
+++ b/libavcodec/vvc/inter.c
@@ -740,6 +740,8 @@ static void dmvr_mv_refine(VVCLocalContext *lc, MvField *mvf, MvField *orig_mv,
     const AVFrame *ref0, const AVFrame *ref1, const int x_off, const int y_off, const int block_w, const int block_h)
 {
     const VVCFrameContext *fc = lc->fc;
+    static const uint8_t sad_tab[16] = { 0, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4 };
+    const int tab = sad_tab[(FFALIGN(block_w, 8) >> 3) - 1];
     const int sr_range = 2;
     const AVFrame *ref[] = { ref0, ref1 };
     int16_t *tmp[] = { lc->tmp, lc->tmp1 };
@@ -763,7 +765,7 @@ static void dmvr_mv_refine(VVCLocalContext *lc, MvField *mvf, MvField *orig_mv,
         fc->vvcdsp.inter.dmvr[!!my][!!mx](tmp[i], src, src_stride, pred_h, mx, my, pred_w);
     }
 
-    min_sad = fc->vvcdsp.inter.sad(tmp[L0], tmp[L1], dx, dy, block_w, block_h);
+    min_sad = fc->vvcdsp.inter.sad[tab](tmp[L0], tmp[L1], dx, dy, block_w, block_h);
     min_sad -= min_sad >> 2;
     sad[dy][dx] = min_sad;
 
@@ -773,7 +775,7 @@ static void dmvr_mv_refine(VVCLocalContext *lc, MvField *mvf, MvField *orig_mv,
     for (dy = 0; dy < SAD_ARRAY_SIZE; dy++) {
         for (dx = 0; dx < SAD_ARRAY_SIZE; dx++) {
             if (dx != sr_range || dy != sr_range) {
-                sad[dy][dx] = fc->vvcdsp.inter.sad(lc->tmp, lc->tmp1, dx, dy, block_w, block_h);
+                sad[dy][dx] = fc->vvcdsp.inter.sad[tab](lc->tmp, lc->tmp1, dx, dy, block_w, block_h);
                 if (sad[dy][dx] < min_sad) {
                     min_sad = sad[dy][dx];
                     min_dx = dx;
diff --git a/libavcodec/vvc/inter_template.c b/libavcodec/vvc/inter_template.c
index a8068f4ba8..34485321d3 100644
--- a/libavcodec/vvc/inter_template.c
+++ b/libavcodec/vvc/inter_template.c
@@ -626,7 +626,11 @@ static void FUNC(ff_vvc_inter_dsp_init)(VVCInterDSPContext *const inter)
     inter->apply_prof_uni_w = FUNC(apply_prof_uni_w);
     inter->apply_bdof = FUNC(apply_bdof);
     inter->prof_grad_filter = FUNC(prof_grad_filter);
-    inter->sad = vvc_sad;
+    inter->sad[0] =
+    inter->sad[1] =
+    inter->sad[2] =
+    inter->sad[3] =
+    inter->sad[4] = vvc_sad;
 }
 
 #undef FUNCS
diff --git a/libavcodec/x86/vvc/vvc_sad.asm b/libavcodec/x86/vvc/vvc_sad.asm
index b468d89ac2..a20818530f 100644
--- a/libavcodec/x86/vvc/vvc_sad.asm
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -51,7 +51,7 @@ SECTION .text
 
 INIT_YMM avx2
 
-cglobal vvc_sad, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_idx
+cglobal vvc_sad_8, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_idx
     movsxdifnidn dxq, dxd
     movsxdifnidn dyq, dyd
 
@@ -76,10 +76,6 @@ cglobal vvc_sad, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_
     pxor m3, m3
     vpbroadcastd m4, [pw_1]
 
-    cmp block_wd, 16
-    jge vvc_sad_16_128
-
-    vvc_sad_8:
     .loop_height:
         movu xm0, [src1q]
         vinserti128 m0, m0, [src1q + MAX_PB_SIZE * ROWS * 2], 1
@@ -100,7 +96,31 @@ cglobal vvc_sad, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_
     movd eax, xm0
     RET
 
-    vvc_sad_16_128:
+cglobal vvc_sad_16, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_idx
+    movsxdifnidn dxq, dxd
+    movsxdifnidn dyq, dyd
+
+    sub dxq, 2
+    sub dyq, 2
+
+    mov off1q, 2
+    mov off2q, 2
+
+    add off1q, dyq
+    sub off2q, dyq
+
+    shl off1q, 7
+    shl off2q, 7
+
+    add off1q, dxq
+    sub off2q, dxq
+
+    lea src1q, [src1q + off1q * 2 + 2 * 2]
+    lea src2q, [src2q + off2q * 2 + 2 * 2]
+
+    pxor m3, m3
+    vpbroadcastd m4, [pw_1]
+
     sar block_wd, 4
     .loop_height:
         mov off1q, src1q
diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c
index 4b4a2aa937..bd60963432 100644
--- a/libavcodec/x86/vvc/vvcdsp_init.c
+++ b/libavcodec/x86/vvc/vvcdsp_init.c
@@ -312,8 +312,20 @@ ALF_FUNCS(16, 12, avx2)
     c->alf.classify = ff_vvc_alf_classify_##bd##_avx2; \
 } while (0)
 
-int ff_vvc_sad_avx2(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
-#define SAD_INIT() c->inter.sad = ff_vvc_sad_avx2
+#define SAD_PROTOTYPE(w, opt) \
+int bf(ff_vvc_sad, w, opt)(const int16_t *src0, const int16_t *src1, \
+    int dx, int dy, int block_w, int block_h) \
+
+SAD_PROTOTYPE(8, avx2);
+SAD_PROTOTYPE(16, avx2);
+
+#define SAD_INIT(opt) do { \
+    c->inter.sad[0] = ff_vvc_sad_8_##opt; \
+    c->inter.sad[1] = \
+    c->inter.sad[2] = \
+    c->inter.sad[3] = \
+    c->inter.sad[4] = ff_vvc_sad_16_##opt; \
+} while (0)
 #endif
 
 void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
@@ -330,7 +342,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             ALF_INIT(8);
             AVG_INIT(8, avx2);
             MC_LINKS_AVX2(8);
-            SAD_INIT();
+            SAD_INIT(avx2);
         }
         break;
     case 10:
@@ -342,7 +354,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(10, avx2);
             MC_LINKS_AVX2(10);
             MC_LINKS_16BPC_AVX2(10);
-            SAD_INIT();
+            SAD_INIT(avx2);
         }
         break;
     case 12:
@@ -354,7 +366,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(12, avx2);
             MC_LINKS_AVX2(12);
             MC_LINKS_16BPC_AVX2(12);
-            SAD_INIT();
+            SAD_INIT(avx2);
         }
         break;
     default:
diff --git a/tests/checkasm/vvc_mc.c b/tests/checkasm/vvc_mc.c
index 1e889e2cff..deae1014d2 100644
--- a/tests/checkasm/vvc_mc.c
+++ b/tests/checkasm/vvc_mc.c
@@ -327,6 +327,7 @@ static void check_avg(void)
 static void check_vvc_sad(void)
 {
     const int bit_depth = 10;
+    static const uint8_t sad_tab[16] = { 0, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4 };
     VVCDSPContext c;
     LOCAL_ALIGNED_32(uint16_t, src0, [MAX_CTU_SIZE * MAX_CTU_SIZE * 4]);
     LOCAL_ALIGNED_32(uint16_t, src1, [MAX_CTU_SIZE * MAX_CTU_SIZE * 4]);
@@ -341,7 +342,7 @@ static void check_vvc_sad(void)
         for (int w = 8; w <= MAX_CTU_SIZE; w *= 2) {
             for(int offy = 0; offy <= 4; offy++) {
                 for(int offx = 0; offx <= 4; offx++) {
-                    if(check_func(c.inter.sad, "sad_%dx%d", w, h)) {
+                    if(check_func(c.inter.sad[sad_tab[(w >> 3) - 1]], "sad_%dx%d", w, h)) {
                         int result0;
                         int result1;
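
For readers following the new dispatch path, here is a small standalone C sketch. It shows how dmvr_mv_refine turns block_w into an index into sad[5] through sad_tab, and how the asm prologue offsets the two temp buffers before summing. This is not FFmpeg code. The names sad_slot, sad_ref, init_sad and PB_STRIDE are hypothetical. PB_STRIDE = 128 is inferred from the `shl off1q, 7` in the asm. Summing only every other row is an assumption about the C vvc_sad, which this patch does not show.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: row stride of the DMVR temp buffers, in int16_t elements.
 * The asm scales the row offset by 128 ("shl off1q, 7"). */
#define PB_STRIDE 128

/* Same signature as the patch's sad[5] entries. */
typedef int (*sad_fn)(const int16_t *src0, const int16_t *src1,
                      int dx, int dy, int block_w, int block_h);

/* Table copied from the patch: index ceil(block_w / 8) - 1 -> slot 0..4. */
static const uint8_t sad_tab[16] = { 0, 1, 2, 2, 3, 3, 3, 3,
                                     4, 4, 4, 4, 4, 4, 4, 4 };

/* Hypothetical helper, equivalent to sad_tab[(FFALIGN(block_w, 8) >> 3) - 1]. */
static int sad_slot(int block_w)
{
    assert(block_w >= 1 && block_w <= 128);
    return sad_tab[((block_w + 7) >> 3) - 1];
}

/* Hypothetical scalar reference; dx and dy are in 0..4, where 2 means no shift.
 * After dx -= 2 and dy -= 2, src0 is offset by (2 + dy) rows and (2 + dx)
 * columns and src1 by (2 - dy) rows and (2 - dx) columns, which matches the
 * off1/off2 arithmetic in the asm prologue.
 * Assumption: only every other row contributes to the SAD. */
static int sad_ref(const int16_t *src0, const int16_t *src1,
                   int dx, int dy, int block_w, int block_h)
{
    int sad = 0;

    dx -= 2;
    dy -= 2;
    src0 += (2 + dy) * PB_STRIDE + 2 + dx;
    src1 += (2 - dy) * PB_STRIDE + 2 - dx;

    for (int y = 0; y < block_h; y += 2) {
        for (int x = 0; x < block_w; x++)
            sad += abs(src0[x] - src1[x]);
        src0 += 2 * PB_STRIDE;
        src1 += 2 * PB_STRIDE;
    }
    return sad;
}

/* Hypothetical init: a C fallback fills every slot, as inter_template.c does
 * with vvc_sad; an arch-specific init would then overwrite individual slots. */
static void init_sad(sad_fn sad[5])
{
    for (int i = 0; i < 5; i++)
        sad[i] = sad_ref;
}
```

With this table, widths 8 and 16 get their own slots, while 32, 64 and 128 map to slots 2, 3 and 4. A width that is not a multiple of 8 rounds up the same way FFALIGN does, so 12 lands in the 16 slot. On x86 the patch points slot 0 at ff_vvc_sad_8_avx2 and slots 1 to 4 at ff_vvc_sad_16_avx2. That function steps through block_w in 16-sample chunks (`sar block_wd, 4`), so one kernel covers every width from 16 up.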