From patchwork Tue May 14 20:40:09 2024
X-Patchwork-Submitter: Stone Chen
X-Patchwork-Id: 48886
From: Stone Chen
To: ffmpeg-devel@ffmpeg.org
Cc: Stone Chen
Date: Tue, 14 May 2024 16:40:09 -0400
Message-ID: <20240514204019.11022-2-chen.stonechen@gmail.com>
Subject: [FFmpeg-devel] [PATCH v3 1/2][GSoC 2024] libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC

Implement AVX2 SAD (sum of absolute differences) functions for VVC DMVR
(decoder-side motion vector refinement).

DMVR SAD is only calculated if the block width w >= 8, the height h >= 8,
and w * h > 128. To reduce complexity, SAD is only calculated on even rows.
It is calculated for all bit depths, but the samples passed to the function
are always 16-bit, even when the source video is 8-bit.

The AVX2 implementation computes each absolute difference as
max(a, b) - min(a, b) with vpmaxuw/vpminuw/vpsubusw, zero-extends the
results to 32 bits with vpmovzxwd, and accumulates them in a ymm register
before a final horizontal add. Blocks 8 samples wide use a path that
handles four even rows per iteration; wider blocks (16 to 128) loop over
16 samples at a time.
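For review, a scalar C sketch of what the assembly appears to compute may help. It is reconstructed from the INIT_OFFSET macro and the loop structure in vvc_sad.asm, not copied from FFmpeg's C template; the names sad_ref and absdiff_minmax are hypothetical, and MAX_PB_SIZE = 128 is taken from the %define in the asm. The helper mirrors MIN_MAX_SAD, which works on unsigned 16-bit lanes: max(a, b) - min(a, b) equals the true |a - b| only when both samples are non-negative. The patch does not say whether the DMVR intermediate samples can be negative, so the sketch keeps the unsigned behaviour instead of using abs().

```c
#include <assert.h>
#include <stdint.h>

/* Row stride in int16_t elements, matching the %define in vvc_sad.asm. */
#define MAX_PB_SIZE 128

/* Hypothetical helper: |a - b| computed the way MIN_MAX_SAD does it, on
 * unsigned 16-bit lanes. Equals the true |a - b| only when a, b >= 0. */
static uint16_t absdiff_minmax(int16_t a, int16_t b)
{
    uint16_t ua = (uint16_t)a, ub = (uint16_t)b;
    uint16_t lo = ua < ub ? ua : ub;   /* vpminuw */
    uint16_t hi = ua < ub ? ub : ua;   /* vpmaxuw */
    return (uint16_t)(hi - lo);        /* vpsubusw; hi >= lo, so it never saturates */
}

/* Hypothetical scalar model of ff_vvc_sad_avx2: sum of absolute differences
 * over the even rows of a block_w x block_h block. */
static int sad_ref(const int16_t *src0, const int16_t *src1,
                   int dx, int dy, int block_w, int block_h)
{
    int sad = 0;

    assert(block_w > 0 && block_h > 0 && block_h % 2 == 0);

    /* Same pointer arithmetic as the INIT_OFFSET macro, in elements. */
    dx -= 2;
    dy -= 2;
    src0 += (2 + dy) * MAX_PB_SIZE + 2 + dx;
    src1 += (2 - dy) * MAX_PB_SIZE + 2 - dx;

    for (int y = 0; y < block_h; y += 2) {   /* ROWS = 2: odd rows are skipped */
        for (int x = 0; x < block_w; x++)
            sad += absdiff_minmax(src0[x], src1[x]);
        src0 += 2 * MAX_PB_SIZE;
        src1 += 2 * MAX_PB_SIZE;
    }
    return sad;
}
```

For example, if every sample of src0 is 10 and every sample of src1 is 3, an 8x8 block visits 4 even rows of 8 samples, so sad_ref returns 4 * 8 * 7 = 224 for any dx, dy between 0 and 4. Comparing ff_vvc_sad_avx2 against a model like this on random non-negative inputs is one way to exercise both the 8-wide path and the 16-to-128-wide path.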
Benchmarks (AMD 7940HS)

File                                       | Before | After |
BQTerrace_1920x1080_60_10_420_22_RA.vvc    |   80.7 |  82.7 |
Chimera_8bit_1080P_1000_frames.vvc         |  158.0 | 167.0 |
NovosobornayaSquare_1920x1080.bin          |  159.7 | 166.3 |
RitualDance_1920x1080_60_10_420_37_RA.266  |  146.3 | 154.0 |

---
 libavcodec/x86/vvc/Makefile      |   3 +-
 libavcodec/x86/vvc/vvc_sad.asm   | 157 +++++++++++++++++++++++++++++++
 libavcodec/x86/vvc/vvcdsp_init.c |   6 ++
 3 files changed, 165 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/x86/vvc/vvc_sad.asm

diff --git a/libavcodec/x86/vvc/Makefile b/libavcodec/x86/vvc/Makefile
index d6a66f860a..7b2438ce17 100644
--- a/libavcodec/x86/vvc/Makefile
+++ b/libavcodec/x86/vvc/Makefile
@@ -5,4 +5,5 @@ OBJS-$(CONFIG_VVC_DECODER) += x86/vvc/vvcdsp_init.o \
                                  x86/h26x/h2656dsp.o
 X86ASM-OBJS-$(CONFIG_VVC_DECODER) += x86/vvc/vvc_alf.o \
                                      x86/vvc/vvc_mc.o \
-                                     x86/h26x/h2656_inter.o
+                                     x86/vvc/vvc_sad.o \
+                                     x86/h26x/h2656_inter.o
diff --git a/libavcodec/x86/vvc/vvc_sad.asm b/libavcodec/x86/vvc/vvc_sad.asm
new file mode 100644
index 0000000000..530142ad35
--- /dev/null
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -0,0 +1,157 @@
+; /*
+; * Provide SIMD DMVR SAD functions for VVC decoding
+; *
+; * Copyright (c) 2024 Stone Chen
+; *
+; * This file is part of FFmpeg.
+; *
+; * FFmpeg is free software; you can redistribute it and/or
+; * modify it under the terms of the GNU Lesser General Public
+; * License as published by the Free Software Foundation; either
+; * version 2.1 of the License, or (at your option) any later version.
+; *
+; * FFmpeg is distributed in the hope that it will be useful,
+; * but WITHOUT ANY WARRANTY; without even the implied warranty of
+; * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+; * Lesser General Public License for more details.
+; *
+; * You should have received a copy of the GNU Lesser General Public
+; * License along with FFmpeg; if not, write to the Free Software
+; * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+; */
+
+%include "libavutil/x86/x86util.asm"
+
+%define MAX_PB_SIZE 128
+%define ROWS 2 ; DMVR SAD is only calculated on even rows to reduce complexity
+
+SECTION .text
+
+%macro MIN_MAX_SAD 3 ; %1: scratch, receives min(a, b); %2 = a; %3 = b on entry, |a - b| on exit (unsigned words)
+    vpminuw %1, %2, %3
+    vpmaxuw %3, %2, %3
+    vpsubusw %3, %3, %1
+%endmacro
+
+%macro HORIZ_ADD 3 ; xm0, xm1, m1
+    vextracti128 %1, %3, q0001 ; 3 2 1 0
+    vpaddd %1, %2 ; xm0 (7 + 3) (6 + 2) (5 + 1) (4 + 0)
+    vpshufd %2, %1, q0032 ; xm1 - - (7 + 3) (6 + 2)
+    vpaddd %1, %1, %2 ; xm0 _ _ (5 1 7 3) (4 0 6 2)
+    vpshufd %2, %1, q0001 ; xm1 _ _ (4 0 6 2) (5 1 7 3)
+    vpaddd %1, %1, %2 ; (01234567)
+%endmacro
+
+%macro INIT_OFFSET 6 ; src1, src2, dxq, dyq, off1, off2
+    sub %3, 2
+    sub %4, 2
+
+    mov %5, 2
+    mov %6, 2
+
+    add %5, %4
+    sub %6, %4
+
+    imul %5, 128
+    imul %6, 128
+
+    add %5, 2
+    add %6, 2
+
+    add %5, %3
+    sub %6, %3
+
+    lea %1, [%1 + %5 * 2]
+    lea %2, [%2 + %6 * 2]
+%endmacro
+
+%if ARCH_X86_64
+%if HAVE_AVX2_EXTERNAL
+
+INIT_YMM avx2
+
+cglobal vvc_sad, 6, 11, 14, src1, src2, dx, dy, block_w, block_h, off1, off2, row_idx, dx2, dy2
+    movsxd dx2q, dxd
+    movsxd dy2q, dyd
+    INIT_OFFSET src1q, src2q, dx2q, dy2q, off1q, off2q
+    pxor m3, m3
+    pxor m8, m8
+
+    cmp block_wd, 16
+    jge vvc_sad_16_128
+
+    vvc_sad_8:
+    .loop_height:
+        movu xm0, [src1q]
+        movu xm1, [src2q]
+        MIN_MAX_SAD xm2, xm0, xm1
+        vpmovzxwd m1, xm1
+        vpaddd m3, m1
+
+        movu xm5, [src1q + MAX_PB_SIZE * ROWS * 2]
+        movu xm6, [src2q + MAX_PB_SIZE * ROWS * 2]
+        MIN_MAX_SAD xm7, xm5, xm6
+        vpmovzxwd m6, xm6
+        vpaddd m3, m6
+
+        movu xm8, [src1q + MAX_PB_SIZE * 2 * ROWS * 2]
+        movu xm9, [src2q + MAX_PB_SIZE * 2 * ROWS * 2]
+        MIN_MAX_SAD xm10, xm8, xm9
+        vpmovzxwd m9, xm9
+        vpaddd m3, m9
+
+        movu xm11, [src1q + MAX_PB_SIZE * 3 * ROWS * 2]
+        movu xm12, [src2q + MAX_PB_SIZE * 3 * ROWS * 2]
+        MIN_MAX_SAD xm13, xm11, xm12
+        vpmovzxwd m12, xm12
+
+        vpaddd m3, m12
+
+        add src1q, MAX_PB_SIZE * 4 * ROWS * 2
+        add src2q, MAX_PB_SIZE * 4 * ROWS * 2
+
+        sub block_hd, 8
+        jg .loop_height
+
+    HORIZ_ADD xm0, xm3, m3
+    movd eax, xm0
+    RET
+
+    vvc_sad_16_128:
+    .loop_height:
+        mov off1q, src1q
+        mov off2q, src2q
+        mov row_idxd, block_wd
+        sar row_idxd, 4
+
+        .loop_width:
+            movu xm0, [src1q]
+            movu xm1, [src2q]
+            MIN_MAX_SAD xm2, xm0, xm1
+            vpmovzxwd m1, xm1
+            vpaddd m3, m1
+
+            movu xm5, [src1q + 16]
+            movu xm6, [src2q + 16]
+            MIN_MAX_SAD xm7, xm5, xm6
+            vpmovzxwd m6, xm6
+            vpaddd m3, m6
+
+            add src1q, 32
+            add src2q, 32
+            dec row_idxd
+            jg .loop_width
+
+        lea src1q, [off1q + ROWS * MAX_PB_SIZE * 2]
+        lea src2q, [off2q + ROWS * MAX_PB_SIZE * 2]
+
+        sub block_hd, 2
+        jg .loop_height
+
+    HORIZ_ADD xm0, xm3, m3
+    movd eax, xm0
+
+    RET
+
+%endif
+%endif
diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c
index 0e68971b2c..4b4a2aa937 100644
--- a/libavcodec/x86/vvc/vvcdsp_init.c
+++ b/libavcodec/x86/vvc/vvcdsp_init.c
@@ -311,6 +311,9 @@ ALF_FUNCS(16, 12, avx2)
         c->alf.filter[CHROMA] = ff_vvc_alf_filter_chroma_##bd##_avx2; \
         c->alf.classify = ff_vvc_alf_classify_##bd##_avx2; \
     } while (0)
+
+int ff_vvc_sad_avx2(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
+#define SAD_INIT() c->inter.sad = ff_vvc_sad_avx2
 #endif
 
 void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
@@ -327,6 +330,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             ALF_INIT(8);
             AVG_INIT(8, avx2);
             MC_LINKS_AVX2(8);
+            SAD_INIT();
         }
         break;
     case 10:
@@ -338,6 +342,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(10, avx2);
             MC_LINKS_AVX2(10);
             MC_LINKS_16BPC_AVX2(10);
+            SAD_INIT();
         }
         break;
     case 12:
@@ -349,6 +354,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(12, avx2);
             MC_LINKS_AVX2(12);
             MC_LINKS_16BPC_AVX2(12);
+            SAD_INIT();
         }
         break;
     default:
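The two helper macros in vvc_sad.asm can be checked by hand with a small C model. The sketch below is an illustration under stated assumptions, not part of the patch: init_offset_model, pshufd4 and horiz_add_model are hypothetical names, and they mirror the macros instruction by instruction. Working through INIT_OFFSET shows that it reduces to off1 = dy * 128 + dx and off2 = (4 - dy) * 128 + (4 - dx) elements (before the lea scales them by 2 bytes), so src1 and src2 move by equal and opposite amounts around the centre position (2, 2). For HORIZ_ADD, x86inc's q notation lists the pshufd lane selectors from the highest destination lane down, so q0032 is the immediate 0x0E and q0001 is 0x01.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of INIT_OFFSET: returns the element offsets that the two
 * lea instructions scale by 2 (sizeof(int16_t)) and add to src1/src2. */
static void init_offset_model(int dx, int dy, int *off1, int *off2)
{
    dx -= 2;                           /* sub %3, 2 */
    dy -= 2;                           /* sub %4, 2 */
    *off1 = (2 + dy) * 128 + 2 + dx;   /* mov/add/imul/add/add on %5 */
    *off2 = (2 - dy) * 128 + 2 - dx;   /* mov/sub/imul/add/sub on %6 */
}

/* Emulates pshufd on four dword lanes: dst lane i = src lane ((imm >> 2i) & 3). */
static void pshufd4(uint32_t dst[4], const uint32_t src[4], unsigned imm)
{
    uint32_t tmp[4];

    assert(imm <= 0xFF);
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}

/* Hypothetical model of HORIZ_ADD: folds the eight dword lanes of the ymm
 * accumulator into lane 0, which movd eax, xm0 then returns. */
static uint32_t horiz_add_model(const uint32_t acc[8])
{
    uint32_t x0[4], x1[4];

    for (int i = 0; i < 4; i++)        /* vextracti128 (upper half) + vpaddd (lower half) */
        x0[i] = acc[i + 4] + acc[i];
    pshufd4(x1, x0, 0x0E);             /* q0032 */
    for (int i = 0; i < 4; i++)
        x0[i] += x1[i];
    pshufd4(x1, x0, 0x01);             /* q0001 */
    for (int i = 0; i < 4; i++)
        x0[i] += x1[i];
    return x0[0];
}
```

With (dx, dy) = (2, 2) both offsets equal 2 * 128 + 2, and an accumulator holding the lanes 1 through 8 folds to 36. Only lane 0 holds the full sum after the last vpaddd; the other lanes contain partial sums that the final movd ignores.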