From patchwork Wed May 22 00:00:32 2024
X-Patchwork-Submitter: Stone Chen
X-Patchwork-Id: 49114
From: Stone Chen
To: ffmpeg-devel@ffmpeg.org
Cc: Stone Chen
Date: Tue, 21 May 2024 20:00:32 -0400
Message-ID: <20240522000039.34913-2-chen.stonechen@gmail.com>
Subject: [FFmpeg-devel] [PATCH v5 1/2][GSoC 2024] libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC

Implements AVX2 DMVR (decoder-side motion vector refinement) SAD functions. DMVR SAD is only calculated if w >= 8, h >= 8, and w * h > 128. To reduce complexity, SAD is only calculated on even rows. This is calculated for all video bitdepths, but the values passed to the function are always 16bit (even if the original video bitdepth is 8). The AVX2 implementation uses min/max/sub.

Additionally this changes parameters dx and dy from int to intptr_t. This allows dx & dy to be used as pointer offsets without needing to use movsxd.
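For readers less familiar with the even-row SAD and the min/max/sub trick, here is a rough scalar sketch in C. It is an illustration only, inferred from the offset arithmetic in the assembly of this patch, not the decoder's actual C code. The function names sad_even_rows_abs and sad_even_rows_minmax are hypothetical. The min/max variant assumes the samples are non-negative, so that unsigned min/max (as with pminuw/pmaxuw) gives the same result as an absolute difference.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_PB_SIZE 128 /* row stride in samples, same value as in the assembly */

/* Hypothetical reference: SAD over even rows only, using abs(). */
static int sad_even_rows_abs(const int16_t *src0, const int16_t *src1,
                             intptr_t dx, intptr_t dy,
                             int block_w, int block_h)
{
    int sad = 0;

    dx -= 2;
    dy -= 2;
    /* src0 is displaced by (+dx, +dy), src1 by (-dx, -dy),
     * both relative to a base offset of 2 rows and 2 samples. */
    src0 += (2 + dy) * MAX_PB_SIZE + 2 + dx;
    src1 += (2 - dy) * MAX_PB_SIZE + 2 - dx;

    for (int y = 0; y < block_h; y += 2) {   /* skip odd rows */
        for (int x = 0; x < block_w; x++)
            sad += abs(src0[x] - src1[x]);
        src0 += 2 * MAX_PB_SIZE;
        src1 += 2 * MAX_PB_SIZE;
    }
    return sad;
}

/* Hypothetical variant: |a - b| computed as max(a, b) - min(a, b) on
 * unsigned 16-bit values. Assumes all samples are non-negative. */
static int sad_even_rows_minmax(const int16_t *src0, const int16_t *src1,
                                intptr_t dx, intptr_t dy,
                                int block_w, int block_h)
{
    int sad = 0;

    dx -= 2;
    dy -= 2;
    src0 += (2 + dy) * MAX_PB_SIZE + 2 + dx;
    src1 += (2 - dy) * MAX_PB_SIZE + 2 - dx;

    for (int y = 0; y < block_h; y += 2) {
        for (int x = 0; x < block_w; x++) {
            uint16_t a  = (uint16_t)src0[x];
            uint16_t b  = (uint16_t)src1[x];
            uint16_t lo = a < b ? a : b;     /* unsigned min */
            uint16_t hi = a < b ? b : a;     /* unsigned max */
            sad += hi - lo;                  /* hi >= lo, so no saturation */
        }
        src0 += 2 * MAX_PB_SIZE;
        src1 += 2 * MAX_PB_SIZE;
    }
    return sad;
}
```

With dx = dy = 2 both pointers start at row 2, sample 2 of their buffers. An 8x8 block then touches four rows of eight samples each.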

Benchmarks ( AMD 7940HS )

Before:
BQTerrace_1920x1080_60_10_420_22_RA.vvc   | 106.0 |
Chimera_8bit_1080P_1000_frames.vvc        | 204.3 |
NovosobornayaSquare_1920x1080.bin         | 197.3 |
RitualDance_1920x1080_60_10_420_37_RA.266 | 174.0 |

After:
BQTerrace_1920x1080_60_10_420_22_RA.vvc   | 109.3 |
Chimera_8bit_1080P_1000_frames.vvc        | 216.0 |
NovosobornayaSquare_1920x1080.bin         | 204.0 |
RitualDance_1920x1080_60_10_420_37_RA.266 | 181.7 |
---
 libavcodec/vvc/dsp.c             |   2 +-
 libavcodec/vvc/dsp.h             |   2 +-
 libavcodec/x86/vvc/Makefile      |   3 +-
 libavcodec/x86/vvc/vvc_sad.asm   | 130 +++++++++++++++++++++++++++++++
 libavcodec/x86/vvc/vvcdsp_init.c |   6 ++
 5 files changed, 140 insertions(+), 3 deletions(-)
 create mode 100644 libavcodec/x86/vvc/vvc_sad.asm

diff --git a/libavcodec/vvc/dsp.c b/libavcodec/vvc/dsp.c
index 41e830a98a..aded1a2f9f 100644
--- a/libavcodec/vvc/dsp.c
+++ b/libavcodec/vvc/dsp.c
@@ -46,7 +46,7 @@ static void av_always_inline pad_int16(int16_t *_dst, const ptrdiff_t dst_stride
     memcpy(_dst, _dst - dst_stride, padded_width * sizeof(int16_t));
 }
 
-static int vvc_sad(const int16_t *src0, const int16_t *src1, int dx, int dy,
+static int vvc_sad(const int16_t *src0, const int16_t *src1, intptr_t dx, intptr_t dy,
     const int block_w, const int block_h)
 {
     int sad = 0;
diff --git a/libavcodec/vvc/dsp.h b/libavcodec/vvc/dsp.h
index 9810ac314c..213337358b 100644
--- a/libavcodec/vvc/dsp.h
+++ b/libavcodec/vvc/dsp.h
@@ -86,7 +86,7 @@ typedef struct VVCInterDSPContext {
 
     void (*apply_bdof)(uint8_t *dst, ptrdiff_t dst_stride, int16_t *src0, int16_t *src1, int block_w, int block_h);
 
-    int (*sad)(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
+    int (*sad)(const int16_t *src0, const int16_t *src1, intptr_t dx, intptr_t dy, int block_w, int block_h);
     void (*dmvr[2][2])(int16_t *dst, const uint8_t *src, ptrdiff_t src_stride, int height,
         intptr_t mx, intptr_t my, int width);
 } VVCInterDSPContext;
diff --git a/libavcodec/x86/vvc/Makefile b/libavcodec/x86/vvc/Makefile
index d6a66f860a..7b2438ce17 100644
--- a/libavcodec/x86/vvc/Makefile
+++ b/libavcodec/x86/vvc/Makefile
@@ -5,4 +5,5 @@ OBJS-$(CONFIG_VVC_DECODER)             += x86/vvc/vvcdsp_init.o \
                                           x86/h26x/h2656dsp.o
 X86ASM-OBJS-$(CONFIG_VVC_DECODER)      += x86/vvc/vvc_alf.o      \
                                           x86/vvc/vvc_mc.o       \
-                                          x86/h26x/h2656_inter.o
+                                          x86/vvc/vvc_sad.o      \
+                                          x86/h26x/h2656_inter.o
diff --git a/libavcodec/x86/vvc/vvc_sad.asm b/libavcodec/x86/vvc/vvc_sad.asm
new file mode 100644
index 0000000000..9766446b11
--- /dev/null
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -0,0 +1,130 @@
+; /*
+; * Provide SIMD DMVR SAD functions for VVC decoding
+; *
+; * Copyright (c) 2024 Stone Chen
+; *
+; * This file is part of FFmpeg.
+; *
+; * FFmpeg is free software; you can redistribute it and/or
+; * modify it under the terms of the GNU Lesser General Public
+; * License as published by the Free Software Foundation; either
+; * version 2.1 of the License, or (at your option) any later version.
+; *
+; * FFmpeg is distributed in the hope that it will be useful,
+; * but WITHOUT ANY WARRANTY; without even the implied warranty of
+; * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+; * Lesser General Public License for more details.
+; *
+; * You should have received a copy of the GNU Lesser General Public
+; * License along with FFmpeg; if not, write to the Free Software
+; * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+; */
+
+%include "libavutil/x86/x86util.asm"
+%define MAX_PB_SIZE 128
+%define ROWS 2
+
+SECTION_RODATA
+
+pw_1: times 2 dw 1
+
+; DMVR SAD is only calculated on even rows to reduce complexity
+SECTION .text
+
+%macro MIN_MAX_SAD 3 ;
+    pminuw           %3, %2, %1
+    pmaxuw           %1, %2, %1
+    psubusw          %1, %1, %3
+%endmacro
+
+%macro HORIZ_ADD 3 ; xm0, xm1, m1
+    vextracti128     %1, %3, q0001  ;        3        2      1          0
+    paddd            %1, %2         ; xm0 (7 + 3) (6 + 2) (5 + 1)   (4 + 0)
+    pshufd           %2, %1, q0032  ; xm1    -      -     (7 + 3)   (6 + 2)
+    paddd            %1, %1, %2     ; xm0    _      _     (5 1 7 3) (4 0 6 2)
+    pshufd           %2, %1, q0001  ; xm1    _      _     (5 1 7 3) (5 1 7 3)
+    paddd            %1, %1, %2     ;                               (01234567)
+%endmacro
+
+%if ARCH_X86_64
+%if HAVE_AVX2_EXTERNAL
+
+INIT_YMM avx2
+
+cglobal vvc_sad, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_idx
+    sub             dxq, 2
+    sub             dyq, 2
+
+    mov             off1q, 2
+    mov             off2q, 2
+
+    add             off1q, dyq
+    sub             off2q, dyq
+
+    shl             off1q, 7
+    shl             off2q, 7
+
+    add             off1q, dxq
+    sub             off2q, dxq
+
+    lea             src1q, [src1q + off1q * 2 + 2 * 2]
+    lea             src2q, [src2q + off2q * 2 + 2 * 2]
+
+    pxor               m3, m3
+    vpbroadcastd       m4, [pw_1]
+
+    cmp          block_wd, 16
+    jge    vvc_sad_16_128
+
+    vvc_sad_8:
+        .loop_height:
+        movu              xm0, [src1q]
+        vinserti128        m0, [src1q + MAX_PB_SIZE * ROWS * 2], 1
+        movu              xm1, [src2q]
+        vinserti128        m1, [src2q + MAX_PB_SIZE * ROWS * 2], 1
+
+        MIN_MAX_SAD        m1, m0, m2
+        pmaddwd            m1, m4
+        paddd              m3, m1
+
+        add             src1q, 2 * MAX_PB_SIZE * ROWS * 2
+        add             src2q, 2 * MAX_PB_SIZE * ROWS * 2
+
+        sub          block_hd, 4
+        jg       .loop_height
+
+        HORIZ_ADD         xm0, xm3, m3
+        movd              eax, xm0
+    RET
+
+    vvc_sad_16_128:
+        sar          block_wd, 4
+        .loop_height:
+        mov             off1q, src1q
+        mov             off2q, src2q
+        mov          row_idxd, block_wd
+
+        .loop_width:
+            movu               m0, [src1q]
+            movu               m1, [src2q]
+            MIN_MAX_SAD        m1, m0, m2
+            pmaddwd            m1, m4
+            paddd              m3, m1
+
+            add             src1q, 32
+            add             src2q, 32
+            dec          row_idxd
+            jg        .loop_width
+
+        lea             src1q, [off1q + ROWS * MAX_PB_SIZE * 2]
+        lea             src2q, [off2q + ROWS * MAX_PB_SIZE * 2]
+
+        sub          block_hd, 2
+        jg       .loop_height
+
+        HORIZ_ADD         xm0, xm3, m3
+        movd              eax, xm0
+    RET
+
+%endif
+%endif
diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c
index 0e68971b2c..aa6c916760 100644
--- a/libavcodec/x86/vvc/vvcdsp_init.c
+++ b/libavcodec/x86/vvc/vvcdsp_init.c
@@ -311,6 +311,9 @@ ALF_FUNCS(16, 12, avx2)
     c->alf.filter[CHROMA] = ff_vvc_alf_filter_chroma_##bd##_avx2;    \
     c->alf.classify       = ff_vvc_alf_classify_##bd##_avx2;         \
 } while (0)
+
+int ff_vvc_sad_avx2(const int16_t *src0, const int16_t *src1, intptr_t dx, intptr_t dy, int block_w, int block_h);
+#define SAD_INIT() c->inter.sad = ff_vvc_sad_avx2
 #endif
 
 void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
@@ -327,6 +330,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             ALF_INIT(8);
             AVG_INIT(8, avx2);
             MC_LINKS_AVX2(8);
+            SAD_INIT();
         }
         break;
     case 10:
@@ -338,6 +342,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(10, avx2);
             MC_LINKS_AVX2(10);
             MC_LINKS_16BPC_AVX2(10);
+            SAD_INIT();
         }
         break;
     case 12:
@@ -349,6 +354,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(12, avx2);
             MC_LINKS_AVX2(12);
             MC_LINKS_16BPC_AVX2(12);
+            SAD_INIT();
         }
         break;
     default: