From patchwork Mon May 20 00:37:34 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stone Chen
X-Patchwork-Id: 49045
From: Stone Chen
To: ffmpeg-devel@ffmpeg.org
Cc: Stone Chen
Date: Sun, 19 May 2024 20:37:34 -0400
Message-ID: <20240520003737.10603-3-chen.stonechen@gmail.com>
X-Mailer: git-send-email 2.45.0
Subject: [FFmpeg-devel] [PATCH v4 1/2][GSoC 2024] libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

Implements AVX2 DMVR (decoder-side motion vector refinement) SAD
functions. DMVR SAD is only calculated if w >= 8, h >= 8, and
w * h > 128. To reduce complexity, SAD is only calculated on even rows.
This is calculated for all video bitdepths, but the values passed to the
function are always 16bit (even if the original video bitdepth is 8).
The AVX2 implementation uses min/max/sub.
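
For readers comparing against the assembly, the computation described above can be sketched in scalar C. This is only a reading of the INIT_OFFSET macro and the two loops in vvc_sad.asm, not code taken from the FFmpeg C sources: the name dmvr_sad_ref is hypothetical, and the sketch assumes that dx and dy arrive biased by 2 and that both buffers use a row stride of MAX_PB_SIZE (128) int16_t elements, as the assembly does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_PB_SIZE 128 /* row stride in int16_t elements, as in the asm */

/* Hypothetical scalar reference derived from vvc_sad.asm in this patch. */
static int dmvr_sad_ref(const int16_t *src0, const int16_t *src1,
                        int dx, int dy, int block_w, int block_h)
{
    int sad = 0;

    /* Preconditions stated in the commit message. */
    assert(block_w >= 8 && block_h >= 8 && block_w * block_h > 128);

    /* Same arithmetic as INIT_OFFSET: remove the bias of 2, then move the
     * two blocks in opposite directions around the (2, 2) origin. */
    dx -= 2;
    dy -= 2;
    src0 += (2 + dy) * MAX_PB_SIZE + 2 + dx;
    src1 += (2 - dy) * MAX_PB_SIZE + 2 - dx;

    /* Only even rows contribute, so step two rows at a time. */
    for (int y = 0; y < block_h; y += 2) {
        for (int x = 0; x < block_w; x++)
            sad += abs(src0[x] - src1[x]);
        src0 += 2 * MAX_PB_SIZE;
        src1 += 2 * MAX_PB_SIZE;
    }
    return sad;
}
```

A scalar version like this may be useful as an oracle when checking the AVX2 output for a given block size; each buffer needs at least block_h + 4 rows so that every offset stays in bounds.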

Benchmarks ( AMD 7940HS )

Before:
BQTerrace_1920x1080_60_10_420_22_RA.vvc   | 106.0 |
Chimera_8bit_1080P_1000_frames.vvc        | 204.3 |
NovosobornayaSquare_1920x1080.bin         | 197.3 |
RitualDance_1920x1080_60_10_420_37_RA.266 | 174.0 |

After:
BQTerrace_1920x1080_60_10_420_22_RA.vvc   | 109.3 |
Chimera_8bit_1080P_1000_frames.vvc        | 216.0 |
NovosobornayaSquare_1920x1080.bin         | 204.0 |
RitualDance_1920x1080_60_10_420_37_RA.266 | 181.7 |
---
 libavcodec/x86/vvc/Makefile      |   3 +-
 libavcodec/x86/vvc/vvc_sad.asm   | 138 +++++++++++++++++++++++++++++++
 libavcodec/x86/vvc/vvcdsp_init.c |   6 ++
 3 files changed, 146 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/x86/vvc/vvc_sad.asm

diff --git a/libavcodec/x86/vvc/Makefile b/libavcodec/x86/vvc/Makefile
index d6a66f860a..7b2438ce17 100644
--- a/libavcodec/x86/vvc/Makefile
+++ b/libavcodec/x86/vvc/Makefile
@@ -5,4 +5,5 @@ OBJS-$(CONFIG_VVC_DECODER) += x86/vvc/vvcdsp_init.o \
                                     x86/h26x/h2656dsp.o
 X86ASM-OBJS-$(CONFIG_VVC_DECODER) += x86/vvc/vvc_alf.o      \
                                      x86/vvc/vvc_mc.o       \
-                                     x86/h26x/h2656_inter.o
+                                     x86/vvc/vvc_sad.o      \
+                                     x86/h26x/h2656_inter.o
diff --git a/libavcodec/x86/vvc/vvc_sad.asm b/libavcodec/x86/vvc/vvc_sad.asm
new file mode 100644
index 0000000000..58a24635d2
--- /dev/null
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -0,0 +1,138 @@
+; /*
+; * Provide SIMD DMVR SAD functions for VVC decoding
+; *
+; * Copyright (c) 2024 Stone Chen
+; *
+; * This file is part of FFmpeg.
+; *
+; * FFmpeg is free software; you can redistribute it and/or
+; * modify it under the terms of the GNU Lesser General Public
+; * License as published by the Free Software Foundation; either
+; * version 2.1 of the License, or (at your option) any later version.
+; *
+; * FFmpeg is distributed in the hope that it will be useful,
+; * but WITHOUT ANY WARRANTY; without even the implied warranty of
+; * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+; * Lesser General Public License for more details.
+; *
+; * You should have received a copy of the GNU Lesser General Public
+; * License along with FFmpeg; if not, write to the Free Software
+; * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+; */
+
+%include "libavutil/x86/x86util.asm"
+%define MAX_PB_SIZE 128
+%define ROWS 2
+
+SECTION_RODATA
+
+pw_1: dw 1
+
+; DMVR SAD is only calculated on even rows to reduce complexity
+SECTION .text
+
+%macro MIN_MAX_SAD 3 ;
+    pminuw            %3, %2, %1
+    pmaxuw            %1, %2, %1
+    psubusw           %1, %1, %3
+%endmacro
+
+%macro HORIZ_ADD 3 ; xm0, xm1, m1
+    vextracti128      %1, %3, q0001  ;        3        2      1          0
+    paddd             %1, %2         ; xm0 (7 + 3) (6 + 2) (5 + 1)   (4 + 0)
+    pshufd            %2, %1, q0032  ; xm1    -      -     (7 + 3)   (6 + 2)
+    paddd             %1, %1, %2     ; xm0    _      _     (5 1 7 3) (4 0 6 2)
+    pshufd            %2, %1, q0001  ; xm1    _      _     (5 1 7 3) (5 1 7 3)
+    paddd             %1, %1, %2     ;                               (01234567)
+%endmacro
+
+%macro INIT_OFFSET 6 ; src1, src2, dxq, dyq, off1, off2
+    sub               %3, 2
+    sub               %4, 2
+
+    mov               %5, 2
+    mov               %6, 2
+
+    add               %5, %4
+    sub               %6, %4
+
+    imul              %5, 128
+    imul              %6, 128
+
+    add               %5, 2
+    add               %6, 2
+
+    add               %5, %3
+    sub               %6, %3
+
+    lea               %1, [%1 + %5 * 2]
+    lea               %2, [%2 + %6 * 2]
+%endmacro
+
+%if ARCH_X86_64
+%if HAVE_AVX2_EXTERNAL
+
+INIT_YMM avx2
+
+cglobal vvc_sad, 6, 11, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_idx, dx2, dy2
+    movsxd            dx2q, dxd
+    movsxd            dy2q, dyd
+    INIT_OFFSET       src1q, src2q, dx2q, dy2q, off1q, off2q
+    pxor              m3, m3
+    vpbroadcastw      m4, [pw_1]
+
+    cmp               block_wd, 16
+    jge               vvc_sad_16_128
+
+    vvc_sad_8:
+        .loop_height:
+        movu          xm0, [src1q]
+        vinserti128   m0, [src1q + MAX_PB_SIZE * ROWS * 2], 1
+        movu          xm1, [src2q]
+        vinserti128   m1, [src2q + MAX_PB_SIZE * ROWS * 2], 1
+
+        MIN_MAX_SAD   m1, m0, m2
+        pmaddwd       m1, m4
+        paddd         m3, m1
+
+        add           src1q, 2 * MAX_PB_SIZE * ROWS * 2
+        add           src2q, 2 * MAX_PB_SIZE * ROWS * 2
+
+        sub           block_hd, 4
+        jg            .loop_height
+
+        HORIZ_ADD     xm0, xm3, m3
+        movd          eax, xm0
+    RET
+
+    vvc_sad_16_128:
+        .loop_height:
+        mov           off1q, src1q
+        mov           off2q, src2q
+        mov           row_idxd, block_wd
+        sar           row_idxd, 4
+
+        .loop_width:
+            movu      m0, [src1q]
+            movu      m1, [src2q]
+            MIN_MAX_SAD m1, m0, m2
+            pmaddwd   m1, m4
+            paddd     m3, m1
+
+            add       src1q, 32
+            add       src2q, 32
+            dec       row_idxd
+            jg        .loop_width
+
+        lea           src1q, [off1q + ROWS * MAX_PB_SIZE * 2]
+        lea           src2q, [off2q + ROWS * MAX_PB_SIZE * 2]
+
+        sub           block_hd, 2
+        jg            .loop_height
+
+        HORIZ_ADD     xm0, xm3, m3
+        movd          eax, xm0
+    RET
+
+%endif
+%endif
diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c
index 0e68971b2c..4b4a2aa937 100644
--- a/libavcodec/x86/vvc/vvcdsp_init.c
+++ b/libavcodec/x86/vvc/vvcdsp_init.c
@@ -311,6 +311,9 @@ ALF_FUNCS(16, 12, avx2)
     c->alf.filter[CHROMA] = ff_vvc_alf_filter_chroma_##bd##_avx2;    \
     c->alf.classify       = ff_vvc_alf_classify_##bd##_avx2;         \
 } while (0)
+
+int ff_vvc_sad_avx2(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
+#define SAD_INIT() c->inter.sad = ff_vvc_sad_avx2
 #endif
 
 void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
@@ -327,6 +330,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             ALF_INIT(8);
             AVG_INIT(8, avx2);
             MC_LINKS_AVX2(8);
+            SAD_INIT();
         }
         break;
     case 10:
@@ -338,6 +342,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(10, avx2);
             MC_LINKS_AVX2(10);
             MC_LINKS_16BPC_AVX2(10);
+            SAD_INIT();
         }
         break;
     case 12:
@@ -349,6 +354,7 @@ void ff_vvc_dsp_init_x86(VVCDSPContext *const c, const int bd)
             AVG_INIT(12, avx2);
             MC_LINKS_AVX2(12);
             MC_LINKS_16BPC_AVX2(12);
+            SAD_INIT();
         }
         break;
     default: