From patchwork Mon Oct 28 17:08:44 2024
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 52533
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Tue, 29 Oct 2024 01:08:44 +0800
X-OQ-MSGID: <20241028170844.224624-1-uk7b@foxmail.com>
X-Mailer: git-send-email 2.47.0
Subject: [FFmpeg-devel] [PATCH 5/5] lavc/vvc_mc: R-V V sad

From: sunyuechi

                        k230               banana_f3
sad_8x16_c:           385.9 ( 1.00x)     403.1 ( 1.00x)
sad_8x16_rvv_i32:     108.1 ( 3.57x)     100.8 ( 4.00x)
sad_16x8_c:           376.6 ( 1.00x)     392.6 ( 1.00x)
sad_16x8_rvv_i32:      89.3 ( 4.21x)      69.5 ( 5.64x)
sad_16x16_c:          746.6 ( 1.00x)     757.3 ( 1.00x)
sad_16x16_rvv_i32:    135.8 ( 5.50x)     121.5 ( 6.23x)
---
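For readers who do not read RVV assembly, here is a scalar sketch of what the new routine appears to compute, reconstructed only from the instructions in this patch. It is not FFmpeg's C reference. The name sad_ref and the constant PB_STRIDE are hypothetical; the stride of 128 int16_t elements is an assumption taken from the "* 128" factors in the assembly, and dx and dy are used exactly as the assembly uses them.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumption: row stride in int16_t elements, inferred from the "* 128"
 * factors in vvc_sad_rvv.S. */
#define PB_STRIDE 128

/* Hypothetical scalar model of ff_vvc_sad_rvv_{128,256}. */
static int sad_ref(const int16_t *src0, const int16_t *src1,
                   int dx, int dy, int block_w, int block_h)
{
    int sad = 0;

    /* Start offsets in elements, mirroring t0 and t1 in the assembly. */
    src0 += dy * PB_STRIDE + dx;
    src1 += 4 * PB_STRIDE + 4 - dy * PB_STRIDE - dx;

    /* Only every other row is sampled: the height counter drops by 2
     * per pass and both pointers advance by two rows. */
    for (int y = 0; y < block_h; y += 2) {
        for (int x = 0; x < block_w; x++)
            sad += abs(src0[x] - src1[x]);
        src0 += 2 * PB_STRIDE;
        src1 += 2 * PB_STRIDE;
    }
    return sad;
}
```

The vector code replaces the inner loop with a widening subtract, a max(diff, -diff) absolute value and a reduction into a single 32-bit accumulator, one row per iteration.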
 libavcodec/riscv/vvc/Makefile      |  1 +
 libavcodec/riscv/vvc/vvc_sad_rvv.S | 58 ++++++++++++++++++++++++++++++
 libavcodec/riscv/vvc/vvcdsp_init.c |  7 ++++
 3 files changed, 66 insertions(+)
 create mode 100644 libavcodec/riscv/vvc/vvc_sad_rvv.S

diff --git a/libavcodec/riscv/vvc/Makefile b/libavcodec/riscv/vvc/Makefile
index ec116aebc1..0778947b63 100644
--- a/libavcodec/riscv/vvc/Makefile
+++ b/libavcodec/riscv/vvc/Makefile
@@ -1,3 +1,4 @@
 OBJS-$(CONFIG_VVC_DECODER) += riscv/vvc/vvcdsp_init.o
 RVV-OBJS-$(CONFIG_VVC_DECODER) += riscv/vvc/vvc_mc_rvv.o \
+                                  riscv/vvc/vvc_sad_rvv.o \
                                   riscv/h26x/h2656_inter_rvv.o
diff --git a/libavcodec/riscv/vvc/vvc_sad_rvv.S b/libavcodec/riscv/vvc/vvc_sad_rvv.S
new file mode 100644
index 0000000000..acdc78d20d
--- /dev/null
+++ b/libavcodec/riscv/vvc/vvc_sad_rvv.S
@@ -0,0 +1,58 @@
+/*
+ * Copyright (c) 2024 Institute of Software Chinese Academy of Sciences (ISCAS).
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavcodec/riscv/h26x/asm.S"
+
+.macro func_sad vlen
+func ff_vvc_sad_rvv_\vlen, zve32x, zbb, zba
+        lpad              0
+        slli              t2, a3, 7                  // dy * 128
+        li                t1, 4*128+4                // src1 base offset
+        add               t0, t2, a2                 // dy * 128 + dx
+        sub               t1, t1, t2
+        sub               t1, t1, a2                 // 4 * 128 + 4 - dy * 128 - dx
+        sh1add            a0, t0, a0                 // src0 += t0 (int16_t elements)
+        sh1add            a1, t1, a1                 // src1 += t1 (int16_t elements)
+        vsetvlstatic32    1, \vlen
+        li                t0, 16
+        vmv.s.x           v0, zero                   // sum = 0
+        beq               a4, t0, SAD\vlen\()16      // block_w == 16: use the 16-wide loop
+        .irp w,8,16
+SAD\vlen\w:
+        vsetvlstatic16    \w, \vlen
+        addi              a5, a5, -2                 // block_h -= 2 (every other row)
+        vle16.v           v8, (a0)
+        vle16.v           v16, (a1)
+        vwsub.vv          v24, v8, v16               // widening src0 - src1
+        vsetvlstatic32    \w, \vlen
+        vneg.v            v16, v24
+        addi              a0, a0, 2 * 128 * 2        // skip two rows (bytes)
+        vmax.vv           v24, v24, v16              // |diff| = max(diff, -diff)
+        vredsum.vs        v0, v24, v0                // sum += row total
+        addi              a1, a1, 2 * 128 * 2
+        bnez              a5, SAD\vlen\w
+        vmv.x.s           a0, v0                     // return sum
+        ret
+        .endr
+endfunc
+.endm
+
+func_sad 256
+func_sad 128
diff --git a/libavcodec/riscv/vvc/vvcdsp_init.c b/libavcodec/riscv/vvc/vvcdsp_init.c
index 9dea70f392..fafe6e8158 100644
--- a/libavcodec/riscv/vvc/vvcdsp_init.c
+++ b/libavcodec/riscv/vvc/vvcdsp_init.c
@@ -59,6 +59,9 @@ DMVR_PROTOTYPES(8, rvv_256)
     c->inter.dmvr[1][1] = ff_vvc_dmvr_hv_##bd##_##opt;             \
 } while (0)
 
+int ff_vvc_sad_rvv_128(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
+int ff_vvc_sad_rvv_256(const int16_t *src0, const int16_t *src1, int dx, int dy, int block_w, int block_h);
+
 #define PUT_PIXELS_PROTOTYPES2(bd, opt)                                          \
 void bf(ff_vvc_put_pixels, bd, opt)(int16_t *dst,                                \
                             const uint8_t *_src, const ptrdiff_t _src_stride,    \
@@ -101,6 +104,8 @@ void ff_vvc_dsp_init_riscv(VVCDSPContext *const c, const int bd)
                 FUNCS(LUMA, rvv_256);
                 FUNCS(CHROMA, rvv_256);
                 break;
+            case 10:
+                c->inter.sad = ff_vvc_sad_rvv_256;
             default:
                 break;
             }
@@ -115,6 +120,8 @@ void ff_vvc_dsp_init_riscv(VVCDSPContext *const c, const int bd)
                 FUNCS(LUMA, rvv_128);
                 FUNCS(CHROMA, rvv_128);
                 break;
+            case 10:
+                c->inter.sad = ff_vvc_sad_rvv_128;
             default:
                 break;
             }