From patchwork Mon Oct 28 17:08:24 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 52534
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 29 Oct 2024 01:08:24 +0800
X-OQ-MSGID: <20241028170824.223147-1-uk7b@foxmail.com>
X-Mailer: git-send-email 2.47.0
Subject: [FFmpeg-devel] [PATCH 3/5] lavc/vvc_mc: R-V V put_uni_pixels
Cc: sunyuechi

From: sunyuechi

                                              k230               banana_f3
put_uni_pixels_chroma_8_4x4_c:               128.3 ( 1.00x)       90.5 ( 1.00x)
put_uni_pixels_chroma_8_4x4_rvv_i32:          17.6 ( 7.30x)       17.4 ( 5.18x)
put_uni_pixels_chroma_8_8x8_c:               295.1 ( 1.00x)      163.2 ( 1.00x)
put_uni_pixels_chroma_8_8x8_rvv_i32:          35.8 ( 8.24x)       27.9 ( 5.84x)
put_uni_pixels_chroma_8_16x16_c:             619.3 ( 1.00x)      267.4 ( 1.00x)
put_uni_pixels_chroma_8_16x16_rvv_i32:        72.8 ( 8.50x)       48.7 ( 5.49x)
put_uni_pixels_chroma_8_32x32_c:            1433.8 ( 1.00x)      538.2 ( 1.00x)
put_uni_pixels_chroma_8_32x32_rvv_i32:       230.3 ( 6.23x)      236.2 ( 2.28x)
put_uni_pixels_chroma_8_64x64_c:            3517.3 ( 1.00x)     1455.0 ( 1.00x)
put_uni_pixels_chroma_8_64x64_rvv_i32:       813.6 ( 4.32x)      590.2 ( 2.47x)
put_uni_pixels_chroma_8_128x128_c:         10174.6 ( 1.00x)     5798.7 ( 1.00x)
put_uni_pixels_chroma_8_128x128_rvv_i32:    2989.3 ( 3.40x)     2371.4 ( 2.45x)
put_uni_pixels_luma_8_4x4_c:                 128.6 ( 1.00x)       90.5 ( 1.00x)
put_uni_pixels_luma_8_4x4_rvv_i32:            17.3 ( 7.42x)       17.4 ( 5.18x)
put_uni_pixels_luma_8_8x8_c:                 295.1 ( 1.00x)      142.4 ( 1.00x)
put_uni_pixels_luma_8_8x8_rvv_i32:            26.6 (11.10x)       27.9 ( 5.10x)
put_uni_pixels_luma_8_16x16_c:               600.6 ( 1.00x)      277.7 ( 1.00x)
put_uni_pixels_luma_8_16x16_rvv_i32:          82.1 ( 7.32x)       48.7 ( 5.70x)
put_uni_pixels_luma_8_32x32_c:              1406.1 ( 1.00x)      528.0 ( 1.00x)
put_uni_pixels_luma_8_32x32_rvv_i32:         230.3 ( 6.10x)      131.9 ( 4.00x)
put_uni_pixels_luma_8_64x64_c:              4600.6 ( 1.00x)     1309.2 ( 1.00x)
put_uni_pixels_luma_8_64x64_rvv_i32:        1073.1 ( 4.29x)      382.2 ( 3.43x)
put_uni_pixels_luma_8_128x128_c:           11350.3 ( 1.00x)     3506.9 ( 1.00x)
put_uni_pixels_luma_8_128x128_rvv_i32:      3119.1 ( 3.64x)     2017.5 ( 1.74x)
---
 libavcodec/riscv/h26x/h2656_inter_rvv.S | 53 +++++++++++++++++++++++++
 libavcodec/riscv/h26x/h2656dsp.h        | 33 +++++++++++++++
 libavcodec/riscv/vvc/Makefile           |  3 +-
 libavcodec/riscv/vvc/vvcdsp_init.c      |  5 +++
 4 files changed, 93 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/riscv/h26x/h2656_inter_rvv.S
 create mode 100644 libavcodec/riscv/h26x/h2656dsp.h

diff --git a/libavcodec/riscv/h26x/h2656_inter_rvv.S b/libavcodec/riscv/h26x/h2656_inter_rvv.S
new file mode 100644
index 0000000000..6692e33acf
--- /dev/null
+++ b/libavcodec/riscv/h26x/h2656_inter_rvv.S
@@ -0,0 +1,53 @@
+/*
+ * Copyright (c) 2024 Institue of Software Chinese Academy of Sciences (ISCAS).
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavcodec/riscv/h26x/asm.S"
+
+.macro put_uni_pixels w, vlen, id
+\id\w\vlen:
+.if \w == 128 && \vlen == 128
+        li                t0, \w
+        vsetvli           zero, t0, e8, m8, ta, ma
+.else
+        vsetvlstatic8     \w, \vlen
+.endif
+1:
+        vle8.v            v0, (a2)
+        addi              a4, a4, -1
+        vse8.v            v0, (a0)
+        add               a2, a2, a3
+        add               a0, a0, a1
+        bnez              a4, 1b
+        ret
+.endm
+
+.macro func_put_uni_pixels vlen
+func ff_h2656_put_uni_pixels_8_rvv_\vlen\(), zve32x, zbb, zba
+        lpad              0
+        POW2_JMP_TABLE    4, \vlen
+        POW2_J            \vlen, 4, a7
+        .irp w,2,4,8,16,32,64,128
+        put_uni_pixels    \w, \vlen, 4
+        .endr
+endfunc
+.endm
+
+func_put_uni_pixels 256
+func_put_uni_pixels 128
diff --git a/libavcodec/riscv/h26x/h2656dsp.h b/libavcodec/riscv/h26x/h2656dsp.h
new file mode 100644
index 0000000000..41ba6bc331
--- /dev/null
+++ b/libavcodec/riscv/h26x/h2656dsp.h
@@ -0,0 +1,33 @@
+/*
+ * Copyright (c) 2024 Institue of Software Chinese Academy of Sciences (ISCAS).
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_RISCV_H26X_H2656DSP_H
+#define AVCODEC_RISCV_H26X_H2656DSP_H
+
+#define H2656_PEL_PROTOTYPE(name, D, opt) \
+void ff_h2656_put_uni_ ## name ## _ ## D ## _##opt(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, ptrdiff_t _srcstride, int height, const int8_t *hf, const int8_t *vf, int width) \
+
+#define H2656_MC_8TAP_PROTOTYPES(fname, bitd, opt) \
+    H2656_PEL_PROTOTYPE(fname, bitd, opt); \
+
+H2656_MC_8TAP_PROTOTYPES(pixels , 8, rvv_256);
+H2656_MC_8TAP_PROTOTYPES(pixels , 8, rvv_128);
+
+#endif
diff --git a/libavcodec/riscv/vvc/Makefile b/libavcodec/riscv/vvc/Makefile
index 582b051579..ec116aebc1 100644
--- a/libavcodec/riscv/vvc/Makefile
+++ b/libavcodec/riscv/vvc/Makefile
@@ -1,2 +1,3 @@
 OBJS-$(CONFIG_VVC_DECODER) += riscv/vvc/vvcdsp_init.o
-RVV-OBJS-$(CONFIG_VVC_DECODER) += riscv/vvc/vvc_mc_rvv.o
+RVV-OBJS-$(CONFIG_VVC_DECODER) += riscv/vvc/vvc_mc_rvv.o \
+                                  riscv/h26x/h2656_inter_rvv.o
diff --git a/libavcodec/riscv/vvc/vvcdsp_init.c b/libavcodec/riscv/vvc/vvcdsp_init.c
index bee892cb7c..9dea70f392 100644
--- a/libavcodec/riscv/vvc/vvcdsp_init.c
+++ b/libavcodec/riscv/vvc/vvcdsp_init.c
@@ -25,6 +25,7 @@
 #include "libavutil/riscv/cpu.h"
 #include "libavcodec/vvc/dsp.h"
 #include "libavcodec/vvc/dec.h"
+#include "libavcodec/riscv/h26x/h2656dsp.h"
 
 #define bf(fn, bd, opt) fn##_##bd##_##opt
 
@@ -72,8 +73,12 @@ PUT_PIXELS_PROTOTYPES2(8, rvv_256)
             c->inter.dst[C][w][idx1][idx2] = a; \
     } while (0) \
 
+#define DIR_FUNCS(d, C, opt) \
+    PEL_FUNC(put_##d, C, 0, 0, ff_h2656_put_##d##_pixels_8_##opt); \
+
 #define FUNCS(C, opt) \
     PEL_FUNC(put, C, 0, 0, ff_vvc_put_pixels_8_##opt); \
+    DIR_FUNCS(uni, C, opt); \
 
 void ff_vvc_dsp_init_riscv(VVCDSPContext *const c, const int bd)
 {