From patchwork Sat May 4 14:48:30 2024
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48487
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Sat, 4 May 2024 22:48:30 +0800
X-OQ-MSGID: <20240504144840.2411603-1-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
Subject: [FFmpeg-devel] [PATCH 01/10] lavc/vp8dsp: R-V V put_vp8_pixels

From: sunyuechi

C908:
vp8_put_pixels4_c: 87.5
vp8_put_pixels4_rvv_i32: 42.7
vp8_put_pixels8_c: 284.5
vp8_put_pixels8_rvv_i32: 77.7
vp8_put_pixels16_c: 1087.7
vp8_put_pixels16_rvv_i32: 108.0
---
 libavcodec/riscv/vp8dsp.h      | 75 ++++++++++++++++++++++++++++++++++
 libavcodec/riscv/vp8dsp_init.c | 22 ++++++++++
 libavcodec/riscv/vp8dsp_rvv.S  | 27 ++++++++++++
 libavcodec/vp8dsp.c            |  2 +
 libavcodec/vp8dsp.h            |  1 +
 5 files changed, 127 insertions(+)
 create mode 100644 libavcodec/riscv/vp8dsp.h

diff --git a/libavcodec/riscv/vp8dsp.h b/libavcodec/riscv/vp8dsp.h
new file mode 100644
index 0000000000..971c5c0a96
--- /dev/null
+++ b/libavcodec/riscv/vp8dsp.h
@@ -0,0 +1,75 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_RISCV_VP8DSP_H
+#define AVCODEC_RISCV_VP8DSP_H
+
+#include "libavcodec/vp8dsp.h"
+
+#define VP8_LF_Y(hv, inner, opt)                                             \
+    void ff_vp8_##hv##_loop_filter16##inner##_##opt(uint8_t *dst,            \
+                                                    ptrdiff_t stride,        \
+                                                    int flim_E, int flim_I,  \
+                                                    int hev_thresh)
+
+#define VP8_LF_UV(hv, inner, opt)                                            \
+    void ff_vp8_##hv##_loop_filter8uv##inner##_##opt(uint8_t *dstU,          \
+                                                     uint8_t *dstV,          \
+                                                     ptrdiff_t stride,       \
+                                                     int flim_E, int flim_I, \
+                                                     int hev_thresh)
+
+#define VP8_LF_SIMPLE(hv, opt)                                               \
+    void ff_vp8_##hv##_loop_filter16_simple_##opt(uint8_t *dst,              \
+                                                  ptrdiff_t stride,          \
+                                                  int flim)
+
+#define VP8_LF_HV(inner, opt)                                                \
+    VP8_LF_Y(h,  inner, opt);                                                \
+    VP8_LF_Y(v,  inner, opt);                                                \
+    VP8_LF_UV(h, inner, opt);                                                \
+    VP8_LF_UV(v, inner, opt)
+
+#define VP8_LF(opt)                                                          \
+    VP8_LF_HV(,       opt);                                                  \
+    VP8_LF_HV(_inner, opt);                                                  \
+    VP8_LF_SIMPLE(h, opt);                                                   \
+    VP8_LF_SIMPLE(v, opt)
+
+#define VP8_MC(n, opt)                                                       \
+    void ff_put_vp8_##n##_##opt(uint8_t *dst, ptrdiff_t dststride,           \
+                                const uint8_t *src, ptrdiff_t srcstride,     \
+                                int h, int x, int y)
+
+#define VP8_EPEL(w, opt)                                                     \
+    VP8_MC(pixels ## w, opt);                                                \
+    VP8_MC(epel ## w ## _h4, opt);                                           \
+    VP8_MC(epel ## w ## _h6, opt);                                           \
+    VP8_MC(epel ## w ## _v4, opt);                                           \
+    VP8_MC(epel ## w ## _h4v4, opt);                                         \
+    VP8_MC(epel ## w ## _h6v4, opt);                                         \
+    VP8_MC(epel ## w ## _v6, opt);                                           \
+    VP8_MC(epel ## w ## _h4v6, opt);                                         \
+    VP8_MC(epel ## w ## _h6v6, opt)
+
+#define VP8_BILIN(w, opt)                                                    \
+    VP8_MC(bilin ## w ## _h, opt);                                           \
+    VP8_MC(bilin ## w ## _v, opt);                                           \
+    VP8_MC(bilin ## w ## _hv, opt)
+
+#endif /* AVCODEC_RISCV_VP8DSP_H */
diff --git a/libavcodec/riscv/vp8dsp_init.c b/libavcodec/riscv/vp8dsp_init.c
index af57aabb71..c364de3dc9 100644
--- a/libavcodec/riscv/vp8dsp_init.c
+++ b/libavcodec/riscv/vp8dsp_init.c
@@ -24,11 +24,33 @@
 #include "libavutil/cpu.h"
 #include "libavutil/riscv/cpu.h"
 #include "libavcodec/vp8dsp.h"
+#include "vp8dsp.h"
 
 void ff_vp8_idct_dc_add_rvv(uint8_t *dst, int16_t block[16], ptrdiff_t stride);
 void ff_vp8_idct_dc_add4y_rvv(uint8_t *dst, int16_t block[4][16], ptrdiff_t stride);
 void ff_vp8_idct_dc_add4uv_rvv(uint8_t *dst, int16_t block[4][16], ptrdiff_t stride);
 
+VP8_EPEL(16, rvv);
+VP8_EPEL(8,  rvv);
+VP8_EPEL(4,  rvv);
+
+av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
+{
+#if HAVE_RVV
+    int flags = av_get_cpu_flags();
+
+    if (flags & AV_CPU_FLAG_RVV_I32 && ff_get_rv_vlenb() >= 16) {
+        c->put_vp8_epel_pixels_tab[0][0][0] = ff_put_vp8_pixels16_rvv;
+        c->put_vp8_epel_pixels_tab[1][0][0] = ff_put_vp8_pixels8_rvv;
+        c->put_vp8_epel_pixels_tab[2][0][0] = ff_put_vp8_pixels4_rvv;
+
+        c->put_vp8_bilinear_pixels_tab[0][0][0] = ff_put_vp8_pixels16_rvv;
+        c->put_vp8_bilinear_pixels_tab[1][0][0] = ff_put_vp8_pixels8_rvv;
+        c->put_vp8_bilinear_pixels_tab[2][0][0] = ff_put_vp8_pixels4_rvv;
+    }
+#endif
+}
+
 av_cold void ff_vp8dsp_init_riscv(VP8DSPContext *c)
 {
 #if HAVE_RVV
diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
index 8a0773f964..063ab7110c 100644
--- a/libavcodec/riscv/vp8dsp_rvv.S
+++ b/libavcodec/riscv/vp8dsp_rvv.S
@@ -71,3 +71,30 @@ func ff_vp8_idct_dc_add4uv_rvv, zve32x
 
         ret
 endfunc
+
+.macro put_vp8_pixels
+1:
+        addi          a4, a4, -1
+        vle8.v        v0, (a2)
+        vse8.v        v0, (a0)
+        add           a2, a2, a3
+        add           a0, a0, a1
+        bnez          a4, 1b
+
+        ret
+.endm
+
+func ff_put_vp8_pixels16_rvv, zve32x
+        vsetivli      zero, 16, e8, m1, ta, ma
+        put_vp8_pixels
+endfunc
+
+func ff_put_vp8_pixels8_rvv, zve32x
+        vsetivli      zero, 8, e8, mf2, ta, ma
+        put_vp8_pixels
+endfunc
+
+func ff_put_vp8_pixels4_rvv, zve32x
+        vsetivli      zero, 4, e8, mf4, ta, ma
+        put_vp8_pixels
+endfunc
diff --git a/libavcodec/vp8dsp.c b/libavcodec/vp8dsp.c
index df7bd12424..f7c9c9899c 100644
--- a/libavcodec/vp8dsp.c
+++ b/libavcodec/vp8dsp.c
@@ -1402,6 +1402,8 @@ dsp->put_vp8_epel_pixels_tab[2][2][2] = put_vp8_epel4_h6v6_c;
     ff_vp78dsp_init_arm(dsp);
 #elif ARCH_PPC
     ff_vp78dsp_init_ppc(dsp);
+#elif ARCH_RISCV
+    ff_vp78dsp_init_riscv(dsp);
 #elif ARCH_X86
     ff_vp78dsp_init_x86(dsp);
 #endif
diff --git a/libavcodec/vp8dsp.h b/libavcodec/vp8dsp.h
index 30dc2c6cc1..3bf12b6b45 100644
--- a/libavcodec/vp8dsp.h
+++ b/libavcodec/vp8dsp.h
@@ -87,6 +87,7 @@ void ff_vp78dsp_init(VP8DSPContext *c);
 void ff_vp78dsp_init_aarch64(VP8DSPContext *c);
 void ff_vp78dsp_init_arm(VP8DSPContext *c);
 void ff_vp78dsp_init_ppc(VP8DSPContext *c);
+void ff_vp78dsp_init_riscv(VP8DSPContext *c);
 void ff_vp78dsp_init_x86(VP8DSPContext *c);
 
 void ff_vp8dsp_init(VP8DSPContext *c);
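
For readers comparing against the assembly, the put_vp8_pixels loop in this patch can be modelled in scalar C roughly as below. This is only a sketch for illustration, not code from FFmpeg: the name put_pixels_ref and its explicit width parameter w are hypothetical, and the unused x/y arguments of the VP8_MC prototype are left out. The register mapping in the comments assumes the standard RISC-V calling convention applied to the VP8_MC argument order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model (not FFmpeg code): copy h rows of w bytes.
 * Per row, the assembly loads from src (a2), stores to dst (a0), then
 * advances src by srcstride (a3) and dst by dststride (a1); h is in a4. */
static void put_pixels_ref(uint8_t *dst, ptrdiff_t dststride,
                           const uint8_t *src, ptrdiff_t srcstride,
                           int h, int w)
{
    /* The assembly decrements the row counter before testing it,
     * so it relies on h being at least 1. */
    assert(h > 0);
    /* Widths covered by the three entry points in the patch. */
    assert(w == 4 || w == 8 || w == 16);

    for (int row = 0; row < h; row++) {
        memcpy(dst, src, (size_t)w);   /* vle8.v + vse8.v */
        src += srcstride;
        dst += dststride;
    }
}
```

With w set to 16, 8 or 4 this corresponds to the three entry points, which differ only in the vsetivli configuration placed before the shared macro.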