From patchwork Fri Nov 10 21:38:41 2023
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 44612
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 10 Nov 2023 23:38:41 +0200
Message-ID: <20231110213841.24855-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH] lavc/sbrdsp: R-V V hf_apply_noise functions

This is restricted to 128-bit vectors as larger vector sizes could read
past the end of the noise array.
Support for future hardware with larger vector sizes is left for some other
time.

hf_apply_noise_0_c:       2319.7
hf_apply_noise_0_rvv_f32: 1229.0
hf_apply_noise_1_c:       2539.0
hf_apply_noise_1_rvv_f32: 1244.7
hf_apply_noise_2_c:       2319.7
hf_apply_noise_2_rvv_f32: 1232.7
hf_apply_noise_3_c:       2541.2
hf_apply_noise_3_rvv_f32: 1244.2
---
 libavcodec/riscv/sbrdsp_init.c | 17 +++++++++
 libavcodec/riscv/sbrdsp_rvv.S  | 67 ++++++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+)

diff --git a/libavcodec/riscv/sbrdsp_init.c b/libavcodec/riscv/sbrdsp_init.c
index e5736452ec..2ed46153ea 100644
--- a/libavcodec/riscv/sbrdsp_init.c
+++ b/libavcodec/riscv/sbrdsp_init.c
@@ -21,6 +21,7 @@
 #include "config.h"
 #include "libavutil/attributes.h"
 #include "libavutil/cpu.h"
+#include "libavutil/riscv/cpu.h"
 #include "libavcodec/sbrdsp.h"
 
 void ff_sbr_sum64x5_rvv(float *z);
@@ -32,6 +33,14 @@ void ff_sbr_hf_gen_rvv(float (*X_high)[2], const float (*X_low)[2],
                        float bw, int start, int end);
 void ff_sbr_hf_g_filt_rvv(float (*Y)[2], const float (*X_high)[40][2],
                           const float *g_filt, int m_max, intptr_t ixh);
+void ff_sbr_hf_apply_noise_0_rvv(float (*Y)[2], const float *s,
+                                 const float *f, int n, int kx, int max);
+void ff_sbr_hf_apply_noise_1_rvv(float (*Y)[2], const float *s,
+                                 const float *f, int n, int kx, int max);
+void ff_sbr_hf_apply_noise_2_rvv(float (*Y)[2], const float *s,
+                                 const float *f, int n, int kx, int max);
+void ff_sbr_hf_apply_noise_3_rvv(float (*Y)[2], const float *s,
+                                 const float *f, int n, int kx, int max);
 
 av_cold void ff_sbrdsp_init_riscv(SBRDSPContext *c)
 {
@@ -44,6 +53,14 @@ av_cold void ff_sbrdsp_init_riscv(SBRDSPContext *c)
             c->sum_square = ff_sbr_sum_square_rvv;
             c->hf_gen = ff_sbr_hf_gen_rvv;
             c->hf_g_filt = ff_sbr_hf_g_filt_rvv;
+            if (ff_get_rv_vlenb() <= 16) {
+                c->hf_apply_noise[0] = ff_sbr_hf_apply_noise_0_rvv;
+                c->hf_apply_noise[2] = ff_sbr_hf_apply_noise_2_rvv;
+                if (flags & AV_CPU_FLAG_RVB_BASIC) {
+                    c->hf_apply_noise[1] = ff_sbr_hf_apply_noise_1_rvv;
+                    c->hf_apply_noise[3] = ff_sbr_hf_apply_noise_3_rvv;
+                }
+            }
         }
         c->autocorrelate = ff_sbr_autocorrelate_rvv;
     }
diff --git a/libavcodec/riscv/sbrdsp_rvv.S b/libavcodec/riscv/sbrdsp_rvv.S
index 43fab1f65f..02feb6451e 100644
--- a/libavcodec/riscv/sbrdsp_rvv.S
+++ b/libavcodec/riscv/sbrdsp_rvv.S
@@ -243,3 +243,70 @@ func ff_sbr_hf_g_filt_rvv, zve32f
 
         ret
 endfunc
+
+.macro hf_apply_noise n
+        lla     a6, ff_sbr_noise_table
+        fmv.s.x ft0, zero
+        addi    a6, a6, 8
+1:
+.if \n & 1
+        min     t0, t0, a5 // preserve parity of t0 for v4 sign injector
+        vsetvli zero, t0, e32, m4, ta, mu
+.else
+        vsetvli t0, a5, e32, m4, ta, mu
+.endif
+        sh3add  t6, a3, a6
+        vle32.v v8, (a1) // s_m
+        sub     a5, a5, t0
+        vle32.v v12, (a2) // q_filt
+        sh2add  a1, t0, a1
+        vmfeq.vf v0, v8, ft0 // s_m == 0.f
+        vlseg2e32.v v24, (t6) // ff_sbr_noise_table
+        sh2add  a2, t0, a2
+.if \n == 2
+        vfneg.v v8, v8
+.endif
+.if \n & 1
+        vfsgnjx.vv v8, v8, v4 // could equivalently use vxor.vv
+.endif
+        add     a3, t0, a3
+        vlseg2e32.v v16, (a0) // Y
+        andi    a3, a3, 0x1ff
+.if \n & 1
+        vfmul.vv v28, v12, v28
+        vfmacc.vv v16, v12, v24, v0.t
+        vmerge.vvm v28, v8, v28, v0
+        vfadd.vv v20, v20, v28
+.else
+        vfmul.vv v24, v12, v24
+        vfmacc.vv v20, v12, v28, v0.t
+        vmerge.vvm v24, v8, v24, v0
+        vfadd.vv v16, v16, v24
+.endif
+        vsseg2e32.v v16, (a0)
+        sh3add  a0, t0, a0
+        bnez    a5, 1b
+
+        ret
+.endm
+
+func ff_sbr_hf_apply_noise_0_rvv, zve32f
+        hf_apply_noise 0
+endfunc
+
+func ff_sbr_hf_apply_noise_3_rvv, zve32f
+       not     a4, a4 // invert parity of kx
+       // fall through
+endfunc
+
+func ff_sbr_hf_apply_noise_1_rvv, zve32f
+        vsetvli t0, zero, e32, m4, ta, ma
+        vid.v   v4
+        vxor.vx v4, v4, a4
+        vsll.vi v4, v4, 31 // v4[i] = (kx & 1) ? -0.f : +0.f
+        hf_apply_noise 1
+endfunc
+
+func ff_sbr_hf_apply_noise_2_rvv, zve32f
+        hf_apply_noise 2
+endfunc
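
Reviewer note: the behaviour that the hf_apply_noise macro appears to
implement can be written as a scalar C model, which may help when reading
the assembly. This is a sketch reconstructed from the instructions in this
patch, not FFmpeg's reference code. The names ref_hf_apply_noise and
NOISE_TABLE_SIZE, and the explicit noise_table and mode parameters, are
hypothetical; the table size of 512 is only inferred from the 0x1ff mask.
The model wraps the noise index on every element, whereas the vector loop
masks a3 once per chunk, which is presumably where the over-read concern in
the commit message comes from.

```c
#include <assert.h>

#define NOISE_TABLE_SIZE 512 /* assumption: inferred from "andi a3, a3, 0x1ff" */

/*
 * Hypothetical scalar model of the hf_apply_noise macro.
 * mode is the 0..3 suffix of ff_sbr_hf_apply_noise_N_rvv.
 * noise_table stands in for ff_sbr_noise_table ({re, im} pairs).
 */
static void ref_hf_apply_noise(float (*Y)[2], const float *s_m,
                               const float *q_filt,
                               const float (*noise_table)[2],
                               int noise, int kx, int m_max, int mode)
{
    assert(noise >= 0 && noise < NOISE_TABLE_SIZE);
    assert(mode >= 0 && mode <= 3);

    if (mode == 3)
        kx = ~kx; /* "not a4, a4": invert parity of kx, then behave as mode 1 */

    for (int m = 0; m < m_max; m++) {
        /* The assembly offsets the table base by one entry (addi a6, a6, 8),
         * so element m reads entry noise + 1 + m. Here it wraps per element. */
        noise = (noise + 1) & (NOISE_TABLE_SIZE - 1);

        if (s_m[m] == 0.f) {
            /* Mask v0 set: add scaled noise to both components. */
            Y[m][0] += q_filt[m] * noise_table[noise][0];
            Y[m][1] += q_filt[m] * noise_table[noise][1];
        } else if (mode & 1) {
            /* Odd modes: s_m goes to the imaginary part, with its sign
             * flipped where (m ^ kx) is odd (the v4 sign injector). */
            Y[m][1] += ((m ^ kx) & 1) ? -s_m[m] : s_m[m];
        } else {
            /* Even modes: s_m goes to the real part, negated for mode 2. */
            Y[m][0] += (mode == 2) ? -s_m[m] : s_m[m];
        }
    }
}
```

With mode set to 0..3 this should agree with the corresponding RVV function
up to floating-point rounding, since the assembly uses a fused multiply-add
(vfmacc.vv) for one of the two components.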