From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 13 May 2024 19:43:28 +0300
Message-ID: <20240513164328.21569-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH] Revert "lavc/sbrdsp: R-V V neg_odd_64"

While this function can easily be written with vectors, it just
fails to get any performance improvement.

For reference, this is a simpler loop-free implementation that does
get better performance than the current one depending on hardware,
but still more or less the same metrics as the C code:

func ff_sbr_neg_odd_64_rvv, zve64x
        li      a1, 32
        addi    a0, a0, 7
        li      t0, 8
        vsetvli zero, a1, e8, m2, ta, ma
        li      t1, 0x80
        vlse8.v v8, (a0), t0
        vxor.vx v8, v8, t1
        vsse8.v v8, (a0), t0
        ret
endfunc

This reverts commit d06fd18f8f4c6a81ef94cbb600620d83ad51269d.
---
 libavcodec/riscv/sbrdsp_init.c |  5 -----
 libavcodec/riscv/sbrdsp_rvv.S  | 17 -----------------
 2 files changed, 22 deletions(-)

diff --git a/libavcodec/riscv/sbrdsp_init.c b/libavcodec/riscv/sbrdsp_init.c
index f937c47e22..d3bafa961e 100644
--- a/libavcodec/riscv/sbrdsp_init.c
+++ b/libavcodec/riscv/sbrdsp_init.c
@@ -26,7 +26,6 @@
 
 void ff_sbr_sum64x5_rvv(float *z);
 float ff_sbr_sum_square_rvv(float (*x)[2], int n);
-void ff_sbr_neg_odd_64_rvv(float *x);
 void ff_sbr_autocorrelate_rvv(const float x[40][2], float phi[3][2][2]);
 void ff_sbr_hf_gen_rvv(float (*X_high)[2], const float (*X_low)[2],
                        const float alpha0[2], const float alpha1[2],
@@ -64,9 +63,5 @@ av_cold void ff_sbrdsp_init_riscv(SBRDSPContext *c)
         }
         c->autocorrelate = ff_sbr_autocorrelate_rvv;
     }
-#if __riscv_xlen >= 64
-    if ((flags & AV_CPU_FLAG_RVV_I64) && (flags & AV_CPU_FLAG_RVB_ADDR))
-        c->neg_odd_64 = ff_sbr_neg_odd_64_rvv;
-#endif
 #endif
 }
diff --git a/libavcodec/riscv/sbrdsp_rvv.S b/libavcodec/riscv/sbrdsp_rvv.S
index 918c37882f..aba9a28108 100644
--- a/libavcodec/riscv/sbrdsp_rvv.S
+++ b/libavcodec/riscv/sbrdsp_rvv.S
@@ -68,23 +68,6 @@ NOHWF   fmv.x.w  a0, fa0
         ret
 endfunc
 
-#if __riscv_xlen >= 64
-func ff_sbr_neg_odd_64_rvv, zve64x
-        li      a1, 32
-        li      t1, 1 << 63
-1:
-        vsetvli t0, a1, e64, m8, ta, ma
-        vle64.v v8, (a0)
-        sub     a1, a1, t0
-        vxor.vx v8, v8, t1
-        vse64.v v8, (a0)
-        sh3add  a0, t0, a0
-        bnez    t0, 1b
-
-        ret
-endfunc
-#endif
-
 func ff_sbr_autocorrelate_rvv, zve32f
         vsetvli t0, zero, e32, m4, ta, ma
         vmv.v.x v0, zero