From patchwork Tue Dec 12 21:02:39 2023
X-Patchwork-Id: 45094
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 12 Dec 2023 23:02:39 +0200
Message-ID: <20231212210240.19886-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 1/2] checkasm/lpc: test compute_autocorr

---
 tests/checkasm/lpc.c | 42 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/tests/checkasm/lpc.c b/tests/checkasm/lpc.c
index 592e34c03d..4d84defec3 100644
--- a/tests/checkasm/lpc.c
+++ b/tests/checkasm/lpc.c
@@ -57,10 +57,41 @@ static void test_window(int len)
     bench_new(src, len, dst1);
 }
 
+static void test_compute_autocorr(ptrdiff_t len, int lag)
+{
+    LOCAL_ALIGNED(16, double, src, [5000]);
+    LOCAL_ALIGNED(16, double, dst0, [MAX_LPC_ORDER + 1]);
+    LOCAL_ALIGNED(16, double, dst1, [MAX_LPC_ORDER + 1]);
+
+    declare_func(void, const double *in, ptrdiff_t len, int lag, double *out);
+
+    av_assert0(lag >= 0 && lag <= MAX_LPC_ORDER);
+
+    for (size_t i = 0; i < len; i++) {
+        src[i] = (double)rnd() / (double)UINT_MAX;
+    }
+
+    call_ref(src, len, lag, dst0);
+    call_new(src, len, lag, dst1);
+
+    for (size_t i = 0; i < lag; i++) {
+        if (!double_near_abs_eps(dst0[i], dst1[i], EPS)) {
+            fprintf(stderr, "%zu: %- .12f - %- .12f = % .12g\n",
+                    i, dst0[i], dst1[i], dst0[i] - dst1[i]);
+            fail();
+            break;
+        }
+    }
+
+    bench_new(src, len, lag, dst1);
+}
+
 void checkasm_check_lpc(void)
 {
     LPCContext ctx;
-    int len = rnd() % 5000;
+    int len = 2000 + (rnd() % 3000);
+    static const int lags[] = { 10, 30, 32 };
+
     ff_lpc_init(&ctx, 32, 16, FF_LPC_TYPE_DEFAULT);
 
     if (check_func(ctx.lpc_apply_welch_window, "apply_welch_window_even")) {
@@ -72,6 +103,13 @@ void checkasm_check_lpc(void)
         test_window(len | 1);
     }
     report("apply_welch_window_odd");
-    ff_lpc_end(&ctx);
+
+    for (size_t i = 0; i < FF_ARRAY_ELEMS(lags); i++) {
+        ff_lpc_init(&ctx, 32, lags[i], FF_LPC_TYPE_DEFAULT);
+        if (check_func(ctx.lpc_compute_autocorr, "autocorr_%d", lags[i]))
+            test_compute_autocorr(len, lags[i]);
+        report("compute_autocorr_%d", lags[i]);
+        ff_lpc_end(&ctx);
+    }
 }

From patchwork Tue Dec 12 21:02:40 2023
X-Patchwork-Id: 45095
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 12 Dec 2023 23:02:40 +0200
Message-ID: <20231212210240.19886-2-remi@remlab.net>
In-Reply-To: <20231212210240.19886-1-remi@remlab.net>
References: <20231212210240.19886-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 2/2] lavc/lpc: R-V V compute_autocorr

The loop iterates over the length of the vector, not the order. This is
to avoid reloading the same data for each lag value.
However, this means the loop only works if the maximum order is no
larger than VLENB.

The loop is roughly equivalent to:

    for (size_t j = 0; j < lag; j++)
        autoc[j] = 1.;

    while (len > lag) {
        for (ptrdiff_t j = 0; j < lag; j++)
            autoc[j] += data[j] * *data;
        data++;
        len--;
    }

    while (len > 0) {
        for (ptrdiff_t j = 0; j < len; j++)
            autoc[j] += data[j] * *data;
        data++;
        len--;
    }

Since register pressure is only at 50%, it should be possible to
implement the same loop for orders up to 2xVLENB, but this is left for
future work.

Performance numbers are all over the place, from ~1.25x to ~4x speedups,
but at least they are always noticeably better than nothing.
---
 libavcodec/riscv/lpc_init.c |  8 +++++++-
 libavcodec/riscv/lpc_rvv.S  | 29 +++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/libavcodec/riscv/lpc_init.c b/libavcodec/riscv/lpc_init.c
index c16e5745f0..ab91956f2d 100644
--- a/libavcodec/riscv/lpc_init.c
+++ b/libavcodec/riscv/lpc_init.c
@@ -22,16 +22,22 @@
 
 #include "libavutil/attributes.h"
 #include "libavutil/cpu.h"
+#include "libavutil/riscv/cpu.h"
 #include "libavcodec/lpc.h"
 
 void ff_lpc_apply_welch_window_rvv(const int32_t *, ptrdiff_t, double *);
+void ff_lpc_compute_autocorr_rvv(const double *, ptrdiff_t, int, double *);
 
 av_cold void ff_lpc_init_riscv(LPCContext *c)
 {
 #if HAVE_RVV && (__riscv_xlen >= 64)
     int flags = av_get_cpu_flags();
 
-    if ((flags & AV_CPU_FLAG_RVV_F64) && (flags & AV_CPU_FLAG_RVB_ADDR))
+    if ((flags & AV_CPU_FLAG_RVV_F64) && (flags & AV_CPU_FLAG_RVB_ADDR)) {
         c->lpc_apply_welch_window = ff_lpc_apply_welch_window_rvv;
+
+        if (ff_get_rv_vlenb() >= c->max_order)
+            c->lpc_compute_autocorr = ff_lpc_compute_autocorr_rvv;
+    }
 #endif
 }
diff --git a/libavcodec/riscv/lpc_rvv.S b/libavcodec/riscv/lpc_rvv.S
index f81a2392c1..654156bf12 100644
--- a/libavcodec/riscv/lpc_rvv.S
+++ b/libavcodec/riscv/lpc_rvv.S
@@ -85,4 +85,33 @@ func ff_lpc_apply_welch_window_rvv, zve64d
 
         ret
 endfunc
+
+func ff_lpc_compute_autocorr_rvv, zve64d
+        li      t0, 1
+        vsetvli t1, a2, e64, m8, ta, ma
+        fcvt.d.l ft0, t0
+        vle64.v v0, (a0)
+        sh3add  a0, a2, a0   # data += lag
+        vfmv.v.f v16, ft0
+        bge     a2, a1, 2f
+1:
+        vfmv.f.s ft0, v0
+        fld     ft1, (a0)    # ft1 = data[lag + i]
+        vfmacc.vf v16, ft0, v0 # v16[j] += data[i] * data[i + j]
+        addi    a1, a1, -1
+        vfslide1down.vf v0, v0, ft1
+        addi    a0, a0, 8
+        bgt     a1, a2, 1b   # while (len > lag);
+2:
+        vfmv.f.s ft0, v0
+        vsetvli zero, a1, e64, m8, tu, ma
+        vfmacc.vf v16, ft0, v0
+        addi    a1, a1, -1
+        vslide1down.vx v0, v0, zero
+        bnez    a1, 2b       # while (len > 0);
+
+        vsetvli zero, a2, e64, m8, ta, ma
+        vse64.v v16, (a3)
+        ret
+endfunc
 #endif
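
For reference, the scalar loop given in the commit message of patch 2/2 can be written out as standalone C and compared against a plain per-lag summation. This is only a sketch and not part of the series: the names autocorr_direct, autocorr_sliding and autocorr_forms_agree are made up for illustration, and the direct form here only indexes inside the buffer, so it is not claimed to match FFmpeg's C reference bit for bit.

```c
#include <assert.h>
#include <stddef.h>

/* Direct form (illustrative): one pass over the data per lag,
 * autoc[j] = 1 + sum over i >= j of data[i] * data[i - j]. */
static void autocorr_direct(const double *data, ptrdiff_t len, int lag,
                            double *autoc)
{
    for (int j = 0; j < lag; j++) {
        double sum = 1.;

        for (ptrdiff_t i = j; i < len; i++)
            sum += data[i] * data[i - j];
        autoc[j] = sum;
    }
}

/* Sliding form, following the pseudocode in the commit message:
 * a single pass over the data, updating every lag per input sample. */
static void autocorr_sliding(const double *data, ptrdiff_t len, int lag,
                             double *autoc)
{
    assert(len >= 0 && lag >= 0);

    for (int j = 0; j < lag; j++)
        autoc[j] = 1.;

    while (len > lag) { /* every lag still has a partner sample */
        for (ptrdiff_t j = 0; j < lag; j++)
            autoc[j] += data[j] * *data;
        data++;
        len--;
    }

    while (len > 0) { /* tail: only the first len lags still fit */
        for (ptrdiff_t j = 0; j < len; j++)
            autoc[j] += data[j] * *data;
        data++;
        len--;
    }
}

/* Returns 1 if both forms agree on a small integer-valued input
 * (integer values keep every sum exact in double precision). */
static int autocorr_forms_agree(void)
{
    const double data[] = { 3., -1., 4., 1., -5., 9., 2., -6. };
    double a[4], b[4];

    autocorr_direct(data, 8, 4, a);
    autocorr_sliding(data, 8, 4, b);
    for (int j = 0; j < 4; j++)
        if (a[j] != b[j])
            return 0;
    return 1;
}
```

Calling autocorr_sliding() on { 1., 2., 3. } with len = 3 and lag = 2 yields autoc[0] = 15 and autoc[1] = 9. The second while loop corresponds to the tail-undisturbed (tu) part of the assembly, where the vector length shrinks along with the remaining sample count.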