From patchwork Wed Oct 5 16:12:56 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 38567
From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 5 Oct 2022 19:12:56 +0300
Message-Id: <20221005161256.27612-4-remi@remlab.net>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <12083658.O9o76ZdvQC@basile.remlab.net>
References: <12083658.O9o76ZdvQC@basile.remlab.net>
Subject: [FFmpeg-devel] [PATCH 4/4] lavc/opusdsp: RISC-V V (512-bit) postfilter
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: gqVyEuepQ+S3

This adds a variant of the postfilter for use with 512-bit vectors.
Half a vector is enough to perform the scalar product.
Normally a whole vector would be used anyhow. Indeed, fractional
multipliers are no faster than the unit multiplier. But in this
particular function, a full vector makes up 16 samples, which would be
loaded at each iteration of the outer loop. The minimum guaranteed CELT
postfilter period is only 15. Accounting for the edges, we can only
safely preload up to 13 samples.

The fractional multiplier is thus used to cap the selected vector length
to a safe value of 8 elements or 256 bits.

Likewise, we have the 1024-bit variant with the quarter multiplier. In
theory, a 2048-bit one would be possible with the eighth multiplier, but
that length is not even defined in the specifications as of yet, nor is
it supported by any emulator, let alone actual hardware.
---
 libavcodec/riscv/opusdsp_init.c |  8 ++++++++
 libavcodec/riscv/opusdsp_rvv.S  | 10 ++++++++++
 2 files changed, 18 insertions(+)

diff --git a/libavcodec/riscv/opusdsp_init.c b/libavcodec/riscv/opusdsp_init.c
index e6f9505f77..d564cca50c 100644
--- a/libavcodec/riscv/opusdsp_init.c
+++ b/libavcodec/riscv/opusdsp_init.c
@@ -27,6 +27,8 @@
 
 void ff_opus_postfilter_rvv_128(float *data, int period, float *g, int len);
 void ff_opus_postfilter_rvv_256(float *data, int period, float *g, int len);
+void ff_opus_postfilter_rvv_512(float *data, int period, float *g, int len);
+void ff_opus_postfilter_rvv_1024(float *data, int period, float *g, int len);
 
 av_cold void ff_opus_dsp_init_riscv(OpusDSP *d)
 {
@@ -41,6 +43,12 @@ av_cold void ff_opus_dsp_init_riscv(OpusDSP *d)
         case 32:
             d->postfilter = ff_opus_postfilter_rvv_256;
             break;
+        case 64:
+            d->postfilter = ff_opus_postfilter_rvv_512;
+            break;
+        case 128:
+            d->postfilter = ff_opus_postfilter_rvv_1024;
+            break;
         }
 #endif
 }
diff --git a/libavcodec/riscv/opusdsp_rvv.S b/libavcodec/riscv/opusdsp_rvv.S
index 243c9a5e52..b3d23a9de5 100644
--- a/libavcodec/riscv/opusdsp_rvv.S
+++ b/libavcodec/riscv/opusdsp_rvv.S
@@ -25,6 +25,16 @@ func ff_opus_postfilter_rvv_128, zve32f
         j       1f
 endfunc
 
+func ff_opus_postfilter_rvv_512, zve32f
+        lvtypei a5, e32, mf2, ta, ma
+        j       1f
+endfunc
+
+func ff_opus_postfilter_rvv_1024, zve32f
+        lvtypei a5, e32, mf4, ta, ma
+        j       1f
+endfunc
+
 func ff_opus_postfilter_rvv_256, zve32f
         lvtypei a5, e32, m1, ta, ma
 1:
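
As a rough illustration of the vector-length arithmetic in the commit
message, the sketch below works out which length multiplier gives 8
single-precision elements for a given vector register size. The helper
name and its return convention are hypothetical and not part of FFmpeg.
The m2 result for 128-bit vectors is only what the same 8-element target
would imply, since that variant's multiplier is not shown in this patch.

```c
#define SAFE_ELEMS 8u /* element count the commit message settles on */

/*
 * Hypothetical helper (not an FFmpeg function): given the vector register
 * size in bytes, return the length multiplier that yields SAFE_ELEMS
 * 32-bit elements.
 *   n > 0  means LMUL = n    (m1, m2, ...)
 *   n < 0  means LMUL = 1/-n (mf2, mf4, mf8)
 *   0      means no suitable multiplier exists
 */
static int lmul_for_safe_length(unsigned vlenb)
{
    unsigned elems = vlenb / 4u; /* 32-bit elements per vector register */

    if (elems == 0 || (elems & (elems - 1)) != 0)
        return 0;                         /* not a power of two */
    if (elems <= SAFE_ELEMS)
        return (int)(SAFE_ELEMS / elems); /* whole multiplier */
    if (elems / SAFE_ELEMS > 8)
        return 0;                         /* would need less than 1/8 */
    return -(int)(elems / SAFE_ELEMS);    /* fractional multiplier */
}
```

With this convention, 64-byte (512-bit) registers map to -2, matching
mf2 in the patch, and 128-byte (1024-bit) registers map to -4, matching
mf4. A 256-byte register would map to -8, the theoretical eighth
multiplier mentioned in the commit message.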