From patchwork Wed Sep 14 17:50:31 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 37915
From: remi@remlab.net
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 14 Sep 2022 20:50:31 +0300
Message-Id: <20220914175031.162194-3-remi@remlab.net>
In-Reply-To: <4768066.31r3eYUQgx@basile.remlab.net>
References: <4768066.31r3eYUQgx@basile.remlab.net>
Subject: [FFmpeg-devel] [PATCH 3/3] lavc/audiodsp: add RISC-V F float vector clip

From: Rémi Denis-Courmont

RV64G supports MIN & MAX instructions natively only on floating point
registers, not general purpose ones.
The latter would require the Zbb extension. Due to that, it is actually
faster to perform the clipping "properly" in the FPU.

Benchmarked on SiFive U74-MC:

audiodsp.vector_clipf_c: 29551.5
audiodsp.vector_clipf_f: 17871.0

Also tried unrolling with 2 or 8 elements but it gets worse either way.
---
 libavcodec/audiodsp.c            |  2 ++
 libavcodec/audiodsp.h            |  1 +
 libavcodec/riscv/Makefile        |  2 ++
 libavcodec/riscv/audiodsp_init.c | 31 +++++++++++++++++++++
 libavcodec/riscv/audiodsp_rvf.S  | 46 ++++++++++++++++++++++++++++++++
 5 files changed, 82 insertions(+)
 create mode 100644 libavcodec/riscv/Makefile
 create mode 100644 libavcodec/riscv/audiodsp_init.c
 create mode 100644 libavcodec/riscv/audiodsp_rvf.S

diff --git a/libavcodec/audiodsp.c b/libavcodec/audiodsp.c
index ff43e87dce..eba6e809fd 100644
--- a/libavcodec/audiodsp.c
+++ b/libavcodec/audiodsp.c
@@ -113,6 +113,8 @@ av_cold void ff_audiodsp_init(AudioDSPContext *c)
     ff_audiodsp_init_arm(c);
 #elif ARCH_PPC
     ff_audiodsp_init_ppc(c);
+#elif ARCH_RISCV
+    ff_audiodsp_init_riscv(c);
 #elif ARCH_X86
     ff_audiodsp_init_x86(c);
 #endif
diff --git a/libavcodec/audiodsp.h b/libavcodec/audiodsp.h
index aa6fa7898b..485b512839 100644
--- a/libavcodec/audiodsp.h
+++ b/libavcodec/audiodsp.h
@@ -55,6 +55,7 @@ typedef struct AudioDSPContext {
 void ff_audiodsp_init(AudioDSPContext *c);
 void ff_audiodsp_init_arm(AudioDSPContext *c);
 void ff_audiodsp_init_ppc(AudioDSPContext *c);
+void ff_audiodsp_init_riscv(AudioDSPContext *c);
 void ff_audiodsp_init_x86(AudioDSPContext *c);
 
 #endif /* AVCODEC_AUDIODSP_H */
diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
new file mode 100644
index 0000000000..a1f67ed55b
--- /dev/null
+++ b/libavcodec/riscv/Makefile
@@ -0,0 +1,2 @@
+OBJS += riscv/audiodsp_init.o \
+        riscv/audiodsp_rvf.o
diff --git a/libavcodec/riscv/audiodsp_init.c b/libavcodec/riscv/audiodsp_init.c
new file mode 100644
index 0000000000..7ffd7e8162
--- /dev/null
+++ b/libavcodec/riscv/audiodsp_init.c
@@ -0,0 +1,31 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavcodec/audiodsp.h"
+
+void ff_vector_clipf_rvf(float *dst, const float *src, int len, float min, float max);
+
+av_cold void ff_audiodsp_init_riscv(AudioDSPContext *c)
+{
+    int flags = av_get_cpu_flags();
+
+    if (flags & AV_CPU_FLAG_F)
+        c->vector_clipf = ff_vector_clipf_rvf;
+}
diff --git a/libavcodec/riscv/audiodsp_rvf.S b/libavcodec/riscv/audiodsp_rvf.S
new file mode 100644
index 0000000000..148af96ea2
--- /dev/null
+++ b/libavcodec/riscv/audiodsp_rvf.S
@@ -0,0 +1,46 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/riscv/asm.S"
+
+func ff_vector_clipf_rvf, f
+NOHWF   fmv.w.x fa0, a3         /* soft-float ABI: min arrives in a3 */
+NOHWF   fmv.w.x fa1, a4         /* soft-float ABI: max arrives in a4 */
+1:
+        flw     ft0, (a1)       /* a1 = src */
+        flw     ft1, 4(a1)
+        fmax.s  ft0, ft0, fa0   /* lower bound: max(x, min) */
+        flw     ft2, 8(a1)
+        fmax.s  ft1, ft1, fa0
+        flw     ft3, 12(a1)
+        fmax.s  ft2, ft2, fa0
+        addi    a2, a2, -4      /* a2 = len, 4 elements per iteration */
+        fmax.s  ft3, ft3, fa0
+        addi    a1, a1, 16
+        fmin.s  ft0, ft0, fa1   /* upper bound: min(x, max) */
+        fmin.s  ft1, ft1, fa1
+        fsw     ft0, (a0)       /* a0 = dst */
+        fmin.s  ft2, ft2, fa1
+        fsw     ft1, 4(a0)
+        fmin.s  ft3, ft3, fa1
+        fsw     ft2, 8(a0)
+        fsw     ft3, 12(a0)
+        addi    a0, a0, 16
+        bnez    a2, 1b          /* loop until len reaches 0 */
+        ret
+endfunc
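
For readers more comfortable with C, the unrolled loop in audiodsp_rvf.S can be approximated by the sketch below. It is not part of the patch: vector_clipf_ref, max_f and min_f are hypothetical names used only for illustration, the two helpers stand in for fmax.s and fmin.s without reproducing their NaN handling, and the assertion encodes the assumption (implied by the assembly loop, which never checks for a remainder) that len is a positive multiple of 4.

```c
#include <assert.h>

/* Hypothetical stand-ins for fmax.s and fmin.s (NaN handling is ignored). */
static float max_f(float a, float b) { return a > b ? a : b; }
static float min_f(float a, float b) { return a < b ? a : b; }

/* Hypothetical reference, not FFmpeg code: clamp each of the len floats in
 * src to [min, max] and store the result in dst, four elements at a time. */
void vector_clipf_ref(float *dst, const float *src, int len,
                      float min, float max)
{
    /* Assumption: like the assembly loop, no tail handling is provided. */
    assert(len > 0 && len % 4 == 0);

    for (int i = 0; i < len; i += 4) {
        /* Apply the lower bound first, as the fmax.s instructions do. */
        float v0 = max_f(src[i + 0], min);
        float v1 = max_f(src[i + 1], min);
        float v2 = max_f(src[i + 2], min);
        float v3 = max_f(src[i + 3], min);

        /* Then apply the upper bound, as the fmin.s instructions do. */
        dst[i + 0] = min_f(v0, max);
        dst[i + 1] = min_f(v1, max);
        dst[i + 2] = min_f(v2, max);
        dst[i + 3] = min_f(v3, max);
    }
}
```

The assembly interleaves loads, clamps and stores rather than grouping them as above; the sketch only mirrors the arithmetic, not the instruction scheduling.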