From patchwork Sun Sep 4 13:54:34 2022
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 34804
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 04 Sep 2022 16:54:34 +0300
Message-ID: <3372981.QJadu78ljV@basile.remlab.net>
Organization: Remlab
Subject: [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP
List-Id: FFmpeg development discussions and patches

The following changes since commit
b6e8fc1c201d58672639134a737137e1ba7b55fe:

  avcodec/speexdec: improve support for speex in non-ogg (2022-09-04 11:31:57 +0200)

are waiting thorough bashing at your express convenience up to:

  riscv: float vector dot product with RVV (2022-09-04 16:45:38 +0300)

Changes since v1:
- Removed stray define.
- Fixed mismatch between byte and element size in mul-scalar.
- Added fmul, fmac, dmul, dmac, fmul-add, fmul-reverse, fmul-window.
- Added float butterfly and dot product.

All operations are unrolled to the maximum group size (8), with the
exception of overlap/add. The latter seems to require a minimum of 6
vectors (maybe 5 by extremely careful ordering), so the group size is
only 4: with 32 vector registers, a group size of 8 would leave just
4 register groups.

The pointer arithmetic could be slightly optimised with the SH2ADD and
SH3ADD instructions from the Zba extension. This would require either
more conditional code or mandatory Zba support, for probably negligible
performance gains though.

----------------------------------------------------------------
Rémi Denis-Courmont (10):
      riscv: add CPU flags for the RISC-V Vector extension
      riscv: initial common header for assembler macros
      riscv: float vector-scalar multiplication with RVV
      riscv: float vector-vector multiplication with RVV
      riscv: float vector multiply-accumulate with RVV
      riscv: float vector multiplication-addition with RVV
      riscv: float vector sum-and-difference with RVV
      riscv: float reversed vector multiplication with RVV
      riscv: float vector windowed overlap/add with RVV
      riscv: float vector dot product with RVV

 libavutil/cpu.c                  |  14 +++
 libavutil/cpu.h                  |   6 +
 libavutil/cpu_internal.h         |   1 +
 libavutil/float_dsp.c            |   2 +
 libavutil/float_dsp.h            |   1 +
 libavutil/riscv/Makefile         |   3 +
 libavutil/riscv/asm.S            |  33 +++++
 libavutil/riscv/cpu.c            |  57 +++++++++
 libavutil/riscv/float_dsp_init.c |  67 ++++++++++
 libavutil/riscv/float_dsp_rvv.S  | 255 +++++++++++++++++++++++++++++++++++++++
 10 files changed, 439 insertions(+)
 create mode 100644 libavutil/riscv/Makefile
 create mode 100644 libavutil/riscv/asm.S
 create mode 100644 libavutil/riscv/cpu.c
 create mode 100644 libavutil/riscv/float_dsp_init.c
 create mode 100644 libavutil/riscv/float_dsp_rvv.S
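
For readers unfamiliar with the SH2ADD/SH3ADD remark in the cover letter: those
instructions come from the Zba address-generation extension and fold a left
shift and an add into one instruction, which is what advancing a float or
double pointer by an element count needs. The C sketch below only models their
arithmetic. It is not taken from the patches; the helper names advance_f32()
and advance_f64() are hypothetical, and it assumes 4-byte floats, 8-byte
doubles and a flat address space.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of "sh2add rd, rs1, rs2" (Zba): rd = rs2 + (rs1 << 2). */
static inline uintptr_t sh2add(uintptr_t rs1, uintptr_t rs2)
{
    return rs2 + (rs1 << 2);
}

/* Model of "sh3add rd, rs1, rs2" (Zba): rd = rs2 + (rs1 << 3). */
static inline uintptr_t sh3add(uintptr_t rs1, uintptr_t rs2)
{
    return rs2 + (rs1 << 3);
}

/*
 * Hypothetical helper: advance a float pointer by n elements.
 * Without Zba this is a shift (slli) followed by an add; with Zba it
 * is a single sh2add. Assumes sizeof(float) == 4.
 */
static inline const float *advance_f32(const float *p, size_t n)
{
    return (const float *)sh2add((uintptr_t)n, (uintptr_t)p);
}

/*
 * Hypothetical helper: same idea for doubles, using sh3add.
 * Assumes sizeof(double) == 8.
 */
static inline const double *advance_f64(const double *p, size_t n)
{
    return (const double *)sh3add((uintptr_t)n, (uintptr_t)p);
}
```

In a strip-mined vector loop, n would be the element count processed in the
current iteration, so the saving is one instruction per pointer per iteration,
which is consistent with the expectation above that the gain is small.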