From: Rémi Denis-Courmont
Date: Sun, 12 Nov 2023 15:42:26 +0200
Message-ID: <20231112134226.16864-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 1/1] lavc/huffyuvdsp: basic R-V V add_hfyu_left_pred_bgr32

Better performance can probably be achieved with a more intricate unrolled
loop, but this is a start:

add_hfyu_left_pred_bgr32_c: 15084.0
add_hfyu_left_pred_bgr32_rvv_i32: 10280.2

This would actually be cleaner with the RISC-V P (packed SIMD) extension,
but P is not ratified yet (I think?) and is usually not implemented on
cores that implement V.
---
 libavcodec/riscv/huffyuvdsp_init.c |  6 +++++-
 libavcodec/riscv/huffyuvdsp_rvv.S  | 16 ++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/libavcodec/riscv/huffyuvdsp_init.c b/libavcodec/riscv/huffyuvdsp_init.c
index 115b25881c..b49b3dc097 100644
--- a/libavcodec/riscv/huffyuvdsp_init.c
+++ b/libavcodec/riscv/huffyuvdsp_init.c
@@ -24,6 +24,8 @@
 #include "libavcodec/huffyuvdsp.h"
 
 void ff_add_int16_rvv(uint16_t *dst, const uint16_t *src, unsigned m, int w);
+void ff_add_hfyu_left_pred_bgr32_rvv(uint8_t *dst, const uint8_t *src,
+                                     intptr_t w, uint8_t *left);
 
 av_cold void ff_huffyuvdsp_init_riscv(HuffYUVDSPContext *c,
                                       enum AVPixelFormat pix_fmt)
@@ -31,7 +33,9 @@ av_cold void ff_huffyuvdsp_init_riscv(HuffYUVDSPContext *c,
 #if HAVE_RVV
     int flags = av_get_cpu_flags();
 
-    if ((flags & AV_CPU_FLAG_RVV_I32) && (flags & AV_CPU_FLAG_RVB_ADDR))
+    if ((flags & AV_CPU_FLAG_RVV_I32) && (flags & AV_CPU_FLAG_RVB_ADDR)) {
         c->add_int16 = ff_add_int16_rvv;
+        c->add_hfyu_left_pred_bgr32 = ff_add_hfyu_left_pred_bgr32_rvv;
+    }
 #endif
 }
diff --git a/libavcodec/riscv/huffyuvdsp_rvv.S b/libavcodec/riscv/huffyuvdsp_rvv.S
index f8926fdaea..9c4434907d 100644
--- a/libavcodec/riscv/huffyuvdsp_rvv.S
+++ b/libavcodec/riscv/huffyuvdsp_rvv.S
@@ -35,3 +35,19 @@ func ff_add_int16_rvv, zve32x
 
         ret
 endfunc
+
+func ff_add_hfyu_left_pred_bgr32_rvv, zve32x
+        vsetivli zero, 4, e8, m1, ta, ma
+        vle8.v  v8, (a3)
+        sh2add  a2, a2, a1
+1:
+        vle8.v  v0, (a1)
+        vadd.vv v8, v8, v0
+        addi    a1, a1, 4
+        vse8.v  v8, (a0)
+        addi    a0, a0, 4
+        bne     a2, a1, 1b
+
+        vse8.v  v8, (a3)
+        ret
+endfunc
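As a reading aid, here is a minimal scalar C model of what `ff_add_hfyu_left_pred_bgr32_rvv` computes, derived from the assembly above rather than copied from FFmpeg's own C version: `left` holds the running value of each of the four bytes of a BGR32 pixel, every source pixel is added to it byte by byte with wraparound modulo 256 (as `vadd.vv` does at SEW=8), each running sum is written to `dst`, and the final sum is stored back to `left`. The name `left_pred_bgr32_ref` is hypothetical. One observable difference: the assembly loop only tests its exit condition after the first iteration, so it appears to assume `w >= 1`, whereas this sketch simply does nothing for `w == 0`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of ff_add_hfyu_left_pred_bgr32_rvv (not FFmpeg
 * code). left[0..3] holds the running value of each byte of a 4-byte pixel;
 * uint8_t arithmetic wraps modulo 256, like vadd.vv with SEW=8. */
static void left_pred_bgr32_ref(uint8_t *dst, const uint8_t *src,
                                intptr_t w, uint8_t *left)
{
    /* Corresponds to the initial vle8.v v8, (a3). */
    uint8_t acc[4] = { left[0], left[1], left[2], left[3] };

    assert(w >= 0);
    for (intptr_t i = 0; i < w; i++) {
        for (int c = 0; c < 4; c++) {
            acc[c] += src[4 * i + c];   /* vadd.vv v8, v8, v0 */
            dst[4 * i + c] = acc[c];    /* vse8.v v8, (a0) */
        }
    }

    for (int c = 0; c < 4; c++)         /* final vse8.v v8, (a3) */
        left[c] = acc[c];
}
```

For example, with `left = {10, 20, 30, 40}` and the two source pixels `{1, 2, 3, 4}` and `{250, 10, 20, 30}`, `dst` becomes `{11, 22, 33, 44, 5, 32, 53, 74}` (the first byte wraps from 261 to 5) and `left` ends up as `{5, 32, 53, 74}`.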
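The remark about the P extension follows from the shape of the problem: one BGR32 pixel fits in a 32-bit register, and left prediction is then just a packed addition of four independent byte lanes, which is the kind of operation P's packed SIMD instructions provide. The sketch below emulates that packed addition with a portable SWAR (SIMD within a register) trick in plain C, purely to illustrate the idea; `add_bytes_swar` and `left_pred_bgr32_swar` are hypothetical names, this is not part of the patch, and no performance claim is made for it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not FFmpeg code): adds the four 8-bit lanes of x and
 * y independently, modulo 256 per lane, using only 32-bit integer ops. */
static uint32_t add_bytes_swar(uint32_t x, uint32_t y)
{
    const uint32_t low7 = UINT32_C(0x7f7f7f7f);

    /* Add the low 7 bits of every lane: each lane sum is at most 0xfe,
     * so no carry can spill into the neighbouring lane. */
    uint32_t sum = (x & low7) + (y & low7);

    /* Fold in the top bit of every lane with XOR; the carry out of each
     * lane's top bit is discarded, which gives the per-lane wraparound. */
    return sum ^ ((x ^ y) & ~low7);
}

/* Hypothetical left prediction over BGR32 pixels, one pixel per 32-bit word.
 * Byte order is irrelevant here since the lanes never interact. */
static void left_pred_bgr32_swar(uint8_t *dst, const uint8_t *src,
                                 intptr_t w, uint8_t *left)
{
    uint32_t acc, pix;

    assert(w >= 0);
    memcpy(&acc, left, 4);
    for (intptr_t i = 0; i < w; i++) {
        memcpy(&pix, src + 4 * i, 4);   /* alignment-safe 32-bit load */
        acc = add_bytes_swar(acc, pix);
        memcpy(dst + 4 * i, &acc, 4);   /* alignment-safe 32-bit store */
    }
    memcpy(left, &acc, 4);
}
```

Masking with `0x7f7f7f7f` keeps every partial sum inside its own byte, and the final XOR restores each lane's top bit while dropping its carry-out, so each lane wraps modulo 256 exactly like the e8 `vadd.vv` in the patch. For instance, `add_bytes_swar(0xff010203, 0x01ff0101)` yields `0x00000304`.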