From patchwork Sat May 4 14:48:31 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48488
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 4 May 2024 22:48:31 +0800
X-OQ-MSGID: <20240504144840.2411603-2-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240504144840.2411603-1-uk7b@foxmail.com>
References: <20240504144840.2411603-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH 02/10] lavc/vp8dsp: R-V V put_bilin_h
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Cc: sunyuechi

From: sunyuechi

C908:
vp8_put_bilin4_h_c: 373.5
vp8_put_bilin4_h_rvv_i32: 158.7
vp8_put_bilin8_h_c: 1437.7
vp8_put_bilin8_h_rvv_i32: 318.7
vp8_put_bilin16_h_c: 2845.7
vp8_put_bilin16_h_rvv_i32: 374.7
---
 libavcodec/riscv/vp8dsp_init.c | 11 +++++++
 libavcodec/riscv/vp8dsp_rvv.S  | 54 ++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/libavcodec/riscv/vp8dsp_init.c b/libavcodec/riscv/vp8dsp_init.c
index c364de3dc9..32cb4893a4 100644
--- a/libavcodec/riscv/vp8dsp_init.c
+++ b/libavcodec/riscv/vp8dsp_init.c
@@ -34,6 +34,10 @@ VP8_EPEL(16, rvv);
 VP8_EPEL(8, rvv);
 VP8_EPEL(4, rvv);
 
+VP8_BILIN(16, rvv);
+VP8_BILIN(8, rvv);
+VP8_BILIN(4, rvv);
+
 av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
 {
 #if HAVE_RVV
@@ -47,6 +51,13 @@ av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
         c->put_vp8_bilinear_pixels_tab[0][0][0] = ff_put_vp8_pixels16_rvv;
         c->put_vp8_bilinear_pixels_tab[1][0][0] = ff_put_vp8_pixels8_rvv;
         c->put_vp8_bilinear_pixels_tab[2][0][0] = ff_put_vp8_pixels4_rvv;
+
+        c->put_vp8_bilinear_pixels_tab[0][0][1] = ff_put_vp8_bilin16_h_rvv;
+        c->put_vp8_bilinear_pixels_tab[0][0][2] = ff_put_vp8_bilin16_h_rvv;
+        c->put_vp8_bilinear_pixels_tab[1][0][1] = ff_put_vp8_bilin8_h_rvv;
+        c->put_vp8_bilinear_pixels_tab[1][0][2] = ff_put_vp8_bilin8_h_rvv;
+        c->put_vp8_bilinear_pixels_tab[2][0][1] = ff_put_vp8_bilin4_h_rvv;
+        c->put_vp8_bilinear_pixels_tab[2][0][2] = ff_put_vp8_bilin4_h_rvv;
     }
 #endif
 }
diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
index 063ab7110c..c8d265e516 100644
--- a/libavcodec/riscv/vp8dsp_rvv.S
+++ b/libavcodec/riscv/vp8dsp_rvv.S
@@ -98,3 +98,57 @@ func ff_put_vp8_pixels4_rvv, zve32x
         vsetivli        zero, 4, e8, mf4, ta, ma
         put_vp8_pixels
 endfunc
+
+.macro bilin_h_load dst len
+.ifc \len,4
+        vsetivli        zero, 5, e8, mf2, ta, ma
+.elseif \len == 8
+        vsetivli        zero, 9, e8, m1, ta, ma
+.else
+        vsetivli        zero, 17, e8, m2, ta, ma
+.endif
+
+        vle8.v          \dst, (a2)
+        vslide1down.vx  v2, \dst, t5
+
+.ifc \len,4
+        vsetivli        zero, 4, e8, mf4, ta, ma
+.elseif \len == 8
+        vsetivli        zero, 8, e8, mf2, ta, ma
+.else
+        vsetivli        zero, 16, e8, m1, ta, ma
+.endif
+
+        vwmulu.vx       v28, \dst, t1
+        vwmaccu.vx      v28, a5, v2
+        vwaddu.wx       v24, v28, t4
+        vnsra.wi        \dst, v24, 3
+.endm
+
+.macro put_vp8_bilin_h len
+        li              t1, 8
+        li              t4, 4
+        li              t5, 1
+        sub             t1, t1, a5
+1:
+        addi            a4, a4, -1
+        bilin_h_load    v0, \len
+        vse8.v          v0, (a0)
+        add             a2, a2, a3
+        add             a0, a0, a1
+        bnez            a4, 1b
+
+        ret
+.endm
+
+func ff_put_vp8_bilin16_h_rvv, zve32x
+        put_vp8_bilin_h 16
+endfunc
+
+func ff_put_vp8_bilin8_h_rvv, zve32x
+        put_vp8_bilin_h 8
+endfunc
+
+func ff_put_vp8_bilin4_h_rvv, zve32x
+        put_vp8_bilin_h 4
+endfunc
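
For readers following the vector code, a scalar model of the arithmetic may help. The macros above appear to compute, per output byte, (8 - mx) * src[x] + mx * src[x + 1] + 4, shifted right by 3. This assumes a0 = dst, a1 = dst stride, a2 = src, a3 = src stride, a4 = h and a5 = mx, as the register usage suggests. The sketch below is only an illustration under those assumptions: the function name put_bilin_h_ref and the explicit width parameter w are hypothetical and not part of this patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative scalar model of the horizontal bilinear filter.
 * The name put_bilin_h_ref and the explicit width parameter w are
 * assumptions made for this sketch; they are not part of the patch.
 */
static void put_bilin_h_ref(uint8_t *dst, ptrdiff_t dstride,
                            const uint8_t *src, ptrdiff_t sstride,
                            int w, int h, int mx)
{
    const int a = 8 - mx;   /* weight of src[x],     t1 in the assembly */
    const int b = mx;       /* weight of src[x + 1], a5 in the assembly */

    for (int y = 0; y < h; y++) {
        /* Each row reads w + 1 source bytes, hence vl = 5, 9 or 17. */
        for (int x = 0; x < w; x++)
            dst[x] = (uint8_t)((a * src[x] + b * src[x + 1] + 4) >> 3);
        dst += dstride;
        src += sstride;
    }
}
```

Since the two weights always sum to 8, the rounded result never exceeds 255, which is consistent with the assembly narrowing back to 8 bits with a plain vnsra.wi by 3 and no saturation step.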