From patchwork Fri Mar 22 06:01:21 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: flow gg
X-Patchwork-Id: 47298
From: flow gg
Date: Fri, 22 Mar 2024 14:01:21 +0800
To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH 2/3] lavc/vp8dsp: R-V V put_epel v

From a59509c554a319f8271ad4175da40788445f7a56 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 17:49:54 +0800
Subject: [PATCH 2/3] lavc/vp8dsp: R-V V put_epel v

C908:
vp8_put_epel4_v4_c: 11.0
vp8_put_epel4_v4_rvv_i32: 5.0
vp8_put_epel4_v6_c: 16.5
vp8_put_epel4_v6_rvv_i32: 6.2
vp8_put_epel8_v4_c: 43.7
vp8_put_epel8_v4_rvv_i32: 11.2
vp8_put_epel8_v6_c: 68.7
vp8_put_epel8_v6_rvv_i32: 13.2
vp8_put_epel16_v4_c: 92.5
vp8_put_epel16_v4_rvv_i32: 13.7
vp8_put_epel16_v6_c: 135.7
vp8_put_epel16_v6_rvv_i32: 16.5
---
 libavcodec/riscv/vp8dsp_init.c |  7 ++++++
 libavcodec/riscv/vp8dsp_rvv.S  | 44 +++++++++++++++++++++++++++-------
 2 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/libavcodec/riscv/vp8dsp_init.c b/libavcodec/riscv/vp8dsp_init.c
index 6614d661f7..2f123b67fe 100644
--- a/libavcodec/riscv/vp8dsp_init.c
+++ b/libavcodec/riscv/vp8dsp_init.c
@@ -85,6 +85,13 @@ av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
         c->put_vp8_epel_pixels_tab[0][0][1] = ff_put_vp8_epel16_h4_rvv;
         c->put_vp8_epel_pixels_tab[1][0][1] = ff_put_vp8_epel8_h4_rvv;
         c->put_vp8_epel_pixels_tab[2][0][1] = ff_put_vp8_epel4_h4_rvv;
+
+        c->put_vp8_epel_pixels_tab[0][2][0] = ff_put_vp8_epel16_v6_rvv;
+        c->put_vp8_epel_pixels_tab[1][2][0] = ff_put_vp8_epel8_v6_rvv;
+        c->put_vp8_epel_pixels_tab[2][2][0] = ff_put_vp8_epel4_v6_rvv;
+        c->put_vp8_epel_pixels_tab[0][1][0] = ff_put_vp8_epel16_v4_rvv;
+        c->put_vp8_epel_pixels_tab[1][1][0] = ff_put_vp8_epel8_v4_rvv;
+        c->put_vp8_epel_pixels_tab[2][1][0] = ff_put_vp8_epel4_v4_rvv;
     }
 #endif
 }
diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
index a0dd46e3a8..134154acfc 100644
--- a/libavcodec/riscv/vp8dsp_rvv.S
+++ b/libavcodec/riscv/vp8dsp_rvv.S
@@ -233,9 +233,13 @@ subpel_filters:
         .byte 1, -8, 36, 108, -11, 2
         .byte 0, -1, 12, 123, -6, 0
 
-.macro epel_filter size
+.macro epel_filter size type
         lla             t2, subpel_filters
+.ifc \type,v
+        addi            t0, a6, -1
+.elseif \type == h
         addi            t0, a5, -1
+.endif
         li              t1, 6
         mul             t0, t0, t1
         add             t0, t0, t2
@@ -248,19 +252,33 @@ subpel_filters:
 .endif
 .endm
 
-.macro epel_load dst len size
+.macro epel_load dst len size type
+.ifc \type,v
+        sub             t6, a2, a3
+        add             a7, a2, a3
+.elseif \type == h
         addi            t6, a2, -1
         addi            a7, a2, 1
+.endif
         vle8.v          v24, (a2)
         vle8.v          v22, (t6)
         vle8.v          v26, (a7)
+.ifc \type,v
+        add             a7, a7, a3
+.elseif \type == h
         addi            a7, a7, 1
+.endif
         vle8.v          v28, (a7)
         vwmulu.vx       v16, v24, t2
         vwmulu.vx       v20, v26, t3
 .ifc \size,6
+.ifc \type,v
+        sub             t6, t6, a3
+        add             a7, a7, a3
+.elseif \type == h
         addi            t6, t6, -1
         addi            a7, a7, 1
+.endif
         vle8.v          v24, (t6)
         vle8.v          v26, (a7)
         vwmaccu.vx      v16, t0, v24
@@ -292,13 +310,13 @@ subpel_filters:
         vnclipu.wi      \dst, v24, 0
 .endm
 
-.macro epel_load_inc dst len size
-        epel_load       \dst \len \size
+.macro epel_load_inc dst len size type
+        epel_load       \dst \len \size \type
         add             a2, a2, a3
 .endm
 
-.macro epel len size
-        epel_filter     \size
+.macro epel len size type
+        epel_filter     \size \type
 
 .ifc \len,4
         vsetivli        zero, 4, e8, mf4, ta, ma
@@ -310,7 +328,7 @@ subpel_filters:
 
 1:
         addi            a4, a4, -1
-        epel_load_inc   v30 \len \size
+        epel_load_inc   v30 \len \size \type
         vse8.v          v30, (a0)
         add             a0, a0, a1
         bnez            a4, 1b
@@ -320,10 +338,18 @@ subpel_filters:
 
 .irp len 16,8,4
 func ff_put_vp8_epel\len\()_h6_rvv, zve32x
-        epel            \len 6
+        epel            \len 6 h
 endfunc
 
 func ff_put_vp8_epel\len\()_h4_rvv, zve32x
-        epel            \len 4
+        epel            \len 4 h
+endfunc
+
+func ff_put_vp8_epel\len\()_v6_rvv, zve32x
+        epel            \len 6 v
+endfunc
+
+func ff_put_vp8_epel\len\()_v4_rvv, zve32x
+        epel            \len 4 v
 endfunc
 .endr
--
2.44.0
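
For readers checking the new "v" paths against a scalar reference, here is a rough C sketch of what a vertical epel filter computes. It is not code from this patch or from the FFmpeg tree. The names epel_v_ref and clip_u8 are made up for illustration. The rounding (add 64, shift right by 7) and the tap order are assumed to follow the usual VP8 six-tap convention, with signed taps that sum to 128 as in the two subpel_filters rows visible in the diff context. The point it shows is the one the diff makes: neighbouring taps sit one source stride apart (a3 in the assembly) instead of one byte apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar model of a vertical VP8 epel filter (illustrative
 * only, not the RVV implementation). f[] holds six signed taps that sum
 * to 128; taps is 4 or 6. The caller must provide 2 readable rows above
 * src and 3 below it (1 above and 2 below when taps == 4).
 */
static void epel_v_ref(uint8_t *dst, ptrdiff_t dststride,
                       const uint8_t *src, ptrdiff_t srcstride,
                       int w, int h, const int8_t f[6], int taps)
{
    assert(taps == 4 || taps == 6);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            /* Neighbours are one stride apart, not one byte apart. */
            int sum = f[1] * src[x - srcstride] +
                      f[2] * src[x] +
                      f[3] * src[x + srcstride] +
                      f[4] * src[x + 2 * srcstride];
            if (taps == 6)
                sum += f[0] * src[x - 2 * srcstride] +
                       f[5] * src[x + 3 * srcstride];
            sum += 64;        /* round to nearest (assumed VP8 rounding) */
            if (sum < 0)
                sum = 0;      /* clamp before shifting a negative value */
            dst[x] = clip_u8(sum >> 7);
        }
        dst += dststride;
        src += srcstride;
    }
}
```

Swapping every srcstride offset for a 1-byte offset gives the matching horizontal model, which mirrors how the patch selects between add/sub with a3 and addi with +1/-1 through the new type macro argument.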