From patchwork Sat May 4 14:48:36 2024
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48493
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Sat, 4 May 2024 22:48:36 +0800
Subject: [FFmpeg-devel] [PATCH 07/10] lavc/vp8dsp: R-V V put_epel hv
X-OQ-MSGID: <20240504144840.2411603-7-uk7b@foxmail.com>
In-Reply-To: <20240504144840.2411603-1-uk7b@foxmail.com>
References: <20240504144840.2411603-1-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0

From: sunyuechi

C908:
vp8_put_epel4_h4v4_c: 20.0
vp8_put_epel4_h4v4_rvv_i32: 11.0
vp8_put_epel4_h4v6_c: 25.2
vp8_put_epel4_h4v6_rvv_i32: 13.5
vp8_put_epel4_h6v4_c: 22.2
vp8_put_epel4_h6v4_rvv_i32: 14.5
vp8_put_epel4_h6v6_c: 29.0
vp8_put_epel4_h6v6_rvv_i32: 15.7
vp8_put_epel8_h4v4_c: 73.0
vp8_put_epel8_h4v4_rvv_i32: 22.2
vp8_put_epel8_h4v6_c: 90.5
vp8_put_epel8_h4v6_rvv_i32: 26.7
vp8_put_epel8_h6v4_c: 85.0
vp8_put_epel8_h6v4_rvv_i32: 27.2
vp8_put_epel8_h6v6_c: 104.7
vp8_put_epel8_h6v6_rvv_i32: 29.5
vp8_put_epel16_h4v4_c: 145.5
vp8_put_epel16_h4v4_rvv_i32: 26.5
vp8_put_epel16_h4v6_c: 190.7
vp8_put_epel16_h4v6_rvv_i32: 47.5
vp8_put_epel16_h6v4_c: 173.7
vp8_put_epel16_h6v4_rvv_i32: 33.2
vp8_put_epel16_h6v6_c: 222.2
vp8_put_epel16_h6v6_rvv_i32: 35.5
---
 libavcodec/riscv/vp8dsp_init.c |  13 ++++
 libavcodec/riscv/vp8dsp_rvv.S  | 124 +++++++++++++++++++++++++++------
 2 files changed, 116 insertions(+), 21 deletions(-)

diff --git a/libavcodec/riscv/vp8dsp_init.c b/libavcodec/riscv/vp8dsp_init.c
index 2f123b67fe..2dd583d079 100644
--- a/libavcodec/riscv/vp8dsp_init.c
+++ b/libavcodec/riscv/vp8dsp_init.c
@@ -92,6 +92,19 @@ av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
         c->put_vp8_epel_pixels_tab[0][1][0] = ff_put_vp8_epel16_v4_rvv;
         c->put_vp8_epel_pixels_tab[1][1][0] = ff_put_vp8_epel8_v4_rvv;
         c->put_vp8_epel_pixels_tab[2][1][0] = ff_put_vp8_epel4_v4_rvv;
+
+        c->put_vp8_epel_pixels_tab[0][2][2] = ff_put_vp8_epel16_h6v6_rvv;
+        c->put_vp8_epel_pixels_tab[1][2][2] = ff_put_vp8_epel8_h6v6_rvv;
+        c->put_vp8_epel_pixels_tab[2][2][2] = ff_put_vp8_epel4_h6v6_rvv;
+        c->put_vp8_epel_pixels_tab[0][2][1] = ff_put_vp8_epel16_h4v6_rvv;
+        c->put_vp8_epel_pixels_tab[1][2][1] = ff_put_vp8_epel8_h4v6_rvv;
+        c->put_vp8_epel_pixels_tab[2][2][1] = ff_put_vp8_epel4_h4v6_rvv;
+        c->put_vp8_epel_pixels_tab[0][1][1] = ff_put_vp8_epel16_h4v4_rvv;
+        c->put_vp8_epel_pixels_tab[1][1][1] = ff_put_vp8_epel8_h4v4_rvv;
+        c->put_vp8_epel_pixels_tab[2][1][1] = ff_put_vp8_epel4_h4v4_rvv;
+        c->put_vp8_epel_pixels_tab[0][1][2] = ff_put_vp8_epel16_h6v4_rvv;
+        c->put_vp8_epel_pixels_tab[1][1][2] = ff_put_vp8_epel8_h6v4_rvv;
+        c->put_vp8_epel_pixels_tab[2][1][2] = ff_put_vp8_epel4_h6v4_rvv;
     }
 #endif
 }
diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
index 440a965ddd..ba644f0f47 100644
--- a/libavcodec/riscv/vp8dsp_rvv.S
+++ b/libavcodec/riscv/vp8dsp_rvv.S
@@ -234,26 +234,26 @@ const subpel_filters
         .byte 0, -1, 12, 123, -6, 0
 endconst
 
-.macro epel_filter size type
-        lla             t2, subpel_filters
+.macro epel_filter size type regtype
+        lla             \regtype\()2, subpel_filters
 .ifc \type,v
-        addi            t0, a6, -1
+        addi            \regtype\()0, a6, -1
 .elseif \type == h
-        addi            t0, a5, -1
+        addi            \regtype\()0, a5, -1
 .endif
-        li              t1, 6
-        mul             t0, t0, t1
-        add             t0, t0, t2
+        li              \regtype\()1, 6
+        mul             \regtype\()0, \regtype\()0, \regtype\()1
+        add             \regtype\()0, \regtype\()0, \regtype\()2
         .irp n 1,2,3,4
-        lb              t\n, \n(t0)
+        lb              \regtype\n, \n(\regtype\()0)
         .endr
 .ifc \size,6
-        lb              t5, 5(t0)
-        lb              t0, (t0)
+        lb              \regtype\()5, 5(\regtype\()0)
+        lb              \regtype\()0, (\regtype\()0)
 .endif
 .endm
 
-.macro epel_load dst len size type
+.macro epel_load dst len size type from_mem regtype
 .ifc \type,v
         mv              a5, a3
 .else
@@ -267,19 +267,29 @@ endconst
         vle8.v          v26, (a7)
         add             a7, a7, a5
         vle8.v          v28, (a7)
-        vwmulu.vx       v16, v24, t2
-        vwmulu.vx       v20, v26, t3
+        vwmulu.vx       v16, v24, \regtype\()2
+        vwmulu.vx       v20, v26, \regtype\()3
 .ifc \size,6
         sub             t6, t6, a5
         add             a7, a7, a5
         vle8.v          v24, (t6)
         vle8.v          v26, (a7)
-        vwmaccu.vx      v16, t0, v24
-        vwmaccu.vx      v16, t5, v26
+        vwmaccu.vx      v16, \regtype\()0, v24
+        vwmaccu.vx      v16, \regtype\()5, v26
+.endif
+        vwmaccsu.vx     v16, \regtype\()1, v22
+        vwmaccsu.vx     v16, \regtype\()4, v28
+.else
+        vwmulu.vx       v16, v4, \regtype\()2
+        vwmulu.vx       v20, v6, \regtype\()3
+  .ifc \size,6
+        vwmaccu.vx      v16, \regtype\()0, v0
+        vwmaccu.vx      v16, \regtype\()5, v10
+  .endif
+        vwmaccsu.vx     v16, \regtype\()1, v2
+        vwmaccsu.vx     v16, \regtype\()4, v8
 .endif
         li              t6, 64
-        vwmaccsu.vx     v16, t1, v22
-        vwmaccsu.vx     v16, t4, v28
         vwadd.wx        v16, v16, t6
 
 .ifc \len,4
@@ -303,13 +313,13 @@ endconst
         vnclipu.wi      \dst, v24, 0
 .endm
 
-.macro epel_load_inc dst len size type
-        epel_load       \dst \len \size \type
+.macro epel_load_inc dst len size type from_mem regtype
+        epel_load       \dst \len \size \type \from_mem \regtype
         add             a2, a2, a3
 .endm
 
 .macro epel len size type
-        epel_filter     \size \type
+        epel_filter     \size \type t
 
 .ifc \len,4
         vsetivli        zero, 4, e8, mf4, ta, ma
@@ -321,10 +331,66 @@ endconst
 
 1:
         addi            a4, a4, -1
-        epel_load_inc   v30 \len \size \type
+        epel_load_inc   v30 \len \size \type 1 t
+        vse8.v          v30, (a0)
+        add             a0, a0, a1
+        bnez            a4, 1b
+
+        ret
+.endm
+
+.macro epel_hv len hsize vsize
+        addi            sp, sp, -48
+        .irp n 0,1,2,3,4,5
+        sd              s\n, \n\()<<3(sp)
+        .endr
+        sub             a2, a2, a3
+        epel_filter     \hsize h t
+        epel_filter     \vsize v s
+.ifc \len,4
+        vsetivli        zero, 4, e8, mf4, ta, ma
+.elseif \len == 8
+        vsetivli        zero, 8, e8, mf2, ta, ma
+.else
+        vsetivli        zero, 16, e8, m1, ta, ma
+.endif
+.if \hsize == 6 || \vsize == 6
+        sub             a2, a2, a3
+        epel_load_inc   v0 \len \hsize h 1 t
+.endif
+        epel_load_inc   v2 \len \hsize h 1 t
+        epel_load_inc   v4 \len \hsize h 1 t
+        epel_load_inc   v6 \len \hsize h 1 t
+        epel_load_inc   v8 \len \hsize h 1 t
+.if \hsize == 6 || \vsize == 6
+        epel_load_inc   v10 \len \hsize h 1 t
+.endif
+        addi            a4, a4, -1
+1:
+        addi            a4, a4, -1
+        epel_load       v30 \len \vsize v 0 s
         vse8.v          v30, (a0)
+.if \hsize == 6 || \vsize == 6
+        vmv.v.v         v0, v2
+.endif
+        vmv.v.v         v2, v4
+        vmv.v.v         v4, v6
+        vmv.v.v         v6, v8
+.if \hsize == 6 || \vsize == 6
+        vmv.v.v         v8, v10
+        epel_load_inc   v10 \len \hsize h 1 t
+.else
+        epel_load_inc   v8 \len 4 h 1 t
+.endif
         add             a0, a0, a1
         bnez            a4, 1b
+        epel_load       v30 \len \vsize v 0 s
+        vse8.v          v30, (a0)
+
+        .irp n 0,1,2,3,4,5
+        ld              s\n, \n\()<<3(sp)
+        .endr
+        addi            sp, sp, 48
 
         ret
 .endm
@@ -345,4 +411,20 @@ endfunc
 func ff_put_vp8_epel\len\()_v4_rvv, zve32x
         epel            \len 4 v
 endfunc
+
+func ff_put_vp8_epel\len\()_h6v6_rvv, zve32x
+        epel_hv         \len 6 6
+endfunc
+
+func ff_put_vp8_epel\len\()_h4v4_rvv, zve32x
+        epel_hv         \len 4 4
+endfunc
+
+func ff_put_vp8_epel\len\()_h6v4_rvv, zve32x
+        epel_hv         \len 6 4
+endfunc
+
+func ff_put_vp8_epel\len\()_h4v6_rvv, zve32x
+        epel_hv         \len 4 6
+endfunc
 .endr
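The `epel_hv` macro combines the two passes of the 2-D filter. It first filters horizontally the rows that the vertical filter needs and keeps them in v2..v8, plus v0 and v10 when either filter is 6-tap. For each output row it then applies the vertical taps to that window, shifts the window down with the `vmv.v.v` chain, and filters one new source row. The last output row is peeled out of the loop, so no row past the block is read.

In the init code, the second index of `put_vp8_epel_pixels_tab` selects the vertical filter and the third selects the horizontal one (1 = 4-tap, 2 = 6-tap). That is why `[0][2][1]` maps to `h4v6`.

The scalar C sketch below models the sliding window and checks it against a plain two-pass version. It is only a sketch, and every function name in it is hypothetical, not FFmpeg API. It makes these assumptions:

- It uses the VP8 sub-pel filter table in the signed layout of `vp8dsp_rvv.S`.
- Rounding is `(sum + 64) >> 7` with a clip to 8 bits, which matches the visible `li t6, 64` / `vwadd.wx` step.
- For simplicity it always applies six taps. This is harmless for the 4-tap filters, whose outer taps are zero, but it means the source needs 2 pixels of padding above and to the left, and 3 below and to the right.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed VP8 sub-pel filters for positions 1..7, signed, same layout as
 * subpel_filters in vp8dsp_rvv.S (odd positions are the 4-tap filters). */
static const int8_t subpel_filters[7][6] = {
    { 0,  -6, 123,  12,  -1, 0 },
    { 2, -11, 108,  36,  -8, 1 },
    { 0,  -9,  93,  50,  -6, 0 },
    { 3, -16,  77,  77, -16, 3 },
    { 0,  -6,  50,  93,  -9, 0 },
    { 1,  -8,  36, 108, -11, 2 },
    { 0,  -1,  12, 123,  -6, 0 },
};

/* Assumed rounding: (sum + 64) >> 7, clipped to 0..255. */
static uint8_t round_clip(int sum)
{
    sum += 64;
    return sum < 0 ? 0 : (sum >> 7) > 255 ? 255 : (uint8_t)(sum >> 7);
}

/* Horizontal pass over one row: tap k reads p[x + k - 2]. */
static void filter_row_h(uint8_t *out, const uint8_t *p, int w, const int8_t *f)
{
    for (int x = 0; x < w; x++) {
        int sum = 0;
        for (int k = 0; k < 6; k++)
            sum += f[k] * p[x + k - 2];
        out[x] = round_clip(sum);
    }
}

/* Fused model: a window of six filtered rows (y-2 .. y+3), slid down by one
 * row per output row, like the v0..v10 register window in epel_hv. */
static void epel_hv_window(uint8_t *dst, ptrdiff_t dstride, const uint8_t *src,
                           ptrdiff_t sstride, int w, int h, int mx, int my)
{
    const int8_t *fh = subpel_filters[mx - 1], *fv = subpel_filters[my - 1];
    uint8_t rows[6][16];

    for (int r = 0; r < 6; r++)
        filter_row_h(rows[r], src + (r - 2) * sstride, w, fh);
    for (int y = 0; y < h; y++, dst += dstride) {
        for (int x = 0; x < w; x++) {
            int sum = 0;
            for (int k = 0; k < 6; k++)
                sum += fv[k] * rows[k][x];
            dst[x] = round_clip(sum);
        }
        if (y + 1 < h) { /* skip the extra load after the last row */
            memmove(rows[0], rows[1], sizeof(rows) - sizeof(rows[0]));
            filter_row_h(rows[5], src + (y + 4) * sstride, w, fh);
        }
    }
}

/* Two-pass model: filter h + 5 rows into a temporary buffer, then filter
 * the buffer vertically. */
static void epel_hv_twopass(uint8_t *dst, ptrdiff_t dstride, const uint8_t *src,
                            ptrdiff_t sstride, int w, int h, int mx, int my)
{
    const int8_t *fh = subpel_filters[mx - 1], *fv = subpel_filters[my - 1];
    uint8_t tmp[(16 + 5) * 16];

    for (int r = 0; r < h + 5; r++)
        filter_row_h(tmp + r * 16, src + (r - 2) * sstride, w, fh);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int sum = 0;
            for (int k = 0; k < 6; k++)
                sum += fv[k] * tmp[(y + k) * 16 + x];
            dst[y * dstride + x] = round_clip(sum);
        }
}

/* Counts output rows where the two models disagree, over sizes 4/8/16 and
 * every (mx, my) pair, on pseudo-random input. */
static int compare_all(void)
{
    uint8_t buf[32 * 32], a[16 * 16], b[16 * 16];
    uint32_t seed = 1;
    int bad = 0;

    for (size_t i = 0; i < sizeof(buf); i++)
        buf[i] = (uint8_t)((seed = seed * 1103515245u + 12345u) >> 16);
    for (int w = 4; w <= 16; w *= 2)
        for (int mx = 1; mx <= 7; mx++)
            for (int my = 1; my <= 7; my++) {
                epel_hv_window(a, 16, buf + 2 * 32 + 2, 32, w, w, mx, my);
                epel_hv_twopass(b, 16, buf + 2 * 32 + 2, 32, w, w, mx, my);
                for (int y = 0; y < w; y++)
                    bad += memcmp(a + y * 16, b + y * 16, w) != 0;
            }
    return bad;
}
```

`compare_all()` returns 0 when the windowed and two-pass models agree. Holding the window in vector registers is what lets the RVV code skip the temporary buffer that a two-pass implementation needs.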