From patchwork Fri Mar 22 06:01:37 2024
X-Patchwork-Submitter: flow gg
X-Patchwork-Id: 47299
From: flow gg
Date: Fri, 22 Mar 2024 14:01:37 +0800
To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH 3/3] lavc/vp8dsp: R-V V put_epel hv

From 278e473681eddaf24977e47c88f715620105c6b3 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Thu, 21 Mar 2024 17:50:58 +0800
Subject: [PATCH 3/3] lavc/vp8dsp: R-V V put_epel hv

C908:
vp8_put_epel4_h4v4_c: 20.0
vp8_put_epel4_h4v4_rvv_i32: 11.0
vp8_put_epel4_h4v6_c: 25.2
vp8_put_epel4_h4v6_rvv_i32: 13.5
vp8_put_epel4_h6v4_c: 22.2
vp8_put_epel4_h6v4_rvv_i32: 14.5
vp8_put_epel4_h6v6_c: 29.0
vp8_put_epel4_h6v6_rvv_i32: 15.7
vp8_put_epel8_h4v4_c: 73.0
vp8_put_epel8_h4v4_rvv_i32: 22.2
vp8_put_epel8_h4v6_c: 90.5
vp8_put_epel8_h4v6_rvv_i32: 26.7
vp8_put_epel8_h6v4_c: 85.0
vp8_put_epel8_h6v4_rvv_i32: 27.2
vp8_put_epel8_h6v6_c: 104.7
vp8_put_epel8_h6v6_rvv_i32: 29.5
vp8_put_epel16_h4v4_c: 145.5
vp8_put_epel16_h4v4_rvv_i32: 26.5
vp8_put_epel16_h4v6_c: 190.7
vp8_put_epel16_h4v6_rvv_i32: 47.5
vp8_put_epel16_h6v4_c: 173.7
vp8_put_epel16_h6v4_rvv_i32: 33.2
vp8_put_epel16_h6v6_c: 222.2
vp8_put_epel16_h6v6_rvv_i32: 35.5
---
 libavcodec/riscv/vp8dsp_init.c |  13 ++++
 libavcodec/riscv/vp8dsp_rvv.S  | 125 +++++++++++++++++++++++++++------
 2 files changed, 117 insertions(+), 21 deletions(-)

diff --git a/libavcodec/riscv/vp8dsp_init.c b/libavcodec/riscv/vp8dsp_init.c
index 2f123b67fe..2dd583d079 100644
--- a/libavcodec/riscv/vp8dsp_init.c
+++ b/libavcodec/riscv/vp8dsp_init.c
@@ -92,6 +92,19 @@ av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
         c->put_vp8_epel_pixels_tab[0][1][0] = ff_put_vp8_epel16_v4_rvv;
         c->put_vp8_epel_pixels_tab[1][1][0] = ff_put_vp8_epel8_v4_rvv;
         c->put_vp8_epel_pixels_tab[2][1][0] = ff_put_vp8_epel4_v4_rvv;
+
+        c->put_vp8_epel_pixels_tab[0][2][2] = ff_put_vp8_epel16_h6v6_rvv;
+        c->put_vp8_epel_pixels_tab[1][2][2] = ff_put_vp8_epel8_h6v6_rvv;
+        c->put_vp8_epel_pixels_tab[2][2][2] = ff_put_vp8_epel4_h6v6_rvv;
+        c->put_vp8_epel_pixels_tab[0][2][1] = ff_put_vp8_epel16_h4v6_rvv;
+        c->put_vp8_epel_pixels_tab[1][2][1] = ff_put_vp8_epel8_h4v6_rvv;
+        c->put_vp8_epel_pixels_tab[2][2][1] = ff_put_vp8_epel4_h4v6_rvv;
+        c->put_vp8_epel_pixels_tab[0][1][1] = ff_put_vp8_epel16_h4v4_rvv;
+        c->put_vp8_epel_pixels_tab[1][1][1] = ff_put_vp8_epel8_h4v4_rvv;
+        c->put_vp8_epel_pixels_tab[2][1][1] = ff_put_vp8_epel4_h4v4_rvv;
+        c->put_vp8_epel_pixels_tab[0][1][2] = ff_put_vp8_epel16_h6v4_rvv;
+        c->put_vp8_epel_pixels_tab[1][1][2] = ff_put_vp8_epel8_h6v4_rvv;
+        c->put_vp8_epel_pixels_tab[2][1][2] = ff_put_vp8_epel4_h6v4_rvv;
     }
 #endif
 }
diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
index 134154acfc..701557a808 100644
--- a/libavcodec/riscv/vp8dsp_rvv.S
+++ b/libavcodec/riscv/vp8dsp_rvv.S
@@ -233,26 +233,26 @@ subpel_filters:
         .byte 1, -8, 36, 108, -11, 2
         .byte 0, -1, 12, 123, -6, 0
 
-.macro epel_filter size type
-        lla t2, subpel_filters
+.macro epel_filter size type regtype
+        lla \regtype\()2, subpel_filters
 .ifc \type,v
-        addi t0, a6, -1
+        addi \regtype\()0, a6, -1
 .elseif \type == h
-        addi t0, a5, -1
+        addi \regtype\()0, a5, -1
 .endif
-        li t1, 6
-        mul t0, t0, t1
-        add t0, t0, t2
+        li \regtype\()1, 6
+        mul \regtype\()0, \regtype\()0, \regtype\()1
+        add \regtype\()0, \regtype\()0, \regtype\()2
 .irp n 1,2,3,4
-        lb t\n, \n(t0)
+        lb \regtype\n, \n(\regtype\()0)
 .endr
 .ifc \size,6
-        lb t5, 5(t0)
-        lb t0, (t0)
+        lb \regtype\()5, 5(\regtype\()0)
+        lb \regtype\()0, (\regtype\()0)
 .endif
 .endm
 
-.macro epel_load dst len size type
+.macro epel_load dst len size type from_mem regtype
 .ifc \type,v
         sub t6, a2, a3
         add a7, a2, a3
@@ -260,6 +260,7 @@ subpel_filters:
         addi t6, a2, -1
         addi a7, a2, 1
 .endif
+.if \from_mem
         vle8.v v24, (a2)
         vle8.v v22, (t6)
         vle8.v v26, (a7)
@@ -269,8 +270,8 @@ subpel_filters:
         addi a7, a7, 1
 .endif
         vle8.v v28, (a7)
-        vwmulu.vx v16, v24, t2
-        vwmulu.vx v20, v26, t3
+        vwmulu.vx v16, v24, \regtype\()2
+        vwmulu.vx v20, v26, \regtype\()3
 .ifc \size,6
 .ifc \type,v
         sub t6, t6, a3
@@ -281,12 +282,22 @@ subpel_filters:
 .endif
         vle8.v v24, (t6)
         vle8.v v26, (a7)
-        vwmaccu.vx v16, t0, v24
-        vwmaccu.vx v16, t5, v26
+        vwmaccu.vx v16, \regtype\()0, v24
+        vwmaccu.vx v16, \regtype\()5, v26
+.endif
+        vwmaccsu.vx v16, \regtype\()1, v22
+        vwmaccsu.vx v16, \regtype\()4, v28
+.else
+        vwmulu.vx v16, v4 , \regtype\()2
+        vwmulu.vx v20, v6 , \regtype\()3
+        .ifc \size,6
+        vwmaccu.vx v16, \regtype\()0, v0
+        vwmaccu.vx v16, \regtype\()5, v10
+        .endif
+        vwmaccsu.vx v16, \regtype\()1, v2
+        vwmaccsu.vx v16, \regtype\()4, v8
 .endif
         li t6, 64
-        vwmaccsu.vx v16, t1, v22
-        vwmaccsu.vx v16, t4, v28
         vwadd.wx v16, v16, t6
 
 .ifc \len,4
@@ -310,13 +321,13 @@ subpel_filters:
         vnclipu.wi \dst, v24, 0
 .endm
 
-.macro epel_load_inc dst len size type
-        epel_load \dst \len \size \type
+.macro epel_load_inc dst len size type from_mem regtype
+        epel_load \dst \len \size \type \from_mem \regtype
         add a2, a2, a3
 .endm
 
 .macro epel len size type
-        epel_filter \size \type
+        epel_filter \size \type t
 
 .ifc \len,4
         vsetivli zero, 4, e8, mf4, ta, ma
@@ -328,10 +339,66 @@ subpel_filters:
 
 1:
         addi a4, a4, -1
-        epel_load_inc v30 \len \size \type
+        epel_load_inc v30 \len \size \type 1 t
+        vse8.v v30, (a0)
+        add a0, a0, a1
+        bnez a4, 1b
+
+        ret
+.endm
+
+.macro epel_hv len hsize vsize
+        addi sp, sp, -48
+        .irp n 0,1,2,3,4,5
+        sd s\n, \n\()<<3(sp)
+        .endr
+        sub a2, a2, a3
+        epel_filter \hsize h t
+        epel_filter \vsize v s
+.ifc \len,4
+        vsetivli zero, 4, e8, mf4, ta, ma
+.elseif \len == 8
+        vsetivli zero, 8, e8, mf2, ta, ma
+.else
+        vsetivli zero, 16, e8, m1, ta, ma
+.endif
+.if \hsize == 6 || \vsize == 6
+        sub a2, a2, a3
+        epel_load_inc v0 \len \hsize h 1 t
+.endif
+        epel_load_inc v2 \len \hsize h 1 t
+        epel_load_inc v4 \len \hsize h 1 t
+        epel_load_inc v6 \len \hsize h 1 t
+        epel_load_inc v8 \len \hsize h 1 t
+.if \hsize == 6 || \vsize == 6
+        epel_load_inc v10 \len \hsize h 1 t
+.endif
+        addi a4, a4, -1
+1:
+        addi a4, a4, -1
+        epel_load v30 \len \vsize v 0 s
         vse8.v v30, (a0)
+.if \hsize == 6 || \vsize == 6
+        vmv.v.v v0, v2
+.endif
+        vmv.v.v v2, v4
+        vmv.v.v v4, v6
+        vmv.v.v v6, v8
+.if \hsize == 6 || \vsize == 6
+        vmv.v.v v8, v10
+        epel_load_inc v10 \len \hsize h 1 t
+.else
+        epel_load_inc v8 \len 4 h 1 t
+.endif
         add a0, a0, a1
         bnez a4, 1b
+        epel_load v30 \len \vsize v 0 s
+        vse8.v v30, (a0)
+
+        .irp n 0,1,2,3,4,5
+        ld s\n, \n\()<<3(sp)
+        .endr
+        addi sp, sp, 48
 
         ret
 .endm
@@ -352,4 +419,20 @@ endfunc
 func ff_put_vp8_epel\len\()_v4_rvv, zve32x
         epel \len 4 v
 endfunc
+
+func ff_put_vp8_epel\len\()_h6v6_rvv, zve32x
+        epel_hv \len 6 6
+endfunc
+
+func ff_put_vp8_epel\len\()_h4v4_rvv, zve32x
+        epel_hv \len 4 4
+endfunc
+
+func ff_put_vp8_epel\len\()_h6v4_rvv, zve32x
+        epel_hv \len 6 4
+endfunc
+
+func ff_put_vp8_epel\len\()_h4v6_rvv, zve32x
+        epel_hv \len 4 6
+endfunc
 .endr
-- 
2.44.0
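For reviewers, the core idea of the epel_hv macro can be modelled in scalar C. It runs the horizontal filter once per source row and keeps the most recent filtered rows resident in v0, v2, v4, v6, v8 and v10. After each output row it shifts them down with vmv.v.v, so each iteration only has to horizontally filter one new row. The sketch below is not part of the patch and is only an illustration under stated assumptions. It uses the standard VP8 six-tap table from RFC 6386, which matches the two signed subpel_filters rows visible in the diff. It treats the 4-tap cases as six-tap filters whose outer taps are zero, and it always keeps six rows, as the patch does whenever either filter is six-tap. The names tap6, round_clip, epel_hv_two_pass, epel_hv_window and epel_hv_self_check are hypothetical, not FFmpeg API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* VP8 six-tap sub-pixel filters (RFC 6386), indexed by mx - 1 / my - 1.
 * Rows 0, 2, 4 and 6 have zero outer taps: those are the 4-tap cases. */
static const int8_t filters[7][6] = {
    { 0,  -6, 123,  12,  -1, 0 },
    { 2, -11, 108,  36,  -8, 1 },
    { 0,  -9,  93,  50,  -6, 0 },
    { 3, -16,  77,  77, -16, 3 },
    { 0,  -6,  50,  93,  -9, 0 },
    { 1,  -8,  36, 108, -11, 2 },
    { 0,  -1,  12, 123,  -6, 0 },
};

/* (sum + 64) >> 7, saturated to 0..255. */
static uint8_t round_clip(int sum)
{
    sum += 64;
    if (sum < 0)
        return 0;
    sum >>= 7;
    return sum > 255 ? 255 : (uint8_t)sum;
}

/* Six taps applied to p[-2 * step] .. p[3 * step]. */
static uint8_t tap6(const uint8_t *p, ptrdiff_t step, const int8_t *f)
{
    int sum = 0;
    for (int k = 0; k < 6; k++)
        sum += f[k] * p[(k - 2) * step];
    return round_clip(sum);
}

/* Plain two-pass version: h + 5 filtered rows into tmp, then vertical. */
static void epel_hv_two_pass(uint8_t *dst, ptrdiff_t dstride,
                             const uint8_t *src, ptrdiff_t sstride,
                             int w, int h, int mx, int my)
{
    uint8_t tmp[(16 + 5) * 16];
    const int8_t *fh = filters[mx - 1], *fv = filters[my - 1];

    assert(w <= 16 && h <= 16);                /* fixed-size scratch buffer */
    for (int y = 0; y < h + 5; y++)            /* source rows -2 .. h + 2 */
        for (int x = 0; x < w; x++)
            tmp[y * w + x] = tap6(src + (y - 2) * sstride + x, 1, fh);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[y * dstride + x] = tap6(tmp + (y + 2) * w + x, w, fv);
}

/* Sliding window: keep six horizontally filtered rows and shift them by one
 * after each output row, like the v0 <- v2 <- ... <- v10 moves in epel_hv. */
static void epel_hv_window(uint8_t *dst, ptrdiff_t dstride,
                           const uint8_t *src, ptrdiff_t sstride,
                           int w, int h, int mx, int my)
{
    uint8_t rows[6][16];
    const int8_t *fh = filters[mx - 1], *fv = filters[my - 1];

    assert(w <= 16);
    for (int r = 0; r < 6; r++)                /* prime with rows -2 .. 3 */
        for (int x = 0; x < w; x++)
            rows[r][x] = tap6(src + (r - 2) * sstride + x, 1, fh);

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int sum = 0;
            for (int k = 0; k < 6; k++)
                sum += fv[k] * rows[k][x];
            dst[y * dstride + x] = round_clip(sum);
        }
        if (y == h - 1)
            break;                             /* last row: nothing to load */
        memmove(rows, rows + 1, 5 * sizeof(rows[0]));
        for (int x = 0; x < w; x++)            /* new bottom row: y + 4 */
            rows[5][x] = tap6(src + (y + 4) * sstride + x, 1, fh);
    }
}

/* Returns the number of output rows where the two versions differ. */
int epel_hv_self_check(void)
{
    static uint8_t buf[32 * 32];
    const uint8_t *src = buf + 8 * 32 + 8;     /* margin for the taps */
    uint8_t a[16 * 16], b[16 * 16];
    int mismatches = 0;

    srand(1);
    for (size_t i = 0; i < sizeof(buf); i++)
        buf[i] = (uint8_t)(rand() & 255);
    for (int w = 4; w <= 16; w *= 2)
        for (int mx = 1; mx <= 7; mx++)
            for (int my = 1; my <= 7; my++) {
                epel_hv_two_pass(a, 16, src, 32, w, w, mx, my);
                epel_hv_window(b, 16, src, 32, w, w, mx, my);
                for (int y = 0; y < w; y++)
                    mismatches += memcmp(a + y * 16, b + y * 16, w) != 0;
            }
    return mismatches;
}
```

Every filter row sums to 128, so a flat block (for example all pixels equal to 100) should come back unchanged. On random input, epel_hv_self_check() should return 0, because both versions perform the same arithmetic and differ only in how many filtered rows they keep around.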