From patchwork Sat May 4 14:48:35 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48491
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 4 May 2024 22:48:35 +0800
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240504144840.2411603-1-uk7b@foxmail.com>
References: <20240504144840.2411603-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH 06/10] lavc/vp8dsp: R-V V put_epel v
Cc: sunyuechi

From: sunyuechi

C908:
vp8_put_epel4_v4_c: 11.0
vp8_put_epel4_v4_rvv_i32: 5.0
vp8_put_epel4_v6_c: 16.5
vp8_put_epel4_v6_rvv_i32: 6.2
vp8_put_epel8_v4_c: 43.7
vp8_put_epel8_v4_rvv_i32: 11.2
vp8_put_epel8_v6_c: 68.7
vp8_put_epel8_v6_rvv_i32: 13.2
vp8_put_epel16_v4_c: 92.5
vp8_put_epel16_v4_rvv_i32: 13.7
vp8_put_epel16_v6_c: 135.7
vp8_put_epel16_v6_rvv_i32: 16.5
---
 libavcodec/riscv/vp8dsp_init.c |  7 ++++++
 libavcodec/riscv/vp8dsp_rvv.S  | 46 +++++++++++++++++++++++-----------
 2 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/libavcodec/riscv/vp8dsp_init.c b/libavcodec/riscv/vp8dsp_init.c
index 6614d661f7..2f123b67fe 100644
--- a/libavcodec/riscv/vp8dsp_init.c
+++ b/libavcodec/riscv/vp8dsp_init.c
@@ -85,6 +85,13 @@ av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
         c->put_vp8_epel_pixels_tab[0][0][1] = ff_put_vp8_epel16_h4_rvv;
         c->put_vp8_epel_pixels_tab[1][0][1] = ff_put_vp8_epel8_h4_rvv;
         c->put_vp8_epel_pixels_tab[2][0][1] = ff_put_vp8_epel4_h4_rvv;
+
+        c->put_vp8_epel_pixels_tab[0][2][0] = ff_put_vp8_epel16_v6_rvv;
+        c->put_vp8_epel_pixels_tab[1][2][0] = ff_put_vp8_epel8_v6_rvv;
+        c->put_vp8_epel_pixels_tab[2][2][0] = ff_put_vp8_epel4_v6_rvv;
+        c->put_vp8_epel_pixels_tab[0][1][0] = ff_put_vp8_epel16_v4_rvv;
+        c->put_vp8_epel_pixels_tab[1][1][0] = ff_put_vp8_epel8_v4_rvv;
+        c->put_vp8_epel_pixels_tab[2][1][0] = ff_put_vp8_epel4_v4_rvv;
     }
 #endif
 }
diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
index 84e8ec61de..440a965ddd 100644
--- a/libavcodec/riscv/vp8dsp_rvv.S
+++ b/libavcodec/riscv/vp8dsp_rvv.S
@@ -234,9 +234,13 @@ const subpel_filters
         .byte 0, -1, 12, 123, -6, 0
 endconst
 
-.macro epel_filter size
+.macro epel_filter size type
         lla             t2, subpel_filters
+.ifc \type,v
+        addi            t0, a6, -1
+.elseif \type == h
         addi            t0, a5, -1
+.endif
         li              t1, 6
         mul             t0, t0, t1
         add             t0, t0, t2
@@ -249,19 +253,25 @@ endconst
 .endif
 .endm
 
-.macro epel_load dst len size
-        addi            t6, a2, -1
-        addi            a7, a2, 1
+.macro epel_load dst len size type
+.ifc \type,v
+        mv              a5, a3
+.else
+        li              a5, 1
+.endif
+        sub             t6, a2, a5
+        add             a7, a2, a5
+.if \from_mem
         vle8.v          v24, (a2)
         vle8.v          v22, (t6)
         vle8.v          v26, (a7)
-        addi            a7, a7, 1
+        add             a7, a7, a5
         vle8.v          v28, (a7)
         vwmulu.vx       v16, v24, t2
         vwmulu.vx       v20, v26, t3
 .ifc \size,6
-        addi            t6, t6, -1
-        addi            a7, a7, 1
+        sub             t6, t6, a5
+        add             a7, a7, a5
         vle8.v          v24, (t6)
         vle8.v          v26, (a7)
         vwmaccu.vx      v16, t0, v24
@@ -293,13 +303,13 @@ endconst
         vnclipu.wi      \dst, v24, 0
 .endm
 
-.macro epel_load_inc dst len size
-        epel_load       \dst \len \size
+.macro epel_load_inc dst len size type
+        epel_load       \dst \len \size \type
         add             a2, a2, a3
 .endm
 
-.macro epel len size
-        epel_filter     \size
+.macro epel len size type
+        epel_filter     \size \type
 
 .ifc \len,4
         vsetivli        zero, 4, e8, mf4, ta, ma
@@ -311,7 +321,7 @@ endconst
 
 1:
         addi            a4, a4, -1
-        epel_load_inc   v30 \len \size
+        epel_load_inc   v30 \len \size \type
         vse8.v          v30, (a0)
         add             a0, a0, a1
         bnez            a4, 1b
@@ -321,10 +331,18 @@ endconst
 
 .irp len 16,8,4
 func ff_put_vp8_epel\len\()_h6_rvv, zve32x
-        epel            \len 6
+        epel            \len 6 h
 endfunc
 
 func ff_put_vp8_epel\len\()_h4_rvv, zve32x
-        epel            \len 4
+        epel            \len 4 h
+endfunc
+
+func ff_put_vp8_epel\len\()_v6_rvv, zve32x
+        epel            \len 6 v
+endfunc
+
+func ff_put_vp8_epel\len\()_v4_rvv, zve32x
+        epel            \len 4 v
 endfunc
 .endr
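
For readers checking the new _v4/_v6 entry points against a scalar model, the following is a minimal C sketch of the vertical filter that the epel macro appears to implement when \type is v: the row step is the source stride (a3) instead of 1, and the filter row is selected with my - 1 (a6 - 1). The names subpel_taps, clip_u8 and put_epel_v_ref are hypothetical and are not FFmpeg symbols. Only the last coefficient row is visible in this diff; the other rows are taken from the VP8 six-tap filter table and are an assumption here, as is the "+ 64, >> 7" rounding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signed six-tap coefficients for fractional positions 1..7; each row sums
 * to 128. Rows with zero outer taps are the four-tap cases. Only the last
 * row appears in the diff; the rest are assumed from the VP8 filter table. */
static const int8_t subpel_taps[7][6] = {
    { 0,  -6, 123,  12,  -1, 0 },
    { 2, -11, 108,  36,  -8, 1 },
    { 0,  -9,  93,  50,  -6, 0 },
    { 3, -16,  77,  77, -16, 3 },
    { 0,  -6,  50,  93,  -9, 0 },
    { 1,  -8,  36, 108, -11, 2 },
    { 0,  -1,  12, 123,  -6, 0 },
};

/* Clamp a non-negative intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar reference: vertically filter a w x h block.
 * my is the vertical fraction in 1..7; src must have two valid rows above
 * and three below the block when the selected row has six non-zero taps. */
static void put_epel_v_ref(uint8_t *dst, ptrdiff_t dst_stride,
                           const uint8_t *src, ptrdiff_t src_stride,
                           int w, int h, int my)
{
    assert(my >= 1 && my <= 7);
    const int8_t *f = subpel_taps[my - 1];

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int sum = 64;                       /* rounding term for >> 7 */
            for (int k = 0; k < 6; k++) {
                if (f[k])                       /* skip rows a 4-tap filter never reads */
                    sum += f[k] * src[x + (k - 2) * src_stride];
            }
            dst[x] = sum < 0 ? 0 : clip_u8(sum >> 7);
        }
        dst += dst_stride;
        src += src_stride;
    }
}
```

As a sanity check, filtering the single column 10, 20, 30, 40, 50, 60 at the sample 30 with my = 4 (the symmetric half-pel row) yields 35, the midpoint of 30 and 40.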