From patchwork Sat May 4 14:48:34 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 48490
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Sat, 4 May 2024 22:48:34 +0800
X-OQ-MSGID: <20240504144840.2411603-5-uk7b@foxmail.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240504144840.2411603-1-uk7b@foxmail.com>
References: <20240504144840.2411603-1-uk7b@foxmail.com>
Subject: [FFmpeg-devel] [PATCH 05/10] lavc/vp8dsp: R-V V put_epel h
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

From: sunyuechi

C908:
vp8_put_epel4_h4_c: 10.7
vp8_put_epel4_h4_rvv_i32: 5.0
vp8_put_epel4_h6_c: 15.0
vp8_put_epel4_h6_rvv_i32: 6.2
vp8_put_epel8_h4_c: 43.2
vp8_put_epel8_h4_rvv_i32: 11.2
vp8_put_epel8_h6_c: 57.5
vp8_put_epel8_h6_rvv_i32: 13.5
vp8_put_epel16_h4_c: 92.5
vp8_put_epel16_h4_rvv_i32: 13.7
vp8_put_epel16_h6_c: 139.0
vp8_put_epel16_h6_rvv_i32: 16.5
---
 libavcodec/riscv/vp8dsp_init.c |   7 +++
 libavcodec/riscv/vp8dsp_rvv.S  | 105 +++++++++++++++++++++++++++++++++
 2 files changed, 112 insertions(+)

diff --git a/libavcodec/riscv/vp8dsp_init.c b/libavcodec/riscv/vp8dsp_init.c
index 02dbda979e..6614d661f7 100644
--- a/libavcodec/riscv/vp8dsp_init.c
+++ b/libavcodec/riscv/vp8dsp_init.c
@@ -78,6 +78,13 @@ av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
         c->put_vp8_bilinear_pixels_tab[2][1][2] = ff_put_vp8_bilin4_hv_rvv;
         c->put_vp8_bilinear_pixels_tab[2][2][1] = ff_put_vp8_bilin4_hv_rvv;
         c->put_vp8_bilinear_pixels_tab[2][2][2] = ff_put_vp8_bilin4_hv_rvv;
+
+        c->put_vp8_epel_pixels_tab[0][0][2] = ff_put_vp8_epel16_h6_rvv;
+        c->put_vp8_epel_pixels_tab[1][0][2] = ff_put_vp8_epel8_h6_rvv;
+        c->put_vp8_epel_pixels_tab[2][0][2] = ff_put_vp8_epel4_h6_rvv;
+        c->put_vp8_epel_pixels_tab[0][0][1] = ff_put_vp8_epel16_h4_rvv;
+        c->put_vp8_epel_pixels_tab[1][0][1] = ff_put_vp8_epel8_h4_rvv;
+        c->put_vp8_epel_pixels_tab[2][0][1] = ff_put_vp8_epel4_h4_rvv;
     }
 #endif
 }
diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
index 9d4ffed255..84e8ec61de 100644
--- a/libavcodec/riscv/vp8dsp_rvv.S
+++ b/libavcodec/riscv/vp8dsp_rvv.S
@@ -223,3 +223,108 @@ endfunc
 func ff_put_vp8_bilin4_hv_rvv, zve32x
         put_vp8_bilin_hv 4
 endfunc
+
+const subpel_filters
+        .byte 0,  -6, 123,  12,  -1, 0
+        .byte 2, -11, 108,  36,  -8, 1
+        .byte 0,  -9,  93,  50,  -6, 0
+        .byte 3, -16,  77,  77, -16, 3
+        .byte 0,  -6,  50,  93,  -9, 0
+        .byte 1,  -8,  36, 108, -11, 2
+        .byte 0,  -1,  12, 123,  -6, 0
+endconst
+
+.macro epel_filter size
+        lla             t2, subpel_filters
+        addi            t0, a5, -1
+        li              t1, 6
+        mul             t0, t0, t1
+        add             t0, t0, t2
+        .irp n 1,2,3,4
+        lb              t\n, \n(t0)
+        .endr
+.ifc \size,6
+        lb              t5, 5(t0)
+        lb              t0, (t0)
+.endif
+.endm
+
+.macro epel_load dst len size
+        addi            t6, a2, -1
+        addi            a7, a2, 1
+        vle8.v          v24, (a2)
+        vle8.v          v22, (t6)
+        vle8.v          v26, (a7)
+        addi            a7, a7, 1
+        vle8.v          v28, (a7)
+        vwmulu.vx       v16, v24, t2
+        vwmulu.vx       v20, v26, t3
+.ifc \size,6
+        addi            t6, t6, -1
+        addi            a7, a7, 1
+        vle8.v          v24, (t6)
+        vle8.v          v26, (a7)
+        vwmaccu.vx      v16, t0, v24
+        vwmaccu.vx      v16, t5, v26
+.endif
+        li              t6, 64
+        vwmaccsu.vx     v16, t1, v22
+        vwmaccsu.vx     v16, t4, v28
+        vwadd.wx        v16, v16, t6
+
+.ifc \len,4
+        vsetvli         zero, zero, e16, mf2, ta, ma
+.elseif \len == 8
+        vsetvli         zero, zero, e16, m1, ta, ma
+.else
+        vsetvli         zero, zero, e16, m2, ta, ma
+.endif
+
+        vwadd.vv        v24, v16, v20
+        vnsra.wi        v24, v24, 7
+        vmax.vx         v24, v24, zero
+.ifc \len,4
+        vsetvli         zero, zero, e8, mf4, ta, ma
+.elseif \len == 8
+        vsetvli         zero, zero, e8, mf2, ta, ma
+.else
+        vsetvli         zero, zero, e8, m1, ta, ma
+.endif
+        vnclipu.wi      \dst, v24, 0
+.endm
+
+.macro epel_load_inc dst len size
+        epel_load       \dst \len \size
+        add             a2, a2, a3
+.endm
+
+.macro epel len size
+        epel_filter     \size
+
+.ifc \len,4
+        vsetivli        zero, 4, e8, mf4, ta, ma
+.elseif \len == 8
+        vsetivli        zero, 8, e8, mf2, ta, ma
+.else
+        vsetivli        zero, 16, e8, m1, ta, ma
+.endif
+
+1:
+        addi            a4, a4, -1
+        epel_load_inc   v30 \len \size
+        vse8.v          v30, (a0)
+        add             a0, a0, a1
+        bnez            a4, 1b
+
+        ret
+.endm
+
+.irp len 16,8,4
+func ff_put_vp8_epel\len\()_h6_rvv, zve32x
+        epel \len 6
+endfunc
+
+func ff_put_vp8_epel\len\()_h4_rvv, zve32x
+        epel \len 4
+endfunc
+.endr
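
For review purposes, the per-row arithmetic of the epel_load macro can be sketched in scalar C. This is a minimal sketch under stated assumptions, not FFmpeg's C implementation: epel_h_ref and clip_u8 are hypothetical names, and the parameter order only mirrors how the assembly appears to use its registers (a0 dst, a1 dst stride, a2 src, a3 src stride, a4 row count, a5 mx). The taps argument selects the h4 or h6 variant, and the block width w stands in for the 4/8/16 vector length set by vsetivli.

```c
#include <stddef.h>
#include <stdint.h>

/* Same coefficients as the subpel_filters table added by the patch. */
static const int8_t subpel_filters[7][6] = {
    { 0,  -6, 123,  12,  -1, 0 },
    { 2, -11, 108,  36,  -8, 1 },
    { 0,  -9,  93,  50,  -6, 0 },
    { 3, -16,  77,  77, -16, 3 },
    { 0,  -6,  50,  93,  -9, 0 },
    { 1,  -8,  36, 108, -11, 2 },
    { 0,  -1,  12, 123,  -6, 0 },
};

/* Clamp to 0..255, the combined effect of vmax.vx and vnclipu.wi. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar reference for the horizontal filter (not an FFmpeg API).
 * mx is 1..7, taps is 4 or 6; src must be readable from x-2 to x+3 for 6 taps
 * and from x-1 to x+2 for 4 taps. */
static void epel_h_ref(uint8_t *dst, ptrdiff_t dst_stride,
                       const uint8_t *src, ptrdiff_t src_stride,
                       int w, int h, int mx, int taps)
{
    const int8_t *f = subpel_filters[mx - 1];   /* addi t0, a5, -1; row stride 6 */

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            /* Four inner taps plus the rounding constant 64. */
            int sum = f[1] * src[x - 1] + f[2] * src[x] +
                      f[3] * src[x + 1] + f[4] * src[x + 2] + 64;
            if (taps == 6)                       /* outer taps, h6 only */
                sum += f[0] * src[x - 2] + f[5] * src[x + 3];
            /* Negative sums clamp to 0 before shifting, which avoids relying
             * on the implementation-defined right shift of negative ints. */
            dst[x] = sum < 0 ? 0 : clip_u8(sum >> 7);
        }
        dst += dst_stride;
        src += src_stride;
    }
}
```

Comparing the output of such a reference against the RVV functions over random rows is one way to spot-check the signed and unsigned multiply-accumulate split (vwmulu/vwmaccu for the non-negative taps, vwmaccsu for the negative ones); FFmpeg's checkasm remains the authoritative test.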