From patchwork Mon Feb 26 16:19:45 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: flow gg
X-Patchwork-Id: 46548
From: flow gg
Date: Tue, 27 Feb 2024 00:19:45 +0800
To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH 3/3] lavc/vp9dsp: R-V V ipred dc dc_left dc_top
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

From 1a83f04530e3c299b28bd56dd10694aaa6b963d7 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 27 Feb 2024 00:07:08 +0800
Subject: [PATCH 3/3] lavc/vp9dsp: R-V V ipred dc dc_left dc_top

C908:
vp9_dc_16x16_8bpp_c: 117.0
vp9_dc_16x16_8bpp_rvv_i32: 81.7
vp9_dc_32x32_8bpp_c: 373.2
vp9_dc_32x32_8bpp_rvv_i32: 171.7
vp9_dc_left_16x16_8bpp_c: 101.2
vp9_dc_left_16x16_8bpp_rvv_i32: 76.7
vp9_dc_left_32x32_8bpp_c: 341.2
vp9_dc_left_32x32_8bpp_rvv_i32: 164.7
vp9_dc_top_16x16_8bpp_c: 101.0
vp9_dc_top_16x16_8bpp_rvv_i32: 76.7
vp9_dc_top_32x32_8bpp_c: 340.2
vp9_dc_top_32x32_8bpp_rvv_i32: 164.7
---
 libavcodec/riscv/vp9dsp_init.c |  14 +++++
 libavcodec/riscv/vp9dsp_rvv.S  | 104 +++++++++++++++++++++++++++++++++
 2 files changed, 118 insertions(+)

diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 5b68302235..65617ab21f 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -30,6 +30,13 @@ void ff_vp9_ipred_h_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l,
                               const uint8_t *a);
 void ff_vp9_ipred_h_8x8_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a);
 void ff_vp9_ipred_h_4x4_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a);
+void ff_vp9_ipred_dc_32x32_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a);
+void ff_vp9_ipred_dc_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a);
+void ff_vp9_ipred_dc_top_32x32_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a);
+void ff_vp9_ipred_dc_top_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a);
+void ff_vp9_ipred_dc_left_32x32_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a);
+void ff_vp9_ipred_dc_left_16x16_rvv(uint8_t *dst, ptrdiff_t stride, const uint8_t *l, const uint8_t *a);
+
 av_cold void ff_vp9dsp_init_riscv(VP9DSPContext *dsp, int bpp, int bitexact)
 {
 #if HAVE_RVV
@@ -43,6 +50,13 @@ av_cold void ff_vp9dsp_init_riscv(VP9DSPContext *dsp, int bpp, int bitexact)
             dsp->intra_pred[TX_16X16][HOR_PRED] = ff_vp9_ipred_h_16x16_rvv;
             dsp->intra_pred[TX_8X8][HOR_PRED] = ff_vp9_ipred_h_8x8_rvv;
             dsp->intra_pred[TX_4X4][HOR_PRED] = ff_vp9_ipred_h_4x4_rvv;
+
+            dsp->intra_pred[TX_32X32][DC_PRED] = ff_vp9_ipred_dc_32x32_rvv;
+            dsp->intra_pred[TX_16X16][DC_PRED] = ff_vp9_ipred_dc_16x16_rvv;
+            dsp->intra_pred[TX_32X32][LEFT_DC_PRED] = ff_vp9_ipred_dc_left_32x32_rvv;
+            dsp->intra_pred[TX_16X16][LEFT_DC_PRED] = ff_vp9_ipred_dc_left_16x16_rvv;
+            dsp->intra_pred[TX_32X32][TOP_DC_PRED] = ff_vp9_ipred_dc_top_32x32_rvv;
+            dsp->intra_pred[TX_16X16][TOP_DC_PRED] = ff_vp9_ipred_dc_top_16x16_rvv;
         }
     }
 #endif
diff --git a/libavcodec/riscv/vp9dsp_rvv.S b/libavcodec/riscv/vp9dsp_rvv.S
index 578fbce061..e22bd943b4 100644
--- a/libavcodec/riscv/vp9dsp_rvv.S
+++ b/libavcodec/riscv/vp9dsp_rvv.S
@@ -127,3 +127,107 @@ func ff_vp9_ipred_h_4x4_rvv, zve32x
 
         ret
 endfunc
+
+.macro getdc type
+.ifc \type,top
+        vle8.v       v8, (a3)
+        vwredsumu.vs v16, v8, v16
+.elseif \type == left
+        vle8.v       v8, (a2)
+        vwredsumu.vs v16, v8, v16
+.elseif \type == none
+        vle8.v       v8, (a2)
+        vwredsumu.vs v16, v8, v16
+        vle8.v       v8, (a3)
+        vwredsumu.vs v16, v8, v16
+.endif
+        vsetivli     zero, 1, e16, m1, ta, ma
+        vmv.x.s      t1, v16
+.endm
+
+.macro dc32x32 type
+        vsetivli     zero, 1, e16, m1, ta, ma
+        vmv.s.x      v16, zero
+
+        li           t0, 32
+        vsetvli      zero, t0, e8, m2, ta, ma
+        getdc        \type
+
+.ifc \type,top
+        addi         t1, t1, 16
+        srai         t1, t1, 5
+.elseif \type == left
+        addi         t1, t1, 16
+        srai         t1, t1, 5
+.elseif \type == none
+        addi         t1, t1, 32
+        srai         t1, t1, 6
+.endif
+
+        vsetvli      zero, t0, e8, m2, ta, ma
+        vmv.v.x      v0, t1
+
+        vsetivli     zero, 8, e8, mf2, ta, ma
+        .rept 31
+        vse32.v      v0, (a0)
+        add          a0, a0, a1
+        .endr
+        vse32.v      v0, (a0)
+
+        ret
+.endm
+
+.macro dc16x16 type
+        vsetivli     zero, 1, e16, m1, ta, ma
+        vmv.s.x      v16, zero
+
+        vsetivli     zero, 16, e8, m1, ta, ma
+        getdc        \type
+
+.ifc \type,top
+        addi         t1, t1, 8
+        srai         t1, t1, 4
+.elseif \type == left
+        addi         t1, t1, 8
+        srai         t1, t1, 4
+.elseif \type == none
+        addi         t1, t1, 16
+        srai         t1, t1, 5
+.endif
+
+        vsetivli     zero, 16, e8, m1, ta, ma
+        vmv.v.x      v0, t1
+
+        vsetivli     zero, 4, e8, mf4, ta, ma
+        .rept 15
+        vse32.v      v0, (a0)
+        add          a0, a0, a1
+        .endr
+        vse32.v      v0, (a0)
+
+        ret
+.endm
+
+func ff_vp9_ipred_dc_32x32_rvv, zve32x
+        dc32x32      none
+endfunc
+
+func ff_vp9_ipred_dc_16x16_rvv, zve32x
+        dc16x16      none
+endfunc
+
+func ff_vp9_ipred_dc_left_32x32_rvv, zve32x
+        dc32x32      left
+endfunc
+
+func ff_vp9_ipred_dc_left_16x16_rvv, zve32x
+        dc16x16      left
+endfunc
+
+func ff_vp9_ipred_dc_top_32x32_rvv, zve32x
+        dc32x32      top
+endfunc
+
+func ff_vp9_ipred_dc_top_16x16_rvv, zve32x
+        dc16x16      top
+endfunc
-- 
2.44.0
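
For reviewers checking the rounding constants in the macros above, here is a minimal scalar sketch of the arithmetic the patch appears to implement: sum the left and/or top edge, add half the sample count, shift by log2 of the count, then fill the block. The function name dc_pred_ref, the size parameter, and the convention of passing NULL to skip an edge are assumptions made for this illustration; this is not FFmpeg's C reference code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference for the dc / dc_left / dc_top predictors.
 * Pass left == NULL for the "top" variant, top == NULL for the "left"
 * variant, and both edges for plain "dc". size is 16 or 32 here. */
static void dc_pred_ref(uint8_t *dst, ptrdiff_t stride,
                        const uint8_t *left, const uint8_t *top, int size)
{
    unsigned sum = 0, count = 0, shift = 0;

    assert(left || top);            /* at least one edge must be given */

    if (left) {
        for (int i = 0; i < size; i++)
            sum += left[i];
        count += (unsigned)size;
    }
    if (top) {
        for (int i = 0; i < size; i++)
            sum += top[i];
        count += (unsigned)size;
    }

    /* count is size or 2 * size, i.e. a power of two. */
    while ((1u << shift) < count)
        shift++;

    /* Rounded average: e.g. (sum + 16) >> 5 for 32 samples,
     * (sum + 32) >> 6 for 64 samples, matching the addi/srai pairs. */
    uint8_t dc = (uint8_t)((sum + (count >> 1)) >> shift);

    for (int y = 0; y < size; y++)
        memset(dst + y * stride, dc, (size_t)size);
}
```

With size = 32 and both edges this gives (sum + 32) >> 6, and with a single edge (sum + 16) >> 5, the same constants as the dc32x32 macro; size = 16 gives the dc16x16 constants.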