From patchwork Sat May 4 10:01:05 2024
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Sat, 4 May 2024 18:01:05 +0800
Subject: [FFmpeg-devel] [PATCH] lavc/vc1dsp: R-V V mspel_pixels
X-Mailer: git-send-email 2.45.0

From: sunyuechi

vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_c: 869.7
vc1dsp.avg_vc1_mspel_pixels_tab[0][0]_rvv_i32: 148.7
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_c: 220.5
vc1dsp.avg_vc1_mspel_pixels_tab[1][0]_rvv_i64: 56.2
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 523.7
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvv_i32: 82.0
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 138.5
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvv_i64: 23.7
---
 libavcodec/riscv/vc1dsp_init.c |  8 +++++
 libavcodec/riscv/vc1dsp_rvv.S  | 66 ++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+)

diff --git a/libavcodec/riscv/vc1dsp_init.c b/libavcodec/riscv/vc1dsp_init.c
index e47b644f80..610c43a1a3 100644
--- a/libavcodec/riscv/vc1dsp_init.c
+++ b/libavcodec/riscv/vc1dsp_init.c
@@ -29,6 +29,10 @@ void ff_vc1_inv_trans_8x8_dc_rvv(uint8_t *dest, ptrdiff_t stride, int16_t *block
 void ff_vc1_inv_trans_4x8_dc_rvv(uint8_t *dest, ptrdiff_t stride, int16_t *block);
 void ff_vc1_inv_trans_8x4_dc_rvv(uint8_t *dest, ptrdiff_t stride, int16_t *block);
 void ff_vc1_inv_trans_4x4_dc_rvv(uint8_t *dest, ptrdiff_t stride, int16_t *block);
+void ff_put_pixels16x16_rvv(uint8_t *dst, const uint8_t *src, ptrdiff_t line_size, int rnd);
+void ff_put_pixels8x8_rvv(uint8_t *dst, const uint8_t *src, ptrdiff_t line_size, int rnd);
+void ff_avg_pixels16x16_rvv(uint8_t *dst, const uint8_t *src, ptrdiff_t line_size, int rnd);
+void ff_avg_pixels8x8_rvv(uint8_t *dst, const uint8_t *src, ptrdiff_t line_size, int rnd);
 
 av_cold void ff_vc1dsp_init_riscv(VC1DSPContext *dsp)
 {
@@ -38,9 +42,13 @@ av_cold void ff_vc1dsp_init_riscv(VC1DSPContext *dsp)
     if (flags & AV_CPU_FLAG_RVV_I32 && ff_get_rv_vlenb() >= 16) {
         dsp->vc1_inv_trans_4x8_dc = ff_vc1_inv_trans_4x8_dc_rvv;
         dsp->vc1_inv_trans_4x4_dc = ff_vc1_inv_trans_4x4_dc_rvv;
+        dsp->put_vc1_mspel_pixels_tab[0][0] = ff_put_pixels16x16_rvv;
+        dsp->avg_vc1_mspel_pixels_tab[0][0] = ff_avg_pixels16x16_rvv;
         if (flags & AV_CPU_FLAG_RVV_I64) {
             dsp->vc1_inv_trans_8x8_dc = ff_vc1_inv_trans_8x8_dc_rvv;
             dsp->vc1_inv_trans_8x4_dc = ff_vc1_inv_trans_8x4_dc_rvv;
+            dsp->put_vc1_mspel_pixels_tab[1][0] = ff_put_pixels8x8_rvv;
+            dsp->avg_vc1_mspel_pixels_tab[1][0] = ff_avg_pixels8x8_rvv;
         }
     }
 #endif
diff --git a/libavcodec/riscv/vc1dsp_rvv.S b/libavcodec/riscv/vc1dsp_rvv.S
index 4a00945ead..48244f91aa 100644
--- a/libavcodec/riscv/vc1dsp_rvv.S
+++ b/libavcodec/riscv/vc1dsp_rvv.S
@@ -111,3 +111,69 @@ func ff_vc1_inv_trans_4x4_dc_rvv, zve32x
         vsse32.v      v0, (a0), a1
         ret
 endfunc
+
+func ff_put_pixels16x16_rvv, zve32x
+        vsetivli      zero, 16, e8, m1, ta, ma
+        .irp n 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
+        vle8.v        v\n, (a1)
+        add           a1, a1, a2
+        .endr
+        vle8.v        v31, (a1)
+        .irp n 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
+        vse8.v        v\n, (a0)
+        add           a0, a0, a2
+        .endr
+        vse8.v        v31, (a0)
+
+        ret
+endfunc
+
+func ff_put_pixels8x8_rvv, zve64x
+        vsetivli      zero, 8, e8, mf2, ta, ma
+        vlse64.v      v8, (a1), a2
+        vsse64.v      v8, (a0), a2
+
+        ret
+endfunc
+
+func ff_avg_pixels16x16_rvv, zve32x
+        csrwi         vxrm, 0
+        vsetivli      zero, 16, e8, m1, ta, ma
+        li            t0, 128
+
+        .irp n 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
+        vle8.v        v\n, (a1)
+        add           a1, a1, a2
+        .endr
+        vle8.v        v31, (a1)
+        .irp n 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
+        vle8.v        v\n, (a0)
+        add           a0, a0, a2
+        .endr
+        vle8.v        v15, (a0)
+        vsetvli       zero, t0, e8, m8, ta, ma
+        vaaddu.vv     v0, v0, v16
+        vaaddu.vv     v8, v8, v24
+        vsetivli      zero, 16, e8, m1, ta, ma
+        .irp n 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1
+        vse8.v        v\n, (a0)
+        sub           a0, a0, a2
+        .endr
+        vse8.v        v0, (a0)
+
+        ret
+endfunc
+
+func ff_avg_pixels8x8_rvv, zve64x
+        csrwi         vxrm, 0
+        li            t0, 64
+        vsetivli      zero, 8, e8, mf2, ta, ma
+        vlse64.v      v16, (a1), a2
+        vlse64.v      v8, (a0), a2
+        vsetvli       zero, t0, e8, m4, ta, ma
+        vaaddu.vv     v16, v16, v8
+        vsetivli      zero, 8, e8, mf2, ta, ma
+        vsse64.v      v16, (a0), a2
+
+        ret
+endfunc
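
For readers checking the new functions against the _c benchmark rows, their intended per-pixel behaviour can be sketched in scalar C. This is a hypothetical reference written for illustration, not FFmpeg's C implementation: the names ref_put_pixels and ref_avg_pixels and the w/h parameters are assumptions. It also assumes that vaaddu.vv with vxrm = 0 (round-to-nearest-up) computes (a + b + 1) >> 1, and that rnd is ignored, since the assembly never reads that argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference (not FFmpeg code): copy a w x h block of
 * bytes from src to dst, both using the same row stride line_size.
 * rnd is accepted only to match the prototype; it is not used. */
static void ref_put_pixels(uint8_t *dst, const uint8_t *src,
                           ptrdiff_t line_size, int rnd, int w, int h)
{
    (void)rnd;
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        memcpy(dst, src, (size_t)w);
        dst += line_size;
        src += line_size;
    }
}

/* Hypothetical scalar reference (not FFmpeg code): average src into dst,
 * rounding halves up, i.e. (a + b + 1) >> 1 per byte. This is the
 * rounding assumed here for vaaddu.vv with vxrm = 0. */
static void ref_avg_pixels(uint8_t *dst, const uint8_t *src,
                           ptrdiff_t line_size, int rnd, int w, int h)
{
    (void)rnd;
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = (uint8_t)((dst[x] + src[x] + 1) >> 1);
        dst += line_size;
        src += line_size;
    }
}
```

Calling either helper with w = h = 16 or w = h = 8 corresponds to the [0][0] and [1][0] table entries respectively, which makes it usable as a quick cross-check when testing the vector code on a given VLEN.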