From patchwork Tue Feb 6 15:56:32 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: flow gg
X-Patchwork-Id: 46083
From: flow gg
Date: Tue, 6 Feb 2024 23:56:32 +0800
To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH 5/7] lavc/me_cmp: R-V V vsse vsad
Reply-To: FFmpeg development discussions and patches

From 67f2a662be1533e52a28971152bff670f78544fd Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 6 Feb 2024 23:18:51 +0800
Subject: [PATCH 5/7] lavc/me_cmp: R-V V vsse vsad

C908:
vsad_0_c: 936.0
vsad_0_rvv_i32: 236.2
vsad_1_c: 424.0
vsad_1_rvv_i32: 190.2
vsse_0_c: 877.0
vsse_0_rvv_i32: 204.2
vsse_1_c: 439.0
vsse_1_rvv_i32: 140.2
---
 libavcodec/riscv/me_cmp_init.c | 10 ++++
 libavcodec/riscv/me_cmp_rvv.S  | 98 ++++++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+)

diff --git a/libavcodec/riscv/me_cmp_init.c b/libavcodec/riscv/me_cmp_init.c
index 85ecc22cbc..a6ef5addd0 100644
--- a/libavcodec/riscv/me_cmp_init.c
+++ b/libavcodec/riscv/me_cmp_init.c
@@ -46,6 +46,11 @@ int ff_sse8_rvv(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2,
 int ff_sse4_rvv(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2,
                 ptrdiff_t stride, int h);
 
+int ff_vsse16_rvv(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2, ptrdiff_t stride, int h);
+int ff_vsse8_rvv(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2, ptrdiff_t stride, int h);
+int ff_vsad16_rvv(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2, ptrdiff_t stride, int h);
+int ff_vsad8_rvv(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2, ptrdiff_t stride, int h);
+
 av_cold void ff_me_cmp_init_riscv(MECmpContext *c, AVCodecContext *avctx)
 {
 #if HAVE_RVV
@@ -64,6 +69,11 @@ av_cold void ff_me_cmp_init_riscv(MECmpContext *c, AVCodecContext *avctx)
         c->sse[0] = ff_sse16_rvv;
         c->sse[1] = ff_sse8_rvv;
         c->sse[2] = ff_sse4_rvv;
+
+        c->vsse[0] = ff_vsse16_rvv;
+        c->vsse[1] = ff_vsse8_rvv;
+        c->vsad[0] = ff_vsad16_rvv;
+        c->vsad[1] = ff_vsad8_rvv;
     }
 #endif
 }
diff --git a/libavcodec/riscv/me_cmp_rvv.S b/libavcodec/riscv/me_cmp_rvv.S
index 11848f3f21..25b15c74ce 100644
--- a/libavcodec/riscv/me_cmp_rvv.S
+++ b/libavcodec/riscv/me_cmp_rvv.S
@@ -231,3 +231,101 @@ func ff_sse4_rvv, zve32x
         vmv.x.s         a0, v0
         ret
 endfunc
+
+.macro vabsaddu dst src tmp
+        vneg.v          \tmp, \src
+        vmax.vv         \tmp, \src, \tmp
+        vwaddu.wv       \dst, \dst, \tmp
+.endm
+
+.macro vsad_vsse16 type
+        vsetivli        t0, 16, e32, m4, ta, ma
+        addi            a4, a4, -1
+        add             t1, a1, a3
+        add             t2, a2, a3
+        vmv.v.x         v24, zero
+        vmv.s.x         v0, zero
+1:
+        vsetvli         zero, zero, e8, m1, tu, ma
+        vle8.v          v4, (a1)
+        vle8.v          v8, (t1)
+        vle8.v          v12, (a2)
+        vle8.v          v16, (t2)
+        addi            a4, a4, -1
+        vwsubu.vv       v28, v4, v12
+        vwsubu.wv       v12, v28, v8
+        vwaddu.wv       v28, v12, v16
+        vsetvli         zero, zero, e16, m2, tu, ma
+
+.ifc \type,abs
+        vabsaddu        v24, v28, v12
+.endif
+.ifc \type,square
+        vwmacc.vv       v24, v28, v28
+.endif
+
+        add             a1, a1, a3
+        add             a2, a2, a3
+        add             t1, t1, a3
+        add             t2, t2, a3
+        bnez            a4, 1b
+
+        vsetvli         zero, zero, e32, m4, tu, ma
+        vredsum.vs      v0, v24, v0
+        vmv.x.s         a0, v0
+        ret
+.endm
+
+.macro vsad_vsse8 type
+        vsetivli        t0, 8, e32, m2, ta, ma
+        addi            a4, a4, -1
+        add             t1, a1, a3
+        add             t2, a2, a3
+        vmv.v.x         v24, zero
+        vmv.s.x         v0, zero
+1:
+        vsetvli         zero, zero, e8, mf2, tu, ma
+        vle8.v          v4, (a1)
+        vle8.v          v8, (t1)
+        vle8.v          v12, (a2)
+        vle8.v          v16, (t2)
+        addi            a4, a4, -1
+        vwsubu.vv       v28, v4, v12
+        vwsubu.wv       v12, v28, v8
+        vwaddu.wv       v28, v12, v16
+        vsetvli         zero, zero, e16, m1, tu, ma
+
+.ifc \type,abs
+        vabsaddu        v24, v28, v12
+.endif
+.ifc \type,square
+        vwmacc.vv       v24, v28, v28
+.endif
+
+        add             a1, a1, a3
+        add             a2, a2, a3
+        add             t1, t1, a3
+        add             t2, t2, a3
+        bnez            a4, 1b
+
+        vsetvli         zero, zero, e32, m2, tu, ma
+        vredsum.vs      v0, v24, v0
+        vmv.x.s         a0, v0
+        ret
+.endm
+
+func ff_vsse16_rvv, zve32x
+        vsad_vsse16 square
+endfunc
+
+func ff_vsse8_rvv, zve32x
+        vsad_vsse8 square
+endfunc
+
+func ff_vsad16_rvv, zve32x
+        vsad_vsse16 abs
+endfunc
+
+func ff_vsad8_rvv, zve32x
+        vsad_vsse8 abs
+endfunc
-- 
2.43.0
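
Reviewer aid, not part of the patch: the arithmetic performed by the two
macros can be sketched in scalar C. This is read off the instruction
sequence in the patch (vwsubu.vv, vwsubu.wv, vwaddu.wv, then either
abs-accumulate or vwmacc.vv) and assumes the loop runs over h - 1 row
pairs, as the two "addi a4, a4, -1" decrements suggest. The names
vsad_ref and vsse_ref and the explicit width parameter w are hypothetical
and chosen only for illustration; the real entry points take an
MpegEncContext pointer and have the width fixed at 16 or 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference for the "abs" variant of the macros.
 * For each pair of adjacent rows, accumulate the absolute value of
 * (s1[x] - s2[x]) - (s1[x + stride] - s2[x + stride]). */
static int vsad_ref(const uint8_t *s1, const uint8_t *s2,
                    ptrdiff_t stride, int w, int h)
{
    int score = 0;

    assert(w > 0 && h > 0);
    for (int y = 1; y < h; y++) {          /* h - 1 row pairs */
        for (int x = 0; x < w; x++) {
            int d = s1[x] - s2[x] - s1[x + stride] + s2[x + stride];
            score += abs(d);
        }
        s1 += stride;
        s2 += stride;
    }
    return score;
}

/* Hypothetical scalar reference for the "square" variant: same
 * difference term, but the square is accumulated instead. */
static int vsse_ref(const uint8_t *s1, const uint8_t *s2,
                    ptrdiff_t stride, int w, int h)
{
    int score = 0;

    assert(w > 0 && h > 0);
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int d = s1[x] - s2[x] - s1[x + stride] + s2[x + stride];
            score += d * d;
        }
        s1 += stride;
        s2 += stride;
    }
    return score;
}
```

Comparing the output of such a reference against ff_vsad16_rvv and
friends on random blocks is one way to sanity-check the vector code
independently of checkasm; w = 16 corresponds to the *16 functions and
w = 8 to the *8 functions.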