From patchwork Tue Feb 6 15:56:59 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: flow gg
X-Patchwork-Id: 46085
From: flow gg
Date: Tue, 6 Feb 2024 23:56:59 +0800
To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH 7/7] lavc/me_cmp: R-V V nsse

From 31635394e89318c554a9653bd22791336309951e Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 6 Feb 2024 22:51:47 +0800
Subject: [PATCH 7/7] lavc/me_cmp: R-V V nsse

C908:
nsse_0_c: 1990.0
nsse_0_rvv_i32: 572.0
nsse_1_c: 910.0
nsse_1_rvv_i32: 456.0
---
 libavcodec/riscv/me_cmp_init.c |  30 +++++++++
 libavcodec/riscv/me_cmp_rvv.S  | 118 +++++++++++++++++++++++++++++++++
 2 files changed, 148 insertions(+)

diff --git a/libavcodec/riscv/me_cmp_init.c b/libavcodec/riscv/me_cmp_init.c
index 24e78e3eeb..48c0d3d827 100644
--- a/libavcodec/riscv/me_cmp_init.c
+++ b/libavcodec/riscv/me_cmp_init.c
@@ -55,6 +55,15 @@ int ff_vsad8_rvv(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2, ptrdif
 int ff_vsad_intra16_rvv(MpegEncContext *c, const uint8_t *s, const uint8_t *dummy, ptrdiff_t stride, int h);
 int ff_vsad_intra8_rvv(MpegEncContext *c, const uint8_t *s, const uint8_t *dummy, ptrdiff_t stride, int h);
 
+int ff_nsse16_rvv(int multiplier, const uint8_t *s1, const uint8_t *s2,
+                  ptrdiff_t stride, int h);
+int nsse16_rvv_wrapper(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2,
+                       ptrdiff_t stride, int h);
+int ff_nsse8_rvv(int multiplier, const uint8_t *s1, const uint8_t *s2,
+                 ptrdiff_t stride, int h);
+int nsse8_rvv_wrapper(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2,
+                      ptrdiff_t stride, int h);
+
 av_cold void ff_me_cmp_init_riscv(MECmpContext *c, AVCodecContext *avctx)
 {
 #if HAVE_RVV
@@ -82,6 +91,27 @@ av_cold void ff_me_cmp_init_riscv(MECmpContext *c, AVCodecContext *avctx)
         c->vsad[1] = ff_vsad8_rvv;
         c->vsad[4] = ff_vsad_intra16_rvv;
         c->vsad[5] = ff_vsad_intra8_rvv;
+
+        c->nsse[0] = nsse16_rvv_wrapper;
+        c->nsse[1] = nsse8_rvv_wrapper;
     }
 #endif
 }
+
+int nsse16_rvv_wrapper(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2,
+                       ptrdiff_t stride, int h)
+{
+    if (c)
+        return ff_nsse16_rvv(c->avctx->nsse_weight, s1, s2, stride, h);
+    else
+        return ff_nsse16_rvv(8, s1, s2, stride, h);
+}
+
+int nsse8_rvv_wrapper(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2,
+                      ptrdiff_t stride, int h)
+{
+    if (c)
+        return ff_nsse8_rvv(c->avctx->nsse_weight, s1, s2, stride, h);
+    else
+        return ff_nsse8_rvv(8, s1, s2, stride, h);
+}
diff --git a/libavcodec/riscv/me_cmp_rvv.S b/libavcodec/riscv/me_cmp_rvv.S
index f32ae6b259..c9ae5bb6fc 100644
--- a/libavcodec/riscv/me_cmp_rvv.S
+++ b/libavcodec/riscv/me_cmp_rvv.S
@@ -407,3 +407,121 @@ endfunc
 func ff_vsad_intra8_rvv, zve32x
         vsad_vsse_intra8 abs
 endfunc
+
+func ff_nsse16_rvv, zve32x
+        .macro squarediff16
+        vsetivli        zero, 16, e8, m1, tu, ma
+        vle8.v          v4, (a1)
+        vle8.v          v12, (a2)
+        vwsubu.vv       v16, v4, v12
+        vsetvli         zero, zero, e16, m2, tu, ma
+        vwmacc.vv       v24, v16, v16
+        .endm
+
+        .macro gradiff16 srcx srcv
+        vsetivli        zero, 16, e8, m1, tu, ma
+        vle8.v          v8, (\srcx)
+        vslide1down.vx  v0, \srcv, t5
+        vslide1down.vx  v16, v8, t5
+        vwsubu.vv       v20, \srcv, v0
+        vwsubu.wv       v0, v20, v8
+        vwaddu.wv       v20, v0, v16
+        vsetivli        zero, 15, e16, m2, tu, ma
+        vneg.v          v0, v20
+        vmax.vv         v0, v20, v0
+        .endm
+
+        csrwi           vxrm, 0
+        vsetivli        t0, 16, e32, m4, ta, ma
+        addi            a4, a4, -1
+        li              t5, 1
+        vmv.v.x         v24, zero
+        vmv.v.x         v28, zero
+1:
+        add             t1, a1, a3
+        add             t2, a2, a3
+        addi            a4, a4, -1
+        squarediff16
+        gradiff16       t1, v4
+        vwaddu.wv       v28, v28, v0
+        gradiff16       t2, v12
+        vwsubu.wv       v28, v28, v0
+        add             a1, a1, a3
+        add             a2, a2, a3
+        bnez            a4, 1b
+
+        squarediff16
+        vsetivli        zero, 16, e32, m4, tu, ma
+        vmv.s.x         v0, zero
+        vmv.s.x         v4, zero
+        vredsum.vs      v0, v24, v0
+        vredsum.vs      v4, v28, v4
+        vmv.x.s         t1, v0
+        vmv.x.s         t2, v4
+        srai            t3, t2, 31
+        xor             t2, t3, t2
+        sub             t2, t2, t3
+        mul             t2, t2, a0
+        add             a0, t2, t1
+
+        ret
+endfunc
+
+func ff_nsse8_rvv, zve32x
+        .macro squarediff8
+        vsetivli        zero, 8, e8, mf2, tu, ma
+        vle8.v          v4, (a1)
+        vle8.v          v12, (a2)
+        vwsubu.vv       v16, v4, v12
+        vsetvli         zero, zero, e16, m1, tu, ma
+        vwmacc.vv       v24, v16, v16
+        .endm
+
+        .macro gradiff8 srcx srcv
+        vsetivli        zero, 8, e8, mf2, tu, ma
+        vle8.v          v8, (\srcx)
+        vslide1down.vx  v0, \srcv, t5
+        vslide1down.vx  v16, v8, t5
+        vwsubu.vv       v20, \srcv, v0
+        vwsubu.wv       v0, v20, v8
+        vwaddu.wv       v20, v0, v16
+        vsetivli        zero, 7, e16, m1, tu, ma
+        vneg.v          v0, v20
+        vmax.vv         v0, v20, v0
+        .endm
+
+        csrwi           vxrm, 0
+        vsetivli        t0, 8, e32, m2, ta, ma
+        addi            a4, a4, -1
+        li              t5, 1
+        vmv.v.x         v24, zero
+        vmv.v.x         v28, zero
+1:
+        add             t1, a1, a3
+        add             t2, a2, a3
+        addi            a4, a4, -1
+        squarediff8
+        gradiff8        t1, v4
+        vwaddu.wv       v28, v28, v0
+        gradiff8        t2, v12
+        vwsubu.wv       v28, v28, v0
+        add             a1, a1, a3
+        add             a2, a2, a3
+        bnez            a4, 1b
+
+        squarediff8
+        vsetivli        zero, 8, e32, m2, tu, ma
+        vmv.s.x         v0, zero
+        vmv.s.x         v4, zero
+        vredsum.vs      v0, v24, v0
+        vredsum.vs      v4, v28, v4
+        vmv.x.s         t1, v0
+        vmv.x.s         t2, v4
+        srai            t3, t2, 31
+        xor             t2, t3, t2
+        sub             t2, t2, t3
+        mul             t2, t2, a0
+        add             a0, t2, t1
+
+        ret
+endfunc
-- 
2.43.0
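
For readers following the assembly, here is a minimal scalar sketch of what the two RVV functions appear to compute, reconstructed only from the macros in this patch: squarediff accumulates the squared pixel differences, gradiff accumulates the absolute 2x2 gradient of each source over width-1 columns for every row but the last, and the epilogue returns score1 + |score2| * multiplier. The name nsse_ref and its w (block width) parameter are hypothetical and are not part of the patch or of FFmpeg.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference (not part of the patch).
 * multiplier corresponds to a0 in the assembly, w is 16 or 8. */
static int nsse_ref(int multiplier, const uint8_t *s1, const uint8_t *s2,
                    ptrdiff_t stride, int h, int w)
{
    int score1 = 0; /* sum of squared differences, as in squarediff */
    int score2 = 0; /* signed sum of gradient magnitudes, as in gradiff */

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int d = s1[x] - s2[x];
            score1 += d * d;
        }
        /* The gradient needs the next row, so the last row is skipped. */
        if (y + 1 < h) {
            for (int x = 0; x < w - 1; x++) {
                score2 += abs(s1[x] - s1[x + 1] - s1[x + stride] + s1[x + stride + 1])
                        - abs(s2[x] - s2[x + 1] - s2[x + stride] + s2[x + stride + 1]);
            }
        }
        s1 += stride;
        s2 += stride;
    }

    /* The assembly takes |score2| with srai/xor/sub before the multiply. */
    return score1 + abs(score2) * multiplier;
}
```

Under these assumptions, nsse_ref(8, s1, s2, stride, h, 16) would be the value to compare against ff_nsse16_rvv(8, s1, s2, stride, h), which is the call the wrapper makes when c is NULL.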