From patchwork Tue Feb 6 15:56:18 2024
X-Patchwork-Submitter: flow gg
X-Patchwork-Id: 46082
From: flow gg
Date: Tue, 6 Feb 2024 23:56:18 +0800
To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH 4/7] lavc/me_cmp: R-V V sse

From 7d153e6b166d53c94db57be4f024986d38290042 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Tue, 6 Feb 2024 21:55:07 +0800
Subject: [PATCH 4/7] lavc/me_cmp: R-V V sse

C908:
sse_0_c: 614.7
sse_0_rvv_i32: 138.2
sse_1_c: 302.7
sse_1_rvv_i32: 107.2
sse_2_c: 175.7
sse_2_rvv_i32: 104.2
---
 libavcodec/riscv/me_cmp_init.c | 11 ++++++
 libavcodec/riscv/me_cmp_rvv.S  | 66 ++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/libavcodec/riscv/me_cmp_init.c b/libavcodec/riscv/me_cmp_init.c
index 72c3248b01..85ecc22cbc 100644
--- a/libavcodec/riscv/me_cmp_init.c
+++ b/libavcodec/riscv/me_cmp_init.c
@@ -39,6 +39,13 @@ int ff_pix_abs16_y2_rvv(MpegEncContext *v, const uint8_t *pix1, const uint8_t *p
 int ff_pix_abs8_y2_rvv(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2,
                        ptrdiff_t stride, int h);
 
+int ff_sse16_rvv(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2,
+                 ptrdiff_t stride, int h);
+int ff_sse8_rvv(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2,
+                ptrdiff_t stride, int h);
+int ff_sse4_rvv(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2,
+                ptrdiff_t stride, int h);
+
 av_cold void ff_me_cmp_init_riscv(MECmpContext *c, AVCodecContext *avctx)
 {
 #if HAVE_RVV
@@ -53,6 +60,10 @@ av_cold void ff_me_cmp_init_riscv(MECmpContext *c, AVCodecContext *avctx)
         c->pix_abs[1][1] = ff_pix_abs8_x2_rvv;
         c->pix_abs[0][2] = ff_pix_abs16_y2_rvv;
         c->pix_abs[1][2] = ff_pix_abs8_y2_rvv;
+
+        c->sse[0] = ff_sse16_rvv;
+        c->sse[1] = ff_sse8_rvv;
+        c->sse[2] = ff_sse4_rvv;
     }
 #endif
 }
diff --git a/libavcodec/riscv/me_cmp_rvv.S b/libavcodec/riscv/me_cmp_rvv.S
index 308d707136..11848f3f21 100644
--- a/libavcodec/riscv/me_cmp_rvv.S
+++ b/libavcodec/riscv/me_cmp_rvv.S
@@ -165,3 +165,69 @@ func ff_pix_abs8_y2_rvv, zve32x
 
         pix_abs_ret
 endfunc
+
+func ff_sse16_rvv, zve32x
+        vsetivli        t0, 16, e32, m4, ta, ma
+        vmv.v.x         v24, zero
+        vmv.s.x         v0, zero
+1:
+        vsetvli         zero, zero, e8, m1, tu, ma
+        vle8.v          v4, (a1)
+        vle8.v          v12, (a2)
+        addi            a4, a4, -1
+        vwsubu.vv       v16, v4, v12
+        vsetvli         zero, zero, e16, m2, tu, ma
+        vwmacc.vv       v24, v16, v16
+        add             a1, a1, a3
+        add             a2, a2, a3
+        bnez            a4, 1b
+
+        vsetvli         zero, zero, e32, m4, tu, ma
+        vredsum.vs      v0, v24, v0
+        vmv.x.s         a0, v0
+        ret
+endfunc
+
+func ff_sse8_rvv, zve32x
+        vsetivli        t0, 8, e32, m2, ta, ma
+        vmv.v.x         v24, zero
+        vmv.s.x         v0, zero
+1:
+        vsetvli         zero, zero, e8, mf2, tu, ma
+        vle8.v          v4, (a1)
+        vle8.v          v12, (a2)
+        addi            a4, a4, -1
+        vwsubu.vv       v16, v4, v12
+        vsetvli         zero, zero, e16, m1, tu, ma
+        vwmacc.vv       v24, v16, v16
+        add             a1, a1, a3
+        add             a2, a2, a3
+        bnez            a4, 1b
+
+        vsetvli         zero, zero, e32, m2, tu, ma
+        vredsum.vs      v0, v24, v0
+        vmv.x.s         a0, v0
+        ret
+endfunc
+
+func ff_sse4_rvv, zve32x
+        vsetivli        t0, 4, e32, m1, ta, ma
+        vmv.v.x         v24, zero
+        vmv.s.x         v0, zero
+1:
+        vsetvli         zero, zero, e8, mf4, tu, ma
+        vle8.v          v4, (a1)
+        vle8.v          v12, (a2)
+        addi            a4, a4, -1
+        vwsubu.vv       v16, v4, v12
+        vsetvli         zero, zero, e16, mf2, tu, ma
+        vwmacc.vv       v24, v16, v16
+        add             a1, a1, a3
+        add             a2, a2, a3
+        bnez            a4, 1b
+
+        vsetvli         zero, zero, e32, m1, tu, ma
+        vredsum.vs      v0, v24, v0
+        vmv.x.s         a0, v0
+        ret
+endfunc
-- 
2.43.0
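The three kernels in this patch keep one 32-bit partial sum per pixel column in v24 and reduce only once, after the row loop. The scalar C model below is offered as a reading aid for that dataflow. It is a sketch under stated assumptions, not FFmpeg code:

- `sse_ref`, `sse_lanes` and the extra `width` parameter (16, 8 or 4, matching c->sse[0], [1], [2]) are hypothetical names introduced here.
- The vsetivli is assumed to grant vl equal to the block width, which holds when VLEN is at least 128 bits.
- h is assumed to be at least 1, because the assembly decrements a4 before testing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain reference: sum of squared differences over a width x h block. */
static int sse_ref(const uint8_t *pix1, const uint8_t *pix2,
                   ptrdiff_t stride, int h, int width)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < width; x++) {
            int d = pix1[x] - pix2[x];
            sum += d * d;
        }
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}

/* Hypothetical lane-by-lane model of ff_sse{16,8,4}_rvv: one 32-bit
 * accumulator per column (the role of v24), reduced once at the end.
 * Assumes vl == width (VLEN >= 128) and h >= 1, like the assembly. */
static int sse_lanes(const uint8_t *pix1, const uint8_t *pix2,
                     ptrdiff_t stride, int h, int width)
{
    int32_t acc[16] = { 0 };            /* vmv.v.x v24, zero */
    assert(width >= 1 && width <= 16 && h >= 1);

    do {
        for (int x = 0; x < width; x++) {
            /* vwsubu.vv: the 16-bit result, read as signed, is the exact
             * difference, always within [-255, 255]. */
            int16_t d = (int16_t)(pix1[x] - pix2[x]);
            /* vwmacc.vv: signed 16x16 -> 32-bit multiply-accumulate. */
            acc[x] += (int32_t)d * d;
        }
        pix1 += stride;                 /* add a1, a1, a3 */
        pix2 += stride;                 /* add a2, a2, a3 */
    } while (--h);                      /* addi a4, a4, -1; bnez a4, 1b */

    int32_t sum = 0;                    /* vmv.s.x v0, zero */
    for (int x = 0; x < width; x++)     /* vredsum.vs v0, v24, v0 */
        sum += acc[x];
    return sum;                         /* vmv.x.s a0, v0 */
}
```

Keeping per-column accumulators means the loop body needs no cross-lane work, and a single vredsum.vs runs per call instead of one per row. With 8-bit inputs and h = 16, each lane peaks at 16 * 255^2 = 1,040,400, well within 32 bits.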