From patchwork Mon Jul 22 18:11:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= X-Patchwork-Id: 50679 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a59:a742:0:b0:482:c625:d099 with SMTP id f2csp2182963vqm; Mon, 22 Jul 2024 11:12:15 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCU1GMQ8njjp6RZvY6mP04QZGV9BIwTzYXvddLuzyve5UUMrw/kzm4bmh00MpONQao6lQ/wNrwuuDUV+BlJaga9CY109FGu+H4C+jA== X-Google-Smtp-Source: AGHT+IGBHJepFxrYzCtOXFE2rTDluofe4avnISQFPyMjdcTa1QuUaBQmhtPoY+BiDfUS3hajL0mf X-Received: by 2002:a05:6512:131b:b0:52e:9d2c:1c86 with SMTP id 2adb3069b0e04-52efb631defmr5236864e87.14.1721671934552; Mon, 22 Jul 2024 11:12:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1721671934; cv=none; d=google.com; s=arc-20160816; b=yr35epDNPJ79TIHH6HRfMmWfcNtbBp/9Di5YEMaP5Pn3/BCGfwDF/QrHaLM53yiY19 bDWWRK+zpK7/abddsMvKrIkoG4YKA+qyVtJJb/ga4TwK0KnkI+5tPeIXbQxDykjGyGH3 VtCK9btuWIo8eWhxzt3jMu6fJUhYZWOQJFqneMFngoy4nYcCzUWG7tgmNTlkZnkpJbcl xaHVhIgm4J/CcxKNP8rax4ZAc3soryv9buQrBrpRfE1LGp9Yv5eYCycsQDxJzZjAs86H p1qkBSp1OxRYimWqCZx0a7O2tE2bzujdLFne3RaEGy7OQAmqe897GaSL7wrvBHmu9PPd sybg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:message-id:date:to:from :delivered-to; bh=3E/vXJZATFWsAVzqI/5S8438EJRKPaAw6ug6tc8/MTw=; fh=YOA8vD9MJZuwZ71F/05pj6KdCjf6jQRmzLS+CATXUQk=; b=sjPvU2FYis2pK4AYpzDFxKL10usfSeJuU28ifE16y/cpOVaTVESSyU80BTDCWiy+s2 vu9brkWxDfjsPibRZE80YUiUagDlfqy1Qc6YBX+A7FXRWkMsQVLpE4ameHxxenaml/Df c7EHwHBNZit7OPS41ZN4KU1o9cOdSztyNnXUEtD5lwQeLPdopRpU1/O4L/RM3bGgPcnj xzyZ6ELsavHkknX3wZ6Wrw+zon6lJIqWX3/or2bQ0qsPekF8Dlo7uREDEonHDBWaud+j vNs8vTFbCg7TFmzQDtxXohSNBIKYuP6ypPy0EwZlpGytLsoFdEYA9WQXKOslMaRLxEC+ K5xg==; dara=google.com ARC-Authentication-Results: i=1; 
mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id a640c23a62f3a-a7a8e293d09si3402166b.935.2024.07.22.11.12.13; Mon, 22 Jul 2024 11:12:14 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id DFFAA68D67D; Mon, 22 Jul 2024 21:12:10 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from ursule.remlab.net (vps-a2bccee9.vps.ovh.net [51.75.19.47]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id BFE9E68D4C1 for ; Mon, 22 Jul 2024 21:12:02 +0300 (EEST) Received: from basile.remlab.net (localhost [IPv6:::1]) by ursule.remlab.net (Postfix) with ESMTP id EFB58C0069 for ; Mon, 22 Jul 2024 21:12:01 +0300 (EEST) From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= To: ffmpeg-devel@ffmpeg.org Date: Mon, 22 Jul 2024 21:11:58 +0300 Message-ID: <20240722181201.24563-1-remi@remlab.net> X-Mailer: git-send-email 2.45.2 MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: restrict vertical intra pointers X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: MKpFxZylaLyd This lets the compiler unroll ever so slightly better (at 
least in the 16x16 case for RISC-V GCC). --- libavcodec/vp9dsp_template.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/libavcodec/vp9dsp_template.c b/libavcodec/vp9dsp_template.c index 9b11661704..5c4fb5d6e2 100644 --- a/libavcodec/vp9dsp_template.c +++ b/libavcodec/vp9dsp_template.c @@ -30,7 +30,7 @@ // FIXME see whether we can merge parts of this (perhaps at least 4x4 and 8x8) // back with h264pred.[ch] -static void vert_4x4_c(uint8_t *_dst, ptrdiff_t stride, +static void vert_4x4_c(uint8_t *restrict _dst, ptrdiff_t stride, const uint8_t *left, const uint8_t *_top) { pixel *dst = (pixel *) _dst; @@ -44,7 +44,7 @@ static void vert_4x4_c(uint8_t *_dst, ptrdiff_t stride, AV_WN4PA(dst + stride * 3, p4); } -static void vert_8x8_c(uint8_t *_dst, ptrdiff_t stride, +static void vert_8x8_c(uint8_t *restrict _dst, ptrdiff_t stride, const uint8_t *left, const uint8_t *_top) { pixel *dst = (pixel *) _dst; @@ -61,7 +61,7 @@ static void vert_8x8_c(uint8_t *_dst, ptrdiff_t stride, } } -static void vert_16x16_c(uint8_t *_dst, ptrdiff_t stride, +static void vert_16x16_c(uint8_t *restrict _dst, ptrdiff_t stride, const uint8_t *left, const uint8_t *_top) { pixel *dst = (pixel *) _dst; @@ -82,7 +82,7 @@ static void vert_16x16_c(uint8_t *_dst, ptrdiff_t stride, } } -static void vert_32x32_c(uint8_t *_dst, ptrdiff_t stride, +static void vert_32x32_c(uint8_t *restrict _dst, ptrdiff_t stride, const uint8_t *left, const uint8_t *_top) { pixel *dst = (pixel *) _dst; From patchwork Mon Jul 22 18:11:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= X-Patchwork-Id: 50680 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a59:a742:0:b0:482:c625:d099 with SMTP id f2csp2183056vqm; Mon, 22 Jul 2024 11:12:25 -0700 (PDT) X-Forwarded-Encrypted: i=2; 
AJvYcCWP+5Cw3kB64/XW6OUhm2o8LrgvTxrq+8IbwpfG8QZz0VmN29hoMizV53mLCGZ3KhkVSrevhQMEPgq9Dy4CCbDNxUh5ddAF3u0LpQ== X-Google-Smtp-Source: AGHT+IGbdtpE52W+0CFXUhRe83QkQKVxgjG+Ei3ROv0QMpgEOSkrBeU9rUrJ0b+QUCrN1JvYoIIM X-Received: by 2002:a05:6512:3b92:b0:52c:9d31:3f25 with SMTP id 2adb3069b0e04-52efb895b7bmr4167035e87.43.1721671944764; Mon, 22 Jul 2024 11:12:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1721671944; cv=none; d=google.com; s=arc-20160816; b=olDWQqM/OdujrMMhJiSJPfLQhSONZ5m8oH3U3Fbhq/ZpzqpXrmCeoR0SR6U6g77rGG wvXN72gtG6q1R9UvVY48/6vAdgTUOwDF3cQryam6AlecnoZWQ4P8bd2eTPLABmHHYp+k mognFohXB/Sa77+1YhoWzEP7F4eoUxmtIiIpQNdyrEe2OzwKy/sOQJC3Hv1pKPuTL1l4 SEyVke+VRsEfFuQEID1scJwnQaeaeRd1EZfv8xuYEROiaOFh5CBcS+1OaImH8a3YP2sd 0n0tFha+5lnugHvTFUQgfanloAwufhMla+WgzmeYw480ZWWwRdLml8MaQBymik8tNf9v 2Dyw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id :date:to:from:delivered-to; bh=gTF73yyGY533Yk22AkP3shC7zdC37fKHWSh0otiKc70=; fh=YOA8vD9MJZuwZ71F/05pj6KdCjf6jQRmzLS+CATXUQk=; b=RzuWCajkbfX3DNe3rQ2XpCsnwngtHFgOHfvtCVaI0icO8lKeHgAAxhui6uSPmV2Lvh XKRcyRiacxxDuOIaoiA6+HyaI6GaDy6GGGKwylQLJrKzdr1v1uNKwo86C4O5vnT+h2Xn zZj8LyKMhJ8RmIPWiDV+CRBBOjbde8QTY7eOKmgtDHX5guGXMcAaJV5d6+KrKXYKddiL DmFDBt9T1xUogVBXhurcmiW04Ze+3cFtd357948FSzWBoVL+7vtsZ7bSdmwTTNiZ2wK8 8nmL066pij8ptTxcYGXy5KfIAIuCGuPd1pRRtr+3ENZofK5mWgLbwE8QItDpDp7PJL+m OI7w==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id 2adb3069b0e04-52f31fb1a72si593539e87.117.2024.07.22.11.12.23; Mon, 22 Jul 2024 11:12:24 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 08FC768D6E3; Mon, 22 Jul 2024 21:12:14 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from ursule.remlab.net (vps-a2bccee9.vps.ovh.net [51.75.19.47]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id C189C68D67D for ; Mon, 22 Jul 2024 21:12:02 +0300 (EEST) Received: from basile.remlab.net (localhost [IPv6:::1]) by ursule.remlab.net (Postfix) with ESMTP id 31E4EC0186 for ; Mon, 22 Jul 2024 21:12:02 +0300 (EEST) From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= To: ffmpeg-devel@ffmpeg.org Date: Mon, 22 Jul 2024 21:11:59 +0300 Message-ID: <20240722181201.24563-2-remi@remlab.net> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20240722181201.24563-1-remi@remlab.net> References: <20240722181201.24563-1-remi@remlab.net> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 2/4] lavc/vp9dsp: use restrict qualifier for copy/avg MC X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: 7J5zEbRqmd+V Same as previous commit. 
--- libavcodec/vp9dsp_template.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/libavcodec/vp9dsp_template.c b/libavcodec/vp9dsp_template.c index 5c4fb5d6e2..da3cc28e5e 100644 --- a/libavcodec/vp9dsp_template.c +++ b/libavcodec/vp9dsp_template.c @@ -1936,9 +1936,9 @@ static av_cold void vp9dsp_loopfilter_init(VP9DSPContext *dsp) #if BIT_DEPTH != 12 -static av_always_inline void copy_c(uint8_t *dst, ptrdiff_t dst_stride, - const uint8_t *src, ptrdiff_t src_stride, - int w, int h) +static av_always_inline void copy_c(uint8_t *restrict dst, ptrdiff_t dst_stride, + const uint8_t *restrict src, + ptrdiff_t src_stride, int w, int h) { do { memcpy(dst, src, w * sizeof(pixel)); @@ -1948,9 +1948,9 @@ static av_always_inline void copy_c(uint8_t *dst, ptrdiff_t dst_stride, } while (--h); } -static av_always_inline void avg_c(uint8_t *_dst, ptrdiff_t dst_stride, - const uint8_t *_src, ptrdiff_t src_stride, - int w, int h) +static av_always_inline void avg_c(uint8_t *restrict _dst, ptrdiff_t dst_stride, + const uint8_t *restrict _src, + ptrdiff_t src_stride, int w, int h) { pixel *dst = (pixel *) _dst; const pixel *src = (const pixel *) _src; From patchwork Mon Jul 22 18:12:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= X-Patchwork-Id: 50683 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a59:a742:0:b0:482:c625:d099 with SMTP id f2csp2200854vqm; Mon, 22 Jul 2024 11:51:52 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCUQZcjh2W7+3s3VFAmf4G+Dnh6mHNyFbq9ZaY1Uh/DwN2C5GI12s39YSclR7eqJLQW/kiJ3eatARsk5vgFc9lbJRliiiMzA4Do25Q== X-Google-Smtp-Source: AGHT+IG/hMVXvk61GPLJxsWW5xKKeFVQN5tyu4+Muhs0vmwt0A7NpWjmVV2SasnlohlFHShql9aQ X-Received: by 2002:a05:6512:2392:b0:52e:9dee:a6ee with SMTP id 2adb3069b0e04-52efb75c1cfmr5321942e87.26.1721674312432; Mon, 22 Jul 2024 11:51:52 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1721674312; cv=none; 
d=google.com; s=arc-20160816; b=cQ9lUtwMPB4SmR7ywvAUScnIaIem9e2HcsO5rh3TUOE2hMVey+uVUudyP5D4dZlPAT m9j1g07COMw7WgyXYKu736p+hWaEZlNlnmXiPPOhL1pR6ABW2rRrDXjVY7SZthR3NOEI 0u6mU8PGm+MqrqMfbxXatUoAaKwyQGs1b/LfUqdeYCH0F6cxHDo0YvzYOCga+V9Hcy8n wRz8fttswmADV2CTBIVgVtCegPksnScMgPQxPuguJbZ/L7R/gCDC+2Y9Tc0lW2260IZE nRogxkKIY+SSopN3Y789XNSh0xlx8S6sgjMJlBwUu71awt7F2yAlzX0QOj03ITkvHKOf rsZw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id :date:to:from:delivered-to; bh=3AEooBBSwaEtLiX690ZcSYlcNvQDfx9mSxo7kWou8yM=; fh=YOA8vD9MJZuwZ71F/05pj6KdCjf6jQRmzLS+CATXUQk=; b=wlZlF/gHEg1fM3lxuet/2wo4hz6/3d6IU5haGefqZy9OmiF5fI1LIsxDDz97ixjBFt QnuxtzKmNgtiEU3HABawlURJucRgus/iZ8deUkWkvwuaDSXe6bE9ir1twuue3OwM+hDv zjRTN30UEUP7U1ugfhzoUBMqdg/KleWJFD4LMi8rssufE8P7LDwryjsJ9L6jwmQd/t6k oE05iUXU90s2OlcxhPYl3v1slCDnhJt5H/Ih3qIy1ZY3DJDEqmxd2EN5BmX9AWkOR4Ug CCTdf1q39Ht5AogIzK/9/HcyU1FX50KTOc38bBOIMqCnEKXEnQHjeVs1krVPmk5UJxwC 5nMw==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. 
[79.124.17.100]) by mx.google.com with ESMTP id 2adb3069b0e04-52eff21e058si1252653e87.194.2024.07.22.11.51.51; Mon, 22 Jul 2024 11:51:52 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 3B70368D6FF; Mon, 22 Jul 2024 21:12:17 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from ursule.remlab.net (vps-a2bccee9.vps.ovh.net [51.75.19.47]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id CC20968D695 for ; Mon, 22 Jul 2024 21:12:02 +0300 (EEST) Received: from basile.remlab.net (localhost [IPv6:::1]) by ursule.remlab.net (Postfix) with ESMTP id 58405C01F0 for ; Mon, 22 Jul 2024 21:12:02 +0300 (EEST) From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= To: ffmpeg-devel@ffmpeg.org Date: Mon, 22 Jul 2024 21:12:00 +0300 Message-ID: <20240722181201.24563-3-remi@remlab.net> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20240722181201.24563-1-remi@remlab.net> References: <20240722181201.24563-1-remi@remlab.net> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 3/4] lavc/vp9dsp: copy 8 pixels at once X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: 08SrNWxsOTC+ In the 8-bit case, we can actually read/write 8 aligned pixel values per load/store, which unsurprisingly tends to be faster on 64-bit systems (and makes no difference on 32-bit systems). 
This requires ifdef'ing though. --- libavcodec/vp9dsp_template.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/libavcodec/vp9dsp_template.c b/libavcodec/vp9dsp_template.c index da3cc28e5e..9e5b25142d 100644 --- a/libavcodec/vp9dsp_template.c +++ b/libavcodec/vp9dsp_template.c @@ -49,14 +49,22 @@ static void vert_8x8_c(uint8_t *restrict _dst, ptrdiff_t stride, { pixel *dst = (pixel *) _dst; const pixel *top = (const pixel *) _top; +#if BIT_DEPTH == 8 + uint64_t p8 = AV_RN64A(top); +#else pixel4 p4a = AV_RN4PA(top + 0); pixel4 p4b = AV_RN4PA(top + 4); +#endif int y; stride /= sizeof(pixel); for (y = 0; y < 8; y++) { +#if BIT_DEPTH == 8 + AV_WN64A(dst, p8); +#else AV_WN4PA(dst + 0, p4a); AV_WN4PA(dst + 4, p4b); +#endif dst += stride; } } @@ -66,18 +74,28 @@ static void vert_16x16_c(uint8_t *restrict _dst, ptrdiff_t stride, { pixel *dst = (pixel *) _dst; const pixel *top = (const pixel *) _top; +#if BIT_DEPTH == 8 + uint64_t p8a = AV_RN64A(top); + uint64_t p8b = AV_RN64A(top + 8); +#else pixel4 p4a = AV_RN4PA(top + 0); pixel4 p4b = AV_RN4PA(top + 4); pixel4 p4c = AV_RN4PA(top + 8); pixel4 p4d = AV_RN4PA(top + 12); +#endif int y; stride /= sizeof(pixel); for (y = 0; y < 16; y++) { +#if BIT_DEPTH == 8 + AV_WN64A(dst + 0, p8a); + AV_WN64A(dst + 8, p8b); +#else AV_WN4PA(dst + 0, p4a); AV_WN4PA(dst + 4, p4b); AV_WN4PA(dst + 8, p4c); AV_WN4PA(dst + 12, p4d); +#endif dst += stride; } } @@ -87,6 +105,12 @@ static void vert_32x32_c(uint8_t *restrict _dst, ptrdiff_t stride, { pixel *dst = (pixel *) _dst; const pixel *top = (const pixel *) _top; +#if BIT_DEPTH == 8 + uint64_t p8a = AV_RN64A(top); + uint64_t p8b = AV_RN64A(top + 8); + uint64_t p8c = AV_RN64A(top + 16); + uint64_t p8d = AV_RN64A(top + 24); +#else pixel4 p4a = AV_RN4PA(top + 0); pixel4 p4b = AV_RN4PA(top + 4); pixel4 p4c = AV_RN4PA(top + 8); @@ -95,10 +119,17 @@ static void vert_32x32_c(uint8_t *restrict _dst, ptrdiff_t stride, pixel4 p4f = AV_RN4PA(top + 20); pixel4 p4g = 
AV_RN4PA(top + 24); pixel4 p4h = AV_RN4PA(top + 28); +#endif int y; stride /= sizeof(pixel); for (y = 0; y < 32; y++) { +#if BIT_DEPTH == 8 + AV_WN64A(dst + 0, p8a); + AV_WN64A(dst + 8, p8b); + AV_WN64A(dst + 16, p8c); + AV_WN64A(dst + 24, p8d); +#else AV_WN4PA(dst + 0, p4a); AV_WN4PA(dst + 4, p4b); AV_WN4PA(dst + 8, p4c); @@ -107,6 +138,7 @@ static void vert_32x32_c(uint8_t *restrict _dst, ptrdiff_t stride, AV_WN4PA(dst + 20, p4f); AV_WN4PA(dst + 24, p4g); AV_WN4PA(dst + 28, p4h); +#endif dst += stride; } } From patchwork Mon Jul 22 18:12:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= X-Patchwork-Id: 50681 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a59:a742:0:b0:482:c625:d099 with SMTP id f2csp2187871vqm; Mon, 22 Jul 2024 11:21:51 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCWDoNQYqW9w0xBLfbOGbWEAYU/tjffg9Gf37hMV4nA7lbz4hdrPMDgIa3dt8S0jNsndD04DuTaFSkroQemgB0pfKryPf6ezWDReHw== X-Google-Smtp-Source: AGHT+IGTwbR756INDF0EbsprDULLVXm+Ja/TEjGbtr4cD8Xf/LXrqmUQMhIVq8/cr05Z28UDGCRR X-Received: by 2002:a05:6512:3f07:b0:52e:7278:a39d with SMTP id 2adb3069b0e04-52ef87d82d8mr2159526e87.0.1721672511256; Mon, 22 Jul 2024 11:21:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1721672511; cv=none; d=google.com; s=arc-20160816; b=nkdBBDigShszvwxosoLWby2rgVZ+98BPtb4RCfI90FlmzCjjS1DRVMv2qGPdD7pUFZ Y3hy0ZjTcNfEeAsmpzPajSPZpkNVtnROVZ2zKFv2urog7UhesK2AJ5XNwB2CYXSRjgpH XwPoDsACP2X197LQVJ+ZIF/eiy0OkazufpnYz39rpYc4+7axuXEPcRHuPUlS2XU6ab+K b23u8hR8k50GMkWffrD3VHc9QYtVO3MR2aaVleCsQZJOlN323U6ivxKuM03R3C6EviOG jPhUu80dYuFuCrNB4WYaUVgja5GO/q4RMaP27Zil9cIgQEb43/LDkBZjudwUXL+KwFN1 9U4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id 
:date:to:from:delivered-to; bh=t08gdJ7zzCvk2zHWwnzOw2t1WmW0KMzxoYFGQCEC9UA=; fh=YOA8vD9MJZuwZ71F/05pj6KdCjf6jQRmzLS+CATXUQk=; b=mb/pPvGxQhd3EwA1gmFXa3QVtSOECvYWTQGOUsYoQmKvmRWsikvPMXtODakoYi6e7u oGbkvAZPsDMVcD2gpcSp5igEjAN3jNwazstC2MfFX8+eJ6Qc01lBo9v9vpTF7T9/irp7 yp3S9yeTx6RbTV8Pgc9wLKCT2hF1ngQsil5Em692nr6Dgx1bn9v/eTVbGaHaXhaivzb3 gMzZSBka5+nrNHyj5gc1LXP3v66YFKzgH9y5B55DutNagPPJZ0UoussDXmlWkOFalNaJ A9UmJq/FHJKyg3rXEXV1Bw7e1cBP1mwifYrrk0nUbp01xoHJ+LYJxr0AdrUSVekxn71A +/zA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id 2adb3069b0e04-52efa7128edsi1566985e87.542.2024.07.22.11.21.50; Mon, 22 Jul 2024 11:21:51 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id A0C2B68D71D; Mon, 22 Jul 2024 21:12:19 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from ursule.remlab.net (vps-a2bccee9.vps.ovh.net [51.75.19.47]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id D6BAA68D69C for ; Mon, 22 Jul 2024 21:12:02 +0300 (EEST) Received: from basile.remlab.net (localhost [IPv6:::1]) by ursule.remlab.net (Postfix) with ESMTP id 7F130C0233 for ; Mon, 22 Jul 2024 21:12:02 +0300 (EEST) From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= To: ffmpeg-devel@ffmpeg.org Date: Mon, 22 Jul 2024 21:12:01 +0300 Message-ID: <20240722181201.24563-4-remi@remlab.net> 
X-Mailer: git-send-email 2.45.2 In-Reply-To: <20240722181201.24563-1-remi@remlab.net> References: <20240722181201.24563-1-remi@remlab.net> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 4/4] lavc/vp9dsp: remove R-V I intra functions X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: HujabWvzKm2a At this point, they are identical to the C code, except for instruction ordering. In fact, they are typically slower or no faster than the C code. (Also FWIW, they were incorrectly flagged as requiring fast unaligned memory accesses.) --- libavcodec/riscv/Makefile | 3 +- libavcodec/riscv/vp9_intra_rvi.S | 71 -------------------------------- libavcodec/riscv/vp9dsp_init.c | 7 ---- 3 files changed, 1 insertion(+), 80 deletions(-) delete mode 100644 libavcodec/riscv/vp9_intra_rvi.S diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile index 0bbdd38116..a6cdcb71e9 100644 --- a/libavcodec/riscv/Makefile +++ b/libavcodec/riscv/Makefile @@ -73,8 +73,7 @@ OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_init.o RV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvi.o RVV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvv.o OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9dsp_init.o -RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o \ - riscv/vp9_mc_rvi.o +RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_mc_rvi.o RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o \ riscv/vp9_mc_rvv.o OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o diff --git a/libavcodec/riscv/vp9_intra_rvi.S b/libavcodec/riscv/vp9_intra_rvi.S deleted file mode 100644 index 16b6bdb25a..0000000000 --- a/libavcodec/riscv/vp9_intra_rvi.S +++ /dev/null @@ -1,71 +0,0 @@ -/* - * Copyright (c) 2024 Institue of Software Chinese Academy of Sciences (ISCAS). 
- * - * This file is part of FFmpeg. - * - * FFmpeg is free software; you can redistribute it and/or - * modify it under the terms of the GNU Lesser General Public - * License as published by the Free Software Foundation; either - * version 2.1 of the License, or (at your option) any later version. - * - * FFmpeg is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * Lesser General Public License for more details. - * - * You should have received a copy of the GNU Lesser General Public - * License along with FFmpeg; if not, write to the Free Software - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA - */ - -#include "libavutil/riscv/asm.S" - -#if __riscv_xlen >= 64 -func ff_v_32x32_rvi - ld t0, (a3) - ld t1, 8(a3) - ld t2, 16(a3) - ld t3, 24(a3) - .rept 16 - add a7, a0, a1 - sd t0, (a0) - sd t1, 8(a0) - sd t2, 16(a0) - sd t3, 24(a0) - sh1add a0, a1, a0 - sd t0, (a7) - sd t1, 8(a7) - sd t2, 16(a7) - sd t3, 24(a7) - .endr - - ret -endfunc - -func ff_v_16x16_rvi - ld t0, (a3) - ld t1, 8(a3) - .rept 8 - add a7, a0, a1 - sd t0, (a0) - sd t1, 8(a0) - sh1add a0, a1, a0 - sd t0, (a7) - sd t1, 8(a7) - .endr - - ret -endfunc - -func ff_v_8x8_rvi - ld t0, (a3) - .rept 4 - add a7, a0, a1 - sd t0, (a0) - sh1add a0, a1, a0 - sd t0, (a7) - .endr - - ret -endfunc -#endif diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c index 454dcd963f..2034e1c976 100644 --- a/libavcodec/riscv/vp9dsp_init.c +++ b/libavcodec/riscv/vp9dsp_init.c @@ -74,13 +74,6 @@ static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp) #if HAVE_RV int flags = av_get_cpu_flags(); -# if __riscv_xlen >= 64 - if (bpp == 8 && (flags & AV_CPU_FLAG_RVB_ADDR)) { - dsp->intra_pred[TX_32X32][VERT_PRED] = ff_v_32x32_rvi; - dsp->intra_pred[TX_16X16][VERT_PRED] = ff_v_16x16_rvi; - dsp->intra_pred[TX_8X8][VERT_PRED] = 
ff_v_8x8_rvi; - } -# endif #if HAVE_RVV if (bpp == 8 && flags & AV_CPU_FLAG_RVV_I64 && ff_rv_vlen_least(128)) { dsp->intra_pred[TX_8X8][DC_PRED] = ff_dc_8x8_rvv;