From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 11 Jun 2024 17:55:03 +0300
Message-ID: <20240611145505.14934-1-remi@remlab.net>
In-Reply-To: <4929107.rjQTLOYEdK@basile.remlab.net>
References: <4929107.rjQTLOYEdK@basile.remlab.net>
Subject: [FFmpeg-devel] [PATCH 1/2] lavc/vc1dsp: match C block content in inv_trans_8x4_rvv
This applies the rounding shift before the mid-point block state of the
transform (after the horizontal pass, before the vertical pass) is stored,
so that the intermediate coefficients left in the block match the C code.
This forces shifting 8 vectors of 4 elements instead of 4 vectors of
8 elements and is thus slightly slower.
---
 libavcodec/riscv/vc1dsp_rvv.S | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/libavcodec/riscv/vc1dsp_rvv.S b/libavcodec/riscv/vc1dsp_rvv.S
index 4b7ab33307..7e1fb84b0c 100644
--- a/libavcodec/riscv/vc1dsp_rvv.S
+++ b/libavcodec/riscv/vc1dsp_rvv.S
@@ -257,6 +257,9 @@ func ff_vc1_inv_trans_8x4_rvv, zve32x
         vsetivli zero, 4, e16, mf2, ta, ma
         vlseg8e16.v v0, (a2)
         jal t0, ff_vc1_inv_trans_8_rvv
+        .irp n,0,1,2,3,4,5,6,7
+        vssra.vi v\n, v\n, 3
+        .endr
         vsseg8e16.v v0, (a2)
         addi a3, a2, 1 * 8 * 2
         vsetivli zero, 8, e16, m1, ta, ma
@@ -266,10 +269,6 @@ func ff_vc1_inv_trans_8x4_rvv, zve32x
         addi a5, a2, 3 * 8 * 2
         vle16.v v2, (a4)
         vle16.v v3, (a5)
-        .irp n,0,1,2,3
-        # shift 4 vectors of 8 elems after transpose instead of 8 of 4
-        vssra.vi v\n, v\n, 3
-        .endr
         li t1, 7
         jal t0, ff_vc1_inv_trans_4_rvv
         add a3, a1, a0
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 11 Jun 2024 17:55:04 +0300
Message-ID: <20240611145505.14934-2-remi@remlab.net>
In-Reply-To: <4929107.rjQTLOYEdK@basile.remlab.net>
References: <4929107.rjQTLOYEdK@basile.remlab.net>
Subject: [FFmpeg-devel] [PATCH 2/2] checkasm/vc1dsp: check the not-in-place block content

This seems to cause issues in FATE for 4x4 and 4x8 transforms. But then
again, FATE does not seem to care in the 8x4 case.

Note that AArch64 NEON code is known to fail this test.
---
 tests/checkasm/vc1dsp.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/checkasm/vc1dsp.c b/tests/checkasm/vc1dsp.c
index f18f0f8251..2cc6785a0c 100644
--- a/tests/checkasm/vc1dsp.c
+++ b/tests/checkasm/vc1dsp.c
@@ -317,11 +317,13 @@ static void check_inv_trans_adding(void)
             for (int j = 0; j < tests[t].height; ++j)
                 for (int i = 0; i < tests[t].width; ++i) {
                     int idx = j * 8 + i;
-                    inv_trans_in1[idx] = inv_trans_in0[idx] = coeffs->d[j * tests[t].width + i];
+                    inv_trans_in0[idx] = coeffs->d[j * tests[t].width + i];
                 }
+            memcpy(inv_trans_in1, inv_trans_in0, 8 * 8 * 2);
             call_ref(inv_trans_out0 + 24 + 8, 24, inv_trans_in0);
             call_new(inv_trans_out1 + 24 + 8, 24, inv_trans_in1);
-            if (memcmp(inv_trans_out0, inv_trans_out1, 10 * 24))
+            if (memcmp(inv_trans_in0, inv_trans_in1, 8 * 8 * 2) ||
+                memcmp(inv_trans_out0, inv_trans_out1, 10 * 24))
                 fail();
             bench_new(inv_trans_out1 + 24 + 8, 24, inv_trans_in1 + 8);
             av_free(coeffs);