From patchwork Wed May 22 20:28:54 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 49153
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 22 May 2024 23:28:54 +0300
Message-ID: <20240522202854.15461-1-remi@remlab.net>
X-Mailer: git-send-email 2.45.1
Subject: [FFmpeg-devel] [PATCH] lavc/rv34dsp: optimise R-V V idct_dc_add

This removes one stray LI and reworks the vector arithmetic to avoid changing
the vector configuration. On K230, this takes the cycle count down from 46.5
to 43.5.
---
 libavcodec/riscv/rv34dsp_rvv.S | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/libavcodec/riscv/rv34dsp_rvv.S b/libavcodec/riscv/rv34dsp_rvv.S
index f1f6345012..e8aff7e570 100644
--- a/libavcodec/riscv/rv34dsp_rvv.S
+++ b/libavcodec/riscv/rv34dsp_rvv.S
@@ -36,16 +36,15 @@ func ff_rv34_idct_dc_add_rvv, zve32x
         vsetivli    zero, 4, e8, mf4, ta, ma
         vlse32.v    v0, (a0), a1
         li          t1, 169
+        li          t2, 128
         mul         t1, t1, a2
-        li          a2, 255
+        vsetivli    zero, 4*4, e8, m1, ta, ma
+        vwsubu.vx   v2, v0, t2
         addi        t1, t1, 512
         srai        t1, t1, 10
-        vsetivli    zero, 4*4, e16, m2, ta, ma
-        vzext.vf2   v2, v0
-        vadd.vx     v2, v2, t1
-        vmax.vx     v2, v2, zero
-        vsetvli     zero, zero, e8, m1, ta, ma
-        vnclipu.wi  v0, v2, 0
+        vwadd.wx    v2, v2, t1
+        vnclip.wi   v0, v2, 0
+        vxor.vx     v0, v0, t2
         vsetivli    zero, 4, e8, mf4, ta, ma
         vsse32.v    v0, (a0), a1
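
Note on the arithmetic: the reworked sequence biases each pixel by -128 with a
widening subtract, adds the DC term, narrows with a signed saturating clip, and
then flips the top bit to undo the bias. A minimal scalar sketch in C of why
this should match the old zero-extend / add / max / unsigned-clip sequence is
given below. It is a per-element model, not the FFmpeg code: the names
old_path, new_path and verify_equivalence are hypothetical, and it assumes the
DC term fits in a signed 8-bit value, since at SEW=8 the scalar operand of
vwadd.wx is taken from the low 8 bits of the register.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to the unsigned 8-bit range, as vnclipu.wi does with shift 0. */
static uint8_t clip_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* Saturate to the signed 8-bit range, as vnclip.wi does with shift 0. */
static int8_t clip_s8(int v) { return v < -128 ? -128 : v > 127 ? 127 : (int8_t)v; }

/* Model of the old sequence, one element at a time. */
static uint8_t old_path(uint8_t px, int8_t dc)
{
    int w = px;          /* vzext.vf2: widen the pixel to 16 bits */
    w += dc;             /* vadd.vx:   add the DC term */
    if (w < 0)           /* vmax.vx:   clamp negative sums to zero */
        w = 0;
    return clip_u8(w);   /* vnclipu.wi: saturate to 0..255 */
}

/* Model of the new sequence, one element at a time. */
static uint8_t new_path(uint8_t px, int8_t dc)
{
    int w = (int)px - 128;          /* vwsubu.vx: widen and bias by -128 */
    w += dc;                        /* vwadd.wx:  add the DC term */
    int8_t n = clip_s8(w);          /* vnclip.wi: saturate to -128..127 */
    return (uint8_t)((uint8_t)n ^ 128); /* vxor.vx: flip bit 7 to undo the bias */
}

/* Exhaustively compare both models over every pixel and 8-bit DC value. */
static void verify_equivalence(void)
{
    for (int px = 0; px <= 255; px++)
        for (int dc = -128; dc <= 127; dc++)
            assert(old_path((uint8_t)px, (int8_t)dc) ==
                   new_path((uint8_t)px, (int8_t)dc));
}
```

Calling verify_equivalence() walks all 256 x 256 input pairs. The sketch says
nothing about DC values outside the signed 8-bit range, which would need to be
checked against the codec's actual coefficient bounds.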