From patchwork Thu May 30 15:26:53 2024
X-Patchwork-Submitter: uk7b@foxmail.com
X-Patchwork-Id: 49401
From: uk7b@foxmail.com
To: ffmpeg-devel@ffmpeg.org
Cc: sunyuechi
Date: Thu, 30 May 2024 23:26:53 +0800
Subject: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

From: sunyuechi

Since len < 64, there are enough vector registers to process two rows per
iteration, so the loop can be unrolled directly. The row count in a4 is
always even, so no tail iteration is needed.

Unrolling has a second benefit: the vertical filter reuses the middle
source row for both output rows, so it needs one load fewer per iteration
than the horizontal filter (three instead of four).
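As a reading aid, here is a minimal scalar C sketch of the vertical case before and after unrolling. The function names and signatures are hypothetical and are not FFmpeg's API. The per-pixel arithmetic, ((8 - my) * a + my * b + 4) >> 3, is taken from the vector code in the diff (weight 8 - mn in t1, rounding constant 4 in t4, shift by 3). Like the patch, it assumes the row count h is even.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One bilinear tap: weights (8 - m) and m, +4 rounding, >> 3. */
static uint8_t bilin(uint8_t a, uint8_t b, int m)
{
    return (uint8_t)(((8 - m) * a + m * b + 4) >> 3);
}

/* One output row per iteration, like the pre-patch loop (a4 -= 1):
 * each output row reads two source rows, r0 and r1. */
static void put_bilin_v_rowwise(uint8_t *dst, ptrdiff_t dstride,
                                const uint8_t *src, ptrdiff_t sstride,
                                int w, int h, int my)
{
    for (int y = 0; y < h; y++) {
        const uint8_t *r0 = src + y * sstride;
        const uint8_t *r1 = r0 + sstride;
        for (int x = 0; x < w; x++)
            dst[y * dstride + x] = bilin(r0[x], r1[x], my);
    }
}

/* Two output rows per iteration, like the patched loop (a4 -= 2).
 * The middle row r1 feeds both outputs, so each iteration reads
 * three source rows instead of four. Requires an even h. */
static void put_bilin_v_unrolled(uint8_t *dst, ptrdiff_t dstride,
                                 const uint8_t *src, ptrdiff_t sstride,
                                 int w, int h, int my)
{
    assert(h % 2 == 0);
    for (int y = 0; y < h; y += 2) {
        const uint8_t *r0 = src + y * sstride;
        const uint8_t *r1 = r0 + sstride;
        const uint8_t *r2 = r1 + sstride;
        uint8_t *d0 = dst + y * dstride;
        uint8_t *d1 = d0 + dstride;
        for (int x = 0; x < w; x++) {
            d0[x] = bilin(r0[x], r1[x], my);
            d1[x] = bilin(r1[x], r2[x], my);
        }
    }
}
```

The horizontal filter cannot share a row this way: each output row needs src[x] and src[x + 1] from its own source row. That is why the `.else` branch in the diff still issues four loads (v0, v2, v4, v6) per iteration.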
                               old          new
                            C908   X60   C908   X60
vp8_put_bilin4_h_c        :  6.2   5.5 :  6.2   5.5
vp8_put_bilin4_h_rvv_i32  :  2.2   2.0 :  1.5   1.5
vp8_put_bilin4_v_c        :  6.5   5.7 :  6.2   5.7
vp8_put_bilin4_v_rvv_i32  :  2.2   2.0 :  1.2   1.5
vp8_put_bilin8_h_c        : 24.2  21.5 : 24.2  21.5
vp8_put_bilin8_h_rvv_i32  :  5.2   4.7 :  3.5   3.5
vp8_put_bilin8_v_c        : 24.5  21.7 : 24.5  21.7
vp8_put_bilin8_v_rvv_i32  :  5.2   4.7 :  3.5   3.2
vp8_put_bilin16_h_c       : 48.0  42.7 : 48.0  42.7
vp8_put_bilin16_h_rvv_i32 :  5.7   5.0 :  5.2   4.5
vp8_put_bilin16_v_c       : 48.2  43.0 : 48.2  42.7
vp8_put_bilin16_v_rvv_i32 :  5.7   5.2 :  4.5   4.2

---
 libavcodec/riscv/vp8dsp_rvv.S | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
index 3360a38cac..5bea6cba9c 100644
--- a/libavcodec/riscv/vp8dsp_rvv.S
+++ b/libavcodec/riscv/vp8dsp_rvv.S
@@ -172,11 +172,35 @@ func ff_put_vp8_bilin4_\type\()_rvv, zve32x
         li t4, 4
         sub t1, t1, \mn
 1:
-        addi a4, a4, -1
-        bilin_load v0, \type, \mn
-        vse8.v v0, (a0)
-        add a2, a2, a3
-        add a0, a0, a1
+        add t0, a2, a3
+        add t2, a0, a1
+        addi a4, a4, -2
+.ifc \type,v
+        add t3, t0, a3
+.else
+        addi t5, a2, 1
+        addi t3, t0, 1
+        vle8.v v2, (t5)
+.endif
+        vle8.v v0, (a2)
+        vle8.v v4, (t0)
+        vle8.v v6, (t3)
+        vwmulu.vx v28, v0, t1
+        vwmulu.vx v26, v4, t1
+.ifc \type,v
+        vwmaccu.vx v28, \mn, v4
+.else
+        vwmaccu.vx v28, \mn, v2
+.endif
+        vwmaccu.vx v26, \mn, v6
+        vwaddu.wx v24, v28, t4
+        vwaddu.wx v22, v26, t4
+        vnsra.wi v30, v24, 3
+        vnsra.wi v0, v22, 3
+        vse8.v v30, (a0)
+        vse8.v v0, (t2)
+        add a2, t0, a3
+        add a0, t2, a1
         bnez a4, 1b
         ret
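The widening arithmetic in the new loop body can be checked in isolation. The C sketch below models one vector lane. It assumes SEW = 8 and that t1 holds 8 - mn (computed by the `sub t1, t1, \mn` context line from a value set above the hunk, which is not shown). It then exhaustively confirms that the 16-bit accumulator never exceeds 2044, so the truncating `vnsra.wi ..., 3` always yields the same byte as the scalar formula. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane, assuming SEW = 8:
 * the 8-bit inputs are widened into a 16-bit accumulator, the rounding
 * constant is added in 16 bits, then the result is shifted right by 3
 * and narrowed back to 8 bits. */
static uint8_t bilin_lane(uint8_t a, uint8_t b, unsigned m)
{
    assert(m <= 8);
    uint16_t acc = (uint16_t)(a * (8 - m));  /* vwmulu.vx  v28, v0, t1   */
    acc = (uint16_t)(acc + b * m);           /* vwmaccu.vx v28, \mn, v4  */
    acc = (uint16_t)(acc + 4);               /* vwaddu.wx  v24, v28, t4  */
    return (uint8_t)(acc >> 3);              /* vnsra.wi   v30, v24, 3   */
}

/* Largest accumulator value over all inputs and weights m in [0, 8]. */
static unsigned max_accumulator(void)
{
    unsigned max = 0;
    for (unsigned m = 0; m <= 8; m++)
        for (unsigned a = 0; a < 256; a++)
            for (unsigned b = 0; b < 256; b++) {
                unsigned acc = a * (8 - m) + b * m + 4;
                if (acc > max)
                    max = acc;
            }
    return max;
}
```

Because the two weights always sum to 8, the worst case is 8 * 255 + 4 = 2044. This value fits comfortably in 16 bits and stays below the sign bit, so the arithmetic shift in `vnsra` behaves like a logical one, and 2044 >> 3 = 255 still fits in a byte.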