From patchwork Mon Jul 1 17:08:07 2024
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 50261
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 1 Jul 2024 20:08:07 +0300
Message-ID: <20240701170807.107018-4-remi@remlab.net>
In-Reply-To: <20240701170807.107018-1-remi@remlab.net>
References: <20240701170807.107018-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 4/4] lavc/h264dsp: update R-V V intra luma loop filter

Note that the performance reported by checkasm is slightly worse. This is
expected since the assembler is now doing more work.
---
 libavcodec/riscv/h264dsp_init.c | 3 ++-
 libavcodec/riscv/h264dsp_rvv.S  | 6 ++++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/libavcodec/riscv/h264dsp_init.c b/libavcodec/riscv/h264dsp_init.c
index ab412a9924..9650cae66b 100644
--- a/libavcodec/riscv/h264dsp_init.c
+++ b/libavcodec/riscv/h264dsp_init.c
@@ -30,7 +30,8 @@ void ff_h264_v_loop_filter_luma_8_rvv(uint8_t *pix, ptrdiff_t stride,
                                       int alpha, int beta, int8_t *tc0);
 void ff_h264_h_loop_filter_luma_8_rvv(uint8_t *pix, ptrdiff_t stride,
-                                      int alpha, int beta, int8_t *tc0);
+                                      int alpha, int beta, const int8_t *tc0,
+                                      const int16_t *bS);
 void ff_h264_h_loop_filter_luma_mbaff_8_rvv(uint8_t *pix, ptrdiff_t stride,
                                             int alpha, int beta, int8_t *tc0);
diff --git a/libavcodec/riscv/h264dsp_rvv.S b/libavcodec/riscv/h264dsp_rvv.S
index 96a8a0a8a3..6bc5406ba3 100644
--- a/libavcodec/riscv/h264dsp_rvv.S
+++ b/libavcodec/riscv/h264dsp_rvv.S
@@ -126,9 +126,11 @@ func ff_h264_v_loop_filter_luma_8_rvv, zve32x
 endfunc
 
 func ff_h264_h_loop_filter_luma_8_rvv, zve32x
-        vsetivli    zero, 4, e32, m1, ta, ma
-        vle8.v      v4, (a4)
+        vsetivli    zero, 4, e8, mf4, ta, ma
+        vle16.v     v8, (a5)
         li          t0, 0x01010101
+        vluxei16.v  v4, (a4), v8
+        vsetivli    zero, 4, e32, m1, ta, ma
         vzext.vf4   v6, v4
         addi        a0, a0, -3
         vmul.vx     v6, v6, t0
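
For readers less familiar with RVV, here is a rough scalar model of what the
new prologue of ff_h264_h_loop_filter_luma_8_rvv appears to compute, as read
from the hunk above. It is only a sketch, not code from the patch: the helper
name splat_tc0_by_bs is hypothetical, and it assumes that each of the four
16-bit bS values is used as an unsigned byte offset into tc0 (which is how
vluxei16.v treats its index operand).

```c
#include <stdint.h>

/* Hypothetical scalar model of the new prologue; not part of FFmpeg. */
static void splat_tc0_by_bs(uint32_t out[4], const int8_t *tc0,
                            const int16_t *bS)
{
    for (int i = 0; i < 4; i++) {
        /* vle16.v v8, (a5) then vluxei16.v v4, (a4), v8:
         * fetch one byte from tc0 at the offset given by bS[i]. */
        uint8_t t = (uint8_t)tc0[(uint16_t)bS[i]];

        /* vzext.vf4 v6, v4 then vmul.vx v6, v6, t0 (t0 = 0x01010101):
         * zero-extend the byte to 32 bits and copy it into all four bytes. */
        out[i] = (uint32_t)t * UINT32_C(0x01010101);
    }
}
```

With made-up inputs tc0 = { -1, 0, 1, 2 } and bS = { 1, 3, 0, 2 }, the model
yields 0x00000000, 0x02020202, 0xffffffff and 0x01010101. The extra indexed
load is presumably the "more work" mentioned in the commit message.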