From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 8 Nov 2023 22:30:26 +0200
Message-ID: <20231108203028.51482-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 1/2] lavc/aacpsdsp: rework R-V V add_squares

Segmented loads may be slower than unit-strided loads. So this uses a
unit-strided load followed by narrowing shifts instead.

Before:
ps_add_squares_c:       60757.7
ps_add_squares_rvv_f32: 22242.5

After:
ps_add_squares_c:       60516.0
ps_add_squares_rvv_i64: 17067.7
---
 libavcodec/riscv/aacpsdsp_init.c | 3 ++-
 libavcodec/riscv/aacpsdsp_rvv.S  | 9 ++++++---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/libavcodec/riscv/aacpsdsp_init.c b/libavcodec/riscv/aacpsdsp_init.c
index c5ec796232..f72d1bc330 100644
--- a/libavcodec/riscv/aacpsdsp_init.c
+++ b/libavcodec/riscv/aacpsdsp_init.c
@@ -46,7 +46,8 @@ av_cold void ff_psdsp_init_riscv(PSDSPContext *c)
         c->hybrid_analysis = ff_ps_hybrid_analysis_rvv;
 
         if (flags & AV_CPU_FLAG_RVB_ADDR) {
-            c->add_squares = ff_ps_add_squares_rvv;
+            if (flags & AV_CPU_FLAG_RVV_I64)
+                c->add_squares = ff_ps_add_squares_rvv;
             c->mul_pair_single = ff_ps_mul_pair_single_rvv;
             c->stereo_interpolate[0] = ff_ps_stereo_interpolate_rvv;
         }
diff --git a/libavcodec/riscv/aacpsdsp_rvv.S b/libavcodec/riscv/aacpsdsp_rvv.S
index fe250cd83b..cf872599c8 100644
--- a/libavcodec/riscv/aacpsdsp_rvv.S
+++ b/libavcodec/riscv/aacpsdsp_rvv.S
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2022 Rémi Denis-Courmont.
+ * Copyright © 2022-2023 Rémi Denis-Courmont.
  *
  * This file is part of FFmpeg.
  *
@@ -20,13 +20,16 @@
 
 #include "libavutil/riscv/asm.S"
 
-func ff_ps_add_squares_rvv, zve32f
+func ff_ps_add_squares_rvv, zve64f
+        li      t1, 32
 1:
         vsetvli     t0, a2, e32, m4, ta, ma
-        vlseg2e32.v v24, (a1)
+        vle64.v     v8, (a1)
         sub     a2, a2, t0
+        vnsrl.wx    v24, v8, zero
         vle32.v     v16, (a0)
         sh3add  a1, t0, a1
+        vnsrl.wx    v28, v8, t1
         vfmacc.vv   v16, v24, v24
         vfmacc.vv   v16, v28, v28
         vse32.v     v16, (a0)
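
For readers less familiar with the vector instructions, the deinterleaving
idea in the commit message can be modelled in scalar C. This is only a
rough sketch, not code from the patch: the function names add_squares_ref,
add_squares_narrow and check_models_agree are hypothetical, and it assumes
a little-endian layout (as on RISC-V), so that the real part of each
complex sample sits in the low 32 bits of the 64-bit element.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: dst[i] += re^2 + im^2 for each complex sample. */
void add_squares_ref(float *dst, const float (*src)[2], int n)
{
    for (int i = 0; i < n; i++)
        dst[i] += src[i][0] * src[i][0] + src[i][1] * src[i][1];
}

/*
 * Hypothetical scalar model of the reworked loop: read each (re, im)
 * pair as one 64-bit element, then split it with two narrowing shifts.
 * Assumes little-endian storage, so the low half is the real part.
 */
void add_squares_narrow(float *dst, const float (*src)[2], int n)
{
    for (int i = 0; i < n; i++) {
        uint64_t pair;
        uint32_t lo, hi;
        float re, im;

        memcpy(&pair, src[i], sizeof(pair)); /* like vle64.v v8, (a1) */
        lo = (uint32_t)pair;                 /* like vnsrl.wx v24, v8, zero */
        hi = (uint32_t)(pair >> 32);         /* like vnsrl.wx v28, v8, t1 (t1 = 32) */
        memcpy(&re, &lo, sizeof(re));        /* reinterpret bits as float */
        memcpy(&im, &hi, sizeof(im));
        dst[i] += re * re;                   /* like vfmacc.vv v16, v24, v24 */
        dst[i] += im * im;                   /* like vfmacc.vv v16, v28, v28 */
    }
}

/* Check that both models agree on inputs whose squares are exact. */
void check_models_agree(void)
{
    const float src[4][2] = { { 1, 2 }, { 3, 4 }, { -5, 6 }, { 0, 7 } };
    float a[4] = { 1, 1, 1, 1 };
    float b[4] = { 1, 1, 1, 1 };

    add_squares_ref(a, src, 4);
    add_squares_narrow(b, src, 4);
    assert(memcmp(a, b, sizeof(a)) == 0);
}
```

The two narrowing shifts play the role that the segmented load played
before: one extracts the low halves, the other the high halves. Since the
shifts only make sense with 64-bit elements, this also illustrates why
the init code now checks AV_CPU_FLAG_RVV_I64.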