From patchwork Wed Nov 8 20:30:26 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 44578
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 8 Nov 2023 22:30:26 +0200
Message-ID: <20231108203028.51482-1-remi@remlab.net>
X-Mailer: git-send-email 2.42.0
Subject: [FFmpeg-devel] [PATCH 1/2] lavc/aacpsdsp: rework R-V V add_squares

Segmented loads may be slower than unit-strided loads, so this uses a
unit-strided load and narrowing shifts instead.
Before:
ps_add_squares_c:       60757.7
ps_add_squares_rvv_f32: 22242.5

After:
ps_add_squares_c:       60516.0
ps_add_squares_rvv_i64: 17067.7
---
 libavcodec/riscv/aacpsdsp_init.c | 3 ++-
 libavcodec/riscv/aacpsdsp_rvv.S  | 9 ++++++---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/libavcodec/riscv/aacpsdsp_init.c b/libavcodec/riscv/aacpsdsp_init.c
index c5ec796232..f72d1bc330 100644
--- a/libavcodec/riscv/aacpsdsp_init.c
+++ b/libavcodec/riscv/aacpsdsp_init.c
@@ -46,7 +46,8 @@ av_cold void ff_psdsp_init_riscv(PSDSPContext *c)
         c->hybrid_analysis = ff_ps_hybrid_analysis_rvv;

         if (flags & AV_CPU_FLAG_RVB_ADDR) {
-            c->add_squares = ff_ps_add_squares_rvv;
+            if (flags & AV_CPU_FLAG_RVV_I64)
+                c->add_squares = ff_ps_add_squares_rvv;
             c->mul_pair_single = ff_ps_mul_pair_single_rvv;
             c->stereo_interpolate[0] = ff_ps_stereo_interpolate_rvv;
         }
diff --git a/libavcodec/riscv/aacpsdsp_rvv.S b/libavcodec/riscv/aacpsdsp_rvv.S
index fe250cd83b..cf872599c8 100644
--- a/libavcodec/riscv/aacpsdsp_rvv.S
+++ b/libavcodec/riscv/aacpsdsp_rvv.S
@@ -1,5 +1,5 @@
 /*
- * Copyright © 2022 Rémi Denis-Courmont.
+ * Copyright © 2022-2023 Rémi Denis-Courmont.
  *
  * This file is part of FFmpeg.
  *
@@ -20,13 +20,16 @@

 #include "libavutil/riscv/asm.S"

-func ff_ps_add_squares_rvv, zve32f
+func ff_ps_add_squares_rvv, zve64f
+        li      t1, 32
 1:
         vsetvli     t0, a2, e32, m4, ta, ma
-        vlseg2e32.v v24, (a1)
+        vle64.v     v8, (a1)
         sub     a2, a2, t0
+        vnsrl.wx    v24, v8, zero
         vle32.v     v16, (a0)
         sh3add  a1, t0, a1
+        vnsrl.wx    v28, v8, t1
         vfmacc.vv   v16, v24, v24
         vfmacc.vv   v16, v28, v28
         vse32.v     v16, (a0)

From patchwork Wed Nov 8 20:30:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 44577
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 8 Nov 2023 22:30:27 +0200
Message-ID: <20231108203028.51482-2-remi@remlab.net>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20231108203028.51482-1-remi@remlab.net>
References: <20231108203028.51482-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 2/2] lavc/aacpsdsp: rework R-V V hybrid_synthesis_deint

Given the size of the data set, strided memory accesses cannot be
avoided. We can still do better than the current code.

ps_hybrid_synthesis_deint_c:       12065.5
ps_hybrid_synthesis_deint_rvv_i32: 13650.2 (before)
ps_hybrid_synthesis_deint_rvv_i64:  8181.0 (after)
---
 libavcodec/riscv/aacpsdsp_init.c |  8 ++---
 libavcodec/riscv/aacpsdsp_rvv.S  | 61 +++++++++++++++++---------------
 2 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/libavcodec/riscv/aacpsdsp_init.c b/libavcodec/riscv/aacpsdsp_init.c
index f72d1bc330..e094660cf3 100644
--- a/libavcodec/riscv/aacpsdsp_init.c
+++ b/libavcodec/riscv/aacpsdsp_init.c
@@ -46,16 +46,16 @@ av_cold void ff_psdsp_init_riscv(PSDSPContext *c)
         c->hybrid_analysis = ff_ps_hybrid_analysis_rvv;

         if (flags & AV_CPU_FLAG_RVB_ADDR) {
-            if (flags & AV_CPU_FLAG_RVV_I64)
+            if (flags & AV_CPU_FLAG_RVV_I64) {
                 c->add_squares = ff_ps_add_squares_rvv;
+                c->hybrid_synthesis_deint = ff_ps_hybrid_synthesis_deint_rvv;
+            }
             c->mul_pair_single = ff_ps_mul_pair_single_rvv;
             c->stereo_interpolate[0] = ff_ps_stereo_interpolate_rvv;
         }
     }

-    if ((flags & AV_CPU_FLAG_RVV_I32) && (flags & AV_CPU_FLAG_RVB_ADDR)) {
+    if ((flags & AV_CPU_FLAG_RVV_I32) && (flags & AV_CPU_FLAG_RVB_ADDR))
         c->hybrid_analysis_ileave = ff_ps_hybrid_analysis_ileave_rvv;
-        c->hybrid_synthesis_deint = ff_ps_hybrid_synthesis_deint_rvv;
-    }
 #endif
 }
diff --git a/libavcodec/riscv/aacpsdsp_rvv.S b/libavcodec/riscv/aacpsdsp_rvv.S
index cf872599c8..1dc426e01c 100644
--- a/libavcodec/riscv/aacpsdsp_rvv.S
+++ b/libavcodec/riscv/aacpsdsp_rvv.S
@@ -190,38 +190,41 @@ func ff_ps_hybrid_analysis_ileave_rvv, zve32x /* no needs for zve32f here */
         ret
 endfunc

-func ff_ps_hybrid_synthesis_deint_rvv, zve32x
-        slli    t1, a2, 5 + 1 + 2
-        sh2add  a0, a2, a0
-        add     a1, a1, t1
-        addi    a2, a2, -64
-        li      t1, 38 * 64 * 4
-        li      t6, 64 * 4
-        add     a4, a0, t1
-        beqz    a2, 3f
+func ff_ps_hybrid_synthesis_deint_rvv, zve64x
+        slli    t0, a2, 5 + 1 + 2
+        sh2add  a0, a2, a0
+        add     a1, a1, t0
+        addi    t2, a2, -64
+        li      t0, 38 * 64
+        li      t1, 32 * 2 * 4
+        li      t4, 8 - 16384 // offset from in[64][n][0] to in[0][n + 1][0]
+        slli    t5, a2, 5 + 1 + 2 // and from in[0][n+1][0] to in[0][n+1][s]
+        neg     t2, t2
+        li      t3, 32
+        add     a4, t4, t5
+        sh2add  t0, t0, a0
 1:
-        mv      t0, a0
-        mv      t1, a1
-        mv      t3, a3
-        mv      t4, a4
-        addi    a2, a2, 1
+        mv      t4, t2
+        addi    a3, a3, -1
 2:
-        vsetvli t5, t3, e32, m4, ta, ma
-        vlseg2e32.v v16, (t1)
-        sub     t3, t3, t5
-        vsse32.v v16, (t0), t6
-        mul     t2, t5, t6
-        vsse32.v v20, (t4), t6
-        sh3add  t1, t5, t1
-        add     t0, t0, t2
-        add     t4, t4, t2
-        bnez    t3, 2b
+        vsetvli t5, t4, e32, m4, ta, ma
+        vlse64.v v16, (a1), t1 /* sizeof (float[32][2]) */
+        sub     t4, t4, t5
+        vnsrl.wx v24, v16, zero
+        slli    t6, t5, 5 + 1 + 2
+        vnsrl.wx v28, v16, t3 /* 32 */
+        add     a1, a1, t6
+        vse32.v v24, (a0)
+        sh2add  a0, t5, a0
+        vse32.v v28, (t0)
+        sh2add  t0, t5, t0
+        bnez    t4, 2b
+
+        add     a1, a1, a4
+        sh2add  a0, a2, a0
+        sh2add  t0, a2, t0
+        bnez    a3, 1b

-        add     a0, a0, 4
-        add     a1, a1, 32 * 2 * 4
-        add     a4, a4, 4
-        bnez    a2, 1b
-3:
         ret
 endfunc
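
Note on the technique shared by both patches: each interleaved float[2] pair
is loaded as a single 64-bit element and then split into its two 32-bit halves
with narrowing shifts by 0 and by 32. The sketch below is a scalar C model of
the add_squares case only, not code from FFmpeg. The helper names
(bits_to_float, add_squares_model) are hypothetical, and it assumes a
little-endian target, as RISC-V is, so that the first float of each pair sits
in the low 32 bits of the 64-bit word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float without breaking aliasing rules. */
static float bits_to_float(uint32_t bits)
{
    float f;

    memcpy(&f, &bits, sizeof (f));
    return f;
}

/*
 * Hypothetical scalar model of the reworked add_squares loop:
 * dst[i] += src[i][0]^2 + src[i][1]^2.
 * Assumes little-endian byte order (first float in the low half).
 */
static void add_squares_model(float *dst, const float (*src)[2], int n)
{
    /* The trick only works if one pair is exactly one 64-bit element. */
    assert(sizeof (uint64_t) == 2 * sizeof (float));

    for (int i = 0; i < n; i++) {
        uint64_t pair;

        /* One 64-bit load per pair, standing in for vle64.v. */
        memcpy(&pair, src[i], sizeof (pair));

        /* Narrowing shift by 0 keeps the low half (vnsrl.wx ..., zero). */
        float re = bits_to_float((uint32_t)(pair >> 0));
        /* Narrowing shift by 32 keeps the high half (vnsrl.wx ..., t1). */
        float im = bits_to_float((uint32_t)(pair >> 32));

        /* Two accumulations, standing in for the two vfmacc.vv. */
        dst[i] += re * re;
        dst[i] += im * im;
    }
}
```

The vector code performs fused multiply-adds, whereas this model may round the
products separately, so results can differ in the last bit for inputs that are
not exactly representable. The same load-then-shift split applies to
hybrid_synthesis_deint, where the 64-bit load is strided instead of
unit-strided.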