From patchwork Thu Mar 30 13:21:55 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?b?5b6Q56aP6ZqG?= <839789740@qq.com>
X-Patchwork-Id: 40921
From: xufuji456 <839789740@qq.com>
To: ffmpeg-devel@ffmpeg.org
Cc: xufuji456 <839789740@qq.com>
Date: Thu, 30 Mar 2023 21:21:55 +0800
X-OQ-MSGID: <20230330132155.20835-1-839789740@qq.com>
X-Mailer: git-send-email 2.32.0 (Apple Git-132)
Subject: [FFmpeg-devel] [PATCH] avcodec/aarch64/hevc:add transform_luma_4x4_neon

Got a 56% speed-up (run_count=1000, CPU=Cortex-A53):
transform_4x4_luma_neon: 45
transform_4x4_luma_c: 103
---
 libavcodec/aarch64/hevcdsp_idct_neon.S    | 51 ++++++++++++++++++++++-
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  2 +
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 74a96957bf..f302ed9773 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -6,6 +6,7 @@
  * Ported from arm/hevcdsp_idct_neon.S by
  * Copyright (c) 2020 Reimar Döffinger
  * Copyright (c) 2023 J. Dekker
+ * Copyright (c) 2023 xu fulong <839789740@qq.com>
  *
  * This file is part of FFmpeg.
  *
@@ -656,4 +657,52 @@
 idct_dc 16, 8
 idct_dc 16, 10
 idct_dc 32, 8
-idct_dc 32, 10
\ No newline at end of file
+idct_dc 32, 10
+
+.macro tr4_luma_shift r0, r1, r2, r3, shift
+        saddl           v0.4s, \r0, \r2         // c0 = src0 + src2
+        saddl           v1.4s, \r2, \r3         // c1 = src2 + src3
+        ssubl           v2.4s, \r0, \r3         // c2 = src0 - src3
+        smull           v3.4s, \r1, v21.4h      // c3 = 74 * src1
+
+        saddl           v7.4s, \r0, \r3         // src0 + src3
+        ssubw           v7.4s, v7.4s, \r2       // src0 - src2 + src3
+        mul             v7.4s, v7.4s, v18.4s    // dst2 = 74 * (src0 - src2 + src3)
+
+        mul             v5.4s, v0.4s, v19.4s    // 29 * c0
+        mul             v6.4s, v1.4s, v20.4s    // 55 * c1
+        add             v5.4s, v5.4s, v6.4s     // 29 * c0 + 55 * c1
+        add             v5.4s, v5.4s, v3.4s     // dst0 = 29 * c0 + 55 * c1 + c3
+
+        mul             v1.4s, v1.4s, v19.4s    // 29 * c1
+        mul             v6.4s, v2.4s, v20.4s    // 55 * c2
+        sub             v6.4s, v6.4s, v1.4s     // 55 * c2 - 29 * c1
+        add             v6.4s, v6.4s, v3.4s     // dst1 = 55 * c2 - 29 * c1 + c3
+
+        mul             v0.4s, v0.4s, v20.4s    // 55 * c0
+        mul             v2.4s, v2.4s, v19.4s    // 29 * c2
+        add             v0.4s, v0.4s, v2.4s     // 55 * c0 + 29 * c2
+        sub             v0.4s, v0.4s, v3.4s     // dst3 = 55 * c0 + 29 * c2 - c3
+
+        sqrshrn         \r0, v5.4s, \shift
+        sqrshrn         \r1, v6.4s, \shift
+        sqrshrn         \r2, v7.4s, \shift
+        sqrshrn         \r3, v0.4s, \shift
+.endm
+
+function ff_hevc_transform_luma_4x4_neon_8, export=1
+        ld1             {v28.4h-v31.4h}, [x0]
+        movi            v18.4s, #74
+        movi            v19.4s, #29
+        movi            v20.4s, #55
+        movi            v21.4h, #74
+
+        tr4_luma_shift  v28.4h, v29.4h, v30.4h, v31.4h, #7
+        transpose_4x4H  v28, v29, v30, v31, v22, v23, v24, v25
+
+        tr4_luma_shift  v28.4h, v29.4h, v30.4h, v31.4h, #12
+        transpose_4x4H  v28, v29, v30, v31, v22, v23, v24, v25
+
+        st1             {v28.4h-v31.4h}, [x0]
+        ret
+endfunc
\ No newline at end of file
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 1deefca0a2..10e7f2318e 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -63,6 +63,7 @@ void ff_hevc_idct_4x4_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_8x8_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_16x16_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_32x32_dc_10_neon(int16_t *coeffs);
+void ff_hevc_transform_luma_4x4_neon_8(int16_t *coeffs);
 void ff_hevc_sao_band_filter_8x8_8_neon(uint8_t *_dst, const uint8_t *_src,
                                         ptrdiff_t stride_dst, ptrdiff_t stride_src,
                                         const int16_t *sao_offset_val, int sao_left_class,
@@ -128,6 +129,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->idct_dc[1] = ff_hevc_idct_8x8_dc_8_neon;
         c->idct_dc[2] = ff_hevc_idct_16x16_dc_8_neon;
         c->idct_dc[3] = ff_hevc_idct_32x32_dc_8_neon;
+        c->transform_4x4_luma = ff_hevc_transform_luma_4x4_neon_8;
         c->sao_band_filter[0] =
         c->sao_band_filter[1] =
         c->sao_band_filter[2] =