From patchwork Thu Apr 13 13:34:47 2023
X-Patchwork-Submitter: =?utf-8?b?5b6Q56aP6ZqG?= <839789740@qq.com>
X-Patchwork-Id: 41144
From: xufuji456 <839789740@qq.com>
To: ffmpeg-devel@ffmpeg.org
Cc: xufuji456 <839789740@qq.com>
Date: Thu, 13 Apr 2023 21:34:47 +0800
Subject: [FFmpeg-devel] [PATCH] codec/aarch64/hevc: add transform_luma_neon

got 56% speed up (run_count=1000, CPU=Cortex A53)
transform_4x4_luma_neon: 45
transform_4x4_luma_c: 103

Signed-off-by: xufuji456 <839789740@qq.com>
---
 libavcodec/aarch64/hevcdsp_idct_neon.S    | 48 +++++++++++++++++++++++
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  2 +
 2 files changed, 50 insertions(+)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 994f0a47b6..4a25787070 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -842,6 +842,54 @@ tr_32x4 secondpass_10, 20 - 10
 idct_32x32 8
 idct_32x32 10
 
+.macro tr4_luma_shift r0, r1, r2, r3, shift
+        saddl           v0.4s, \r0, \r2         // c0 = src0 + src2
+        saddl           v1.4s, \r2, \r3         // c1 = src2 + src3
+        ssubl           v2.4s, \r0, \r3         // c2 = src0 - src3
+        smull           v3.4s, \r1, v21.4h      // c3 = 74 * src1
+
+        saddl           v7.4s, \r0, \r3         // src0 + src3
+        ssubw           v7.4s, v7.4s, \r2       // src0 - src2 + src3
+        mul             v7.4s, v7.4s, v18.4s    // dst2 = 74 * (src0 - src2 + src3)
+
+        mul             v5.4s, v0.4s, v19.4s    // 29 * c0
+        mul             v6.4s, v1.4s, v20.4s    // 55 * c1
+        add             v5.4s, v5.4s, v6.4s     // 29 * c0 + 55 * c1
+        add             v5.4s, v5.4s, v3.4s     // dst0 = 29 * c0 + 55 * c1 + c3
+
+        mul             v1.4s, v1.4s, v19.4s    // 29 * c1
+        mul             v6.4s, v2.4s, v20.4s    // 55 * c2
+        sub             v6.4s, v6.4s, v1.4s     // 55 * c2 - 29 * c1
+        add             v6.4s, v6.4s, v3.4s     // dst1 = 55 * c2 - 29 * c1 + c3
+
+        mul             v0.4s, v0.4s, v20.4s    // 55 * c0
+        mul             v2.4s, v2.4s, v19.4s    // 29 * c2
+        add             v0.4s, v0.4s, v2.4s     // 55 * c0 + 29 * c2
+        sub             v0.4s, v0.4s, v3.4s     // dst3 = 55 * c0 + 29 * c2 - c3
+
+        sqrshrn         \r0, v5.4s, \shift
+        sqrshrn         \r1, v6.4s, \shift
+        sqrshrn         \r2, v7.4s, \shift
+        sqrshrn         \r3, v0.4s, \shift
+.endm
+
+function ff_hevc_transform_luma_4x4_neon_8, export=1
+        ld1             {v28.4h-v31.4h}, [x0]
+        movi            v18.4s, #74
+        movi            v19.4s, #29
+        movi            v20.4s, #55
+        movi            v21.4h, #74
+
+        tr4_luma_shift  v28.4h, v29.4h, v30.4h, v31.4h, #7
+        transpose_4x4H  v28, v29, v30, v31, v22, v23, v24, v25
+
+        tr4_luma_shift  v28.4h, v29.4h, v30.4h, v31.4h, #12
+        transpose_4x4H  v28, v29, v30, v31, v22, v23, v24, v25
+
+        st1             {v28.4h-v31.4h}, [x0]
+        ret
+endfunc
+
 // void ff_hevc_idct_NxN_dc_DEPTH_neon(int16_t *coeffs)
 .macro idct_dc size, bitdepth
 function ff_hevc_idct_\size\()x\size\()_dc_\bitdepth\()_neon, export=1
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 4cc8732ad3..be1049a2ec 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -78,6 +78,7 @@ void ff_hevc_idct_4x4_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_8x8_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_16x16_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_32x32_dc_10_neon(int16_t *coeffs);
+void ff_hevc_transform_luma_4x4_neon_8(int16_t *coeffs);
 void ff_hevc_sao_band_filter_8x8_8_neon(uint8_t *_dst, const uint8_t *_src,
                                   ptrdiff_t stride_dst, ptrdiff_t stride_src,
                                   const int16_t *sao_offset_val, int sao_left_class,
@@ -146,6 +147,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->idct_dc[1]                  = ff_hevc_idct_8x8_dc_8_neon;
         c->idct_dc[2]                  = ff_hevc_idct_16x16_dc_8_neon;
         c->idct_dc[3]                  = ff_hevc_idct_32x32_dc_8_neon;
+        c->transform_4x4_luma          = ff_hevc_transform_luma_4x4_neon_8;
         c->sao_band_filter[0]          =
         c->sao_band_filter[1]          =
         c->sao_band_filter[2]          =
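
For readers less used to NEON, the arithmetic spelled out in the comments of tr4_luma_shift can be sketched in scalar C. This is an illustrative reference only, not FFmpeg's own C implementation: the names clip_int16, scale, tr4_luma_pass, transpose_4x4 and transform_4x4_luma_ref are hypothetical, the 16 coefficients are assumed to be stored row-major, and the rounding and saturation are modelled on what sqrshrn does (add half, shift right, clamp to int16_t).

```c
#include <stdint.h>

/* Clamp to the int16_t range, as the saturating narrow in sqrshrn does. */
static int16_t clip_int16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Round, shift and saturate one 32-bit sum.
 * Assumes >> on a negative int32_t is an arithmetic shift (true on
 * mainstream compilers, but implementation-defined in ISO C). */
static int16_t scale(int32_t v, int shift)
{
    return clip_int16((v + (1 << (shift - 1))) >> shift);
}

/* One 4-point pass down each of the four columns (row-major, stride 4).
 * All four inputs are read before any output is written, so the pass
 * can run in place, like the macro does on its registers. */
static void tr4_luma_pass(int16_t *c, int shift)
{
    for (int i = 0; i < 4; i++) {
        int32_t s0 = c[i], s1 = c[4 + i], s2 = c[8 + i], s3 = c[12 + i];
        int32_t c0 = s0 + s2;
        int32_t c1 = s2 + s3;
        int32_t c2 = s0 - s3;
        int32_t c3 = 74 * s1;

        c[i]      = scale(29 * c0 + 55 * c1 + c3, shift);
        c[4 + i]  = scale(55 * c2 - 29 * c1 + c3, shift);
        c[8 + i]  = scale(74 * (s0 - s2 + s3),    shift);
        c[12 + i] = scale(55 * c0 + 29 * c2 - c3, shift);
    }
}

/* In-place transpose of a row-major 4x4 block. */
static void transpose_4x4(int16_t *c)
{
    for (int r = 0; r < 4; r++)
        for (int k = r + 1; k < 4; k++) {
            int16_t t    = c[4 * r + k];
            c[4 * r + k] = c[4 * k + r];
            c[4 * k + r] = t;
        }
}

/* Same sequence as the NEON function: pass (shift 7), transpose,
 * pass (shift 12, i.e. 20 - 8 for 8-bit content), transpose. */
void transform_4x4_luma_ref(int16_t *coeffs)
{
    tr4_luma_pass(coeffs, 7);
    transpose_4x4(coeffs);
    tr4_luma_pass(coeffs, 12);
    transpose_4x4(coeffs);
}
```

Comparing the output of such a scalar model with the NEON routine on a few coefficient blocks is a cheap sanity check for the register allocation and the two shift amounts.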