From: xufuji456 <839789740@qq.com>
To: ffmpeg-devel@ffmpeg.org
Cc: xufuji456 <839789740@qq.com>
Date: Thu, 13 Apr 2023 14:51:04 +0800
Subject: [FFmpeg-devel] [PATCH] codec/aarch64/hevc: add transform_luma_neon

Add a NEON implementation of the HEVC 4x4 luma transform for 8-bit content.
The measured benchmark time drops by about 56% (run_count=1000, CPU=Cortex A53):

transform_4x4_luma_neon: 45
transform_4x4_luma_c: 103

Signed-off-by: xufuji456 <839789740@qq.com>
---
 libavcodec/aarch64/hevcdsp_idct_neon.S    | 50 ++++++++++++++++++++++-
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  2 +
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 994f0a47b6..504258f7c7 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -889,4 +889,52 @@
 idct_dc 16, 8
 idct_dc 16, 10
 idct_dc 32, 8
-idct_dc 32, 10
\ No newline at end of file
+idct_dc 32, 10
+
+.macro tr4_luma_shift r0, r1, r2, r3, shift
+        saddl           v0.4s, \r0, \r2         // c0 = src0 + src2
+        saddl           v1.4s, \r2, \r3         // c1 = src2 + src3
+        ssubl           v2.4s, \r0, \r3         // c2 = src0 - src3
+        smull           v3.4s, \r1, v21.4h      // c3 = 74 * src1
+
+        saddl           v7.4s, \r0, \r3         // src0 + src3
+        ssubw           v7.4s, v7.4s, \r2       // src0 - src2 + src3
+        mul             v7.4s, v7.4s, v18.4s    // dst2 = 74 * (src0 - src2 + src3)
+
+        mul             v5.4s, v0.4s, v19.4s    // 29 * c0
+        mul             v6.4s, v1.4s, v20.4s    // 55 * c1
+        add             v5.4s, v5.4s, v6.4s     // 29 * c0 + 55 * c1
+        add             v5.4s, v5.4s, v3.4s     // dst0 = 29 * c0 + 55 * c1 + c3
+
+        mul             v1.4s, v1.4s, v19.4s    // 29 * c1
+        mul             v6.4s, v2.4s, v20.4s    // 55 * c2
+        sub             v6.4s, v6.4s, v1.4s     // 55 * c2 - 29 * c1
+        add             v6.4s, v6.4s, v3.4s     // dst1 = 55 * c2 - 29 * c1 + c3
+
+        mul             v0.4s, v0.4s, v20.4s    // 55 * c0
+        mul             v2.4s, v2.4s, v19.4s    // 29 * c2
+        add             v0.4s, v0.4s, v2.4s     // 55 * c0 + 29 * c2
+        sub             v0.4s, v0.4s, v3.4s     // dst3 = 55 * c0 + 29 * c2 - c3
+
+        sqrshrn         \r0, v5.4s, \shift
+        sqrshrn         \r1, v6.4s, \shift
+        sqrshrn         \r2, v7.4s, \shift
+        sqrshrn         \r3, v0.4s, \shift
+.endm
+
+function ff_hevc_transform_luma_4x4_neon_8, export=1
+        ld1             {v28.4h-v31.4h}, [x0]
+        movi            v18.4s, #74
+        movi            v19.4s, #29
+        movi            v20.4s, #55
+        movi            v21.4h, #74
+
+        tr4_luma_shift  v28.4h, v29.4h, v30.4h, v31.4h, #7
+        transpose_4x4H  v28, v29, v30, v31, v22, v23, v24, v25
+
+        tr4_luma_shift  v28.4h, v29.4h, v30.4h, v31.4h, #12
+        transpose_4x4H  v28, v29, v30, v31, v22, v23, v24, v25
+
+        st1             {v28.4h-v31.4h}, [x0]
+        ret
+endfunc
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 4cc8732ad3..be1049a2ec 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -78,6 +78,7 @@ void ff_hevc_idct_4x4_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_8x8_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_16x16_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_32x32_dc_10_neon(int16_t *coeffs);
+void ff_hevc_transform_luma_4x4_neon_8(int16_t *coeffs);
 void ff_hevc_sao_band_filter_8x8_8_neon(uint8_t *_dst, const uint8_t *_src,
                                   ptrdiff_t stride_dst, ptrdiff_t stride_src,
                                   const int16_t *sao_offset_val, int sao_left_class,
@@ -146,6 +147,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->idct_dc[1]                  = ff_hevc_idct_8x8_dc_8_neon;
         c->idct_dc[2]                  = ff_hevc_idct_16x16_dc_8_neon;
         c->idct_dc[3]                  = ff_hevc_idct_32x32_dc_8_neon;
+        c->transform_4x4_luma          = ff_hevc_transform_luma_4x4_neon_8;
         c->sao_band_filter[0]          =
         c->sao_band_filter[1]          =
         c->sao_band_filter[2]          =
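
For anyone reviewing the arithmetic, the following is a minimal scalar sketch of what the tr4_luma_shift comments describe: one 4-point pass over the columns with shift 7, then one over the rows with shift 12, each followed by a rounding shift and saturation to int16 (which is what sqrshrn does). It is only an illustration derived from the comments in this patch, not the FFmpeg C implementation; the names clip_int16, scale, tr4_luma_pass and ref_transform_4x4_luma_8 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to the int16_t range, as sqrshrn does when narrowing. */
static int16_t clip_int16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Rounding right shift followed by saturation. Assumes >> on a negative
 * value is an arithmetic shift, which holds on common compilers. */
static int16_t scale(int32_t v, int shift)
{
    assert(shift > 0 && shift < 31);
    return clip_int16((v + (1 << (shift - 1))) >> shift);
}

/* One in-place 4-point pass over elements c[0], c[step], c[2*step],
 * c[3*step], using the same c0..c3 terms as the assembly comments. */
static void tr4_luma_pass(int16_t *c, int step, int shift)
{
    const int32_t s0 = c[0 * step], s1 = c[1 * step];
    const int32_t s2 = c[2 * step], s3 = c[3 * step];
    const int32_t c0 = s0 + s2;
    const int32_t c1 = s2 + s3;
    const int32_t c2 = s0 - s3;
    const int32_t c3 = 74 * s1;

    c[0 * step] = scale(29 * c0 + 55 * c1 + c3, shift);
    c[1 * step] = scale(55 * c2 - 29 * c1 + c3, shift);
    c[2 * step] = scale(74 * (s0 - s2 + s3), shift);
    c[3 * step] = scale(55 * c0 + 29 * c2 - c3, shift);
}

/* Hypothetical scalar reference for the 8-bit case: columns first
 * (shift 7), then rows (shift 12), matching the two macro invocations. */
void ref_transform_4x4_luma_8(int16_t *coeffs)
{
    for (int i = 0; i < 4; i++)
        tr4_luma_pass(coeffs + i, 4, 7);
    for (int i = 0; i < 4; i++)
        tr4_luma_pass(coeffs + 4 * i, 1, 12);
}
```

A sketch like this can be run on the same 16 coefficients as the NEON function and the two outputs compared element by element; the two transposes in the assembly exist only so that the second pass sees rows in vector lanes.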