From: xufuji456 <839789740@qq.com>
To: ffmpeg-devel@ffmpeg.org
Cc: xufuji456 <839789740@qq.com>
Date: Tue, 14 Feb 2023 18:02:33 +0800
Subject: [FFmpeg-devel] [PATCH] libavcodec/hevc: add hevc idct_4x4_neon of aarch64

---
 libavcodec/aarch64/hevcdsp_idct_neon.S    | 51 +++++++++++++++++++++++
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  4 ++
 2 files changed, 55 insertions(+)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 124c50998a..fe8baf1348 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -245,6 +245,54 @@ function hevc_add_residual_32x32_16_neon, export=0
         ret
 endfunc
 
+.macro tr_4x4 in0, in1, in2, in3, out0, out1, out2, out3, shift, tmp0, tmp1, tmp2, tmp3, tmp4
+        sshll           \tmp0, \in0, #6
+        smull           \tmp2, \in1, v4.h[1]
+        mov             \tmp1, \tmp0
+        smull           \tmp3, \in1, v4.h[3]
+        smlal           \tmp0, \in2, v4.h[0] //e0
+        smlsl           \tmp1, \in2, v4.h[0] //e1
+        smlal           \tmp2, \in3, v4.h[3] //o0
+        smlsl           \tmp3, \in3, v4.h[1] //o1
+
+        add             \tmp4, \tmp0, \tmp2
+        sub             \tmp0, \tmp0, \tmp2
+        add             \tmp2, \tmp1, \tmp3
+        sub             \tmp1, \tmp1, \tmp3
+        sqrshrn         \out0, \tmp4, #\shift
+        sqrshrn         \out3, \tmp0, #\shift
+        sqrshrn         \out1, \tmp2, #\shift
+        sqrshrn         \out2, \tmp1, #\shift
+.endm
+
+.macro transpose_4x4 r0, r1, r2, r3
+        trn1            v22.8h, \r0\().8h, \r1\().8h
+        trn2            v23.8h, \r0\().8h, \r1\().8h
+        trn1            v24.8h, \r2\().8h, \r3\().8h
+        trn2            v25.8h, \r2\().8h, \r3\().8h
+        trn1            \r0\().4s, v22.4s, v24.4s
+        trn2            \r2\().4s, v22.4s, v24.4s
+        trn1            \r1\().4s, v23.4s, v25.4s
+        trn2            \r3\().4s, v23.4s, v25.4s
+.endm
+
+.macro idct_4x4 bitdepth
+function ff_hevc_idct_4x4_\bitdepth\()_neon, export=1
+        ld1             {v0.4h-v3.4h}, [x0]
+
+        movrel          x1, trans
+        ld1             {v4.4h}, [x1]
+
+        tr_4x4          v0.4h, v1.4h, v2.4h, v3.4h, v16.4h, v17.4h, v18.4h, v19.4h, 7, v10.4s, v11.4s, v12.4s, v13.4s, v15.4s
+        transpose_4x4   v16, v17, v18, v19
+
+        tr_4x4          v16.4h, v17.4h, v18.4h, v19.4h, v0.4h, v1.4h, v2.4h, v3.4h, 20 - \bitdepth, v10.4s, v11.4s, v12.4s, v13.4s, v15.4s
+        transpose_4x4   v0, v1, v2, v3
+        st1             {v0.4h-v3.4h}, [x0]
+        ret
+endfunc
+.endm
+
 .macro sum_sub out, in, c, op, p
   .ifc \op, +
         smlal\p         \out, \in, \c
@@ -578,6 +626,9 @@ function ff_hevc_idct_16x16_\bitdepth\()_neon, export=1
 endfunc
 .endm
 
+idct_4x4 8
+idct_4x4 10
+
 idct_8x8 8
 idct_8x8 10
 
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 88a797f393..1deefca0a2 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -49,6 +49,8 @@ void ff_hevc_add_residual_32x32_10_neon(uint8_t *_dst, const int16_t *coeffs,
                                         ptrdiff_t stride);
 void ff_hevc_add_residual_32x32_12_neon(uint8_t *_dst, const int16_t *coeffs,
                                         ptrdiff_t stride);
+void ff_hevc_idct_4x4_8_neon(int16_t *coeffs, int col_limit);
+void ff_hevc_idct_4x4_10_neon(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_8x8_8_neon(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_8x8_10_neon(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_16x16_8_neon(int16_t *coeffs, int col_limit);
@@ -119,6 +121,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->add_residual[1]             = ff_hevc_add_residual_8x8_8_neon;
         c->add_residual[2]             = ff_hevc_add_residual_16x16_8_neon;
         c->add_residual[3]             = ff_hevc_add_residual_32x32_8_neon;
+        c->idct[0]                     = ff_hevc_idct_4x4_8_neon;
         c->idct[1]                     = ff_hevc_idct_8x8_8_neon;
         c->idct[2]                     = ff_hevc_idct_16x16_8_neon;
         c->idct_dc[0]                  = ff_hevc_idct_4x4_dc_8_neon;
@@ -168,6 +171,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->add_residual[1]             = ff_hevc_add_residual_8x8_10_neon;
         c->add_residual[2]             = ff_hevc_add_residual_16x16_10_neon;
         c->add_residual[3]             = ff_hevc_add_residual_32x32_10_neon;
+        c->idct[0]                     = ff_hevc_idct_4x4_10_neon;
         c->idct[1]                     = ff_hevc_idct_8x8_10_neon;
         c->idct[2]                     = ff_hevc_idct_16x16_10_neon;
         c->idct_dc[0]                  = ff_hevc_idct_4x4_dc_10_neon;
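
For checking the NEON routine against a scalar model, the arithmetic in tr_4x4 can be sketched in plain C as below. This is only a sketch under stated assumptions: the patch does not show the contents of the trans table, so the constants 64, 83 and 36 (the standard HEVC 4-point transform coefficients) are assumed to be what v4.h[0], v4.h[1] and v4.h[3] hold, and the names sat_round_shift, tr_4x4_pass and idct_4x4_ref are hypothetical helpers, not FFmpeg functions. The col_limit argument is left out because the assembly above does not read it.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding right shift with saturation to int16_t, modelling sqrshrn.
 * Assumes >> on a negative value is an arithmetic shift, which holds on
 * common compilers but is implementation-defined in C. */
static int16_t sat_round_shift(int64_t v, int shift)
{
    v = (v + ((int64_t)1 << (shift - 1))) >> shift;
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One 1-D pass: transform each column of src and store the four results
 * as a row of dst, i.e. tr_4x4 followed by transpose_4x4.
 * 64, 83 and 36 are ASSUMED values for the trans table entries. */
static void tr_4x4_pass(const int16_t src[16], int16_t dst[16], int shift)
{
    assert(shift > 0);
    for (int i = 0; i < 4; i++) {
        const int64_t c0 = src[0 * 4 + i], c1 = src[1 * 4 + i];
        const int64_t c2 = src[2 * 4 + i], c3 = src[3 * 4 + i];
        const int64_t e0 = 64 * c0 + 64 * c2;   /* even part, //e0 */
        const int64_t e1 = 64 * c0 - 64 * c2;   /* even part, //e1 */
        const int64_t o0 = 83 * c1 + 36 * c3;   /* odd part,  //o0 */
        const int64_t o1 = 36 * c1 - 83 * c3;   /* odd part,  //o1 */

        dst[i * 4 + 0] = sat_round_shift(e0 + o0, shift);
        dst[i * 4 + 1] = sat_round_shift(e1 + o1, shift);
        dst[i * 4 + 2] = sat_round_shift(e1 - o1, shift);
        dst[i * 4 + 3] = sat_round_shift(e0 - o0, shift);
    }
}

/* Hypothetical scalar model of ff_hevc_idct_4x4_<bitdepth>_neon:
 * first pass with shift 7, second pass with shift 20 - bit_depth,
 * result written back in place. */
void idct_4x4_ref(int16_t coeffs[16], int bit_depth)
{
    int16_t tmp[16];

    tr_4x4_pass(coeffs, tmp, 7);
    tr_4x4_pass(tmp, coeffs, 20 - bit_depth);
}
```

Feeding the same 16 coefficients to idct_4x4_ref and to the NEON function, then comparing the two buffers, gives a quick consistency check for both bit depths.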