From patchwork Fri Feb 24 07:43:13 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?b?5b6Q56aP6ZqG?= <839789740@qq.com>
X-Patchwork-Id: 40487
From: xufuji456 <839789740@qq.com>
To: ffmpeg-devel@ffmpeg.org
Cc: xufuji456 <839789740@qq.com>
Date: Fri, 24 Feb 2023 15:43:13 +0800
X-Mailer: git-send-email 2.32.0 (Apple Git-132)
Subject: [FFmpeg-devel] [PATCH] libavcodec/hevc: add hevc idct4x4 neon of aarch64 after fixed

---
 libavcodec/aarch64/hevcdsp_idct_neon.S    | 40 +++++++++++++++++++++++
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  4 +++
 2 files changed, 44 insertions(+)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 124c50998a..f5135160b6 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -245,6 +245,43 @@ function hevc_add_residual_32x32_16_neon, export=0
         ret
 endfunc
 
+.macro tr_4x4 in0, in1, in2, in3, out0, out1, out2, out3, shift
+        sshll           v20.4s, \in0, #6
+        sshll           v21.4s, \in0, #6
+        smull           v22.4s, \in1, v4.h[1]
+        smull           v23.4s, \in1, v4.h[3]
+        smlal           v20.4s, \in2, v4.h[0] //e0
+        smlsl           v21.4s, \in2, v4.h[0] //e1
+        smlal           v22.4s, \in3, v4.h[3] //o0
+        smlsl           v23.4s, \in3, v4.h[1] //o1
+
+        add             v24.4s, v20.4s, v22.4s
+        sub             v20.4s, v20.4s, v22.4s
+        add             v22.4s, v21.4s, v23.4s
+        sub             v21.4s, v21.4s, v23.4s
+        sqrshrn         \out0, v24.4s, #\shift
+        sqrshrn         \out3, v20.4s, #\shift
+        sqrshrn         \out1, v22.4s, #\shift
+        sqrshrn         \out2, v21.4s, #\shift
+.endm
+
+.macro idct_4x4 bitdepth
+function ff_hevc_idct_4x4_\bitdepth\()_neon, export=1
+        ld1             {v0.4h-v3.4h}, [x0]
+
+        movrel          x1, trans
+        ld1             {v4.4h}, [x1]
+
+        tr_4x4          v0.4h, v1.4h, v2.4h, v3.4h, v16.4h, v17.4h, v18.4h, v19.4h, 7
+        transpose_4x8H  v16, v17, v18, v19, v26, v27, v28, v29
+
+        tr_4x4          v16.4h, v17.4h, v18.4h, v19.4h, v0.4h, v1.4h, v2.4h, v3.4h, 20 - \bitdepth
+        transpose_4x8H  v0, v1, v2, v3, v26, v27, v28, v29
+        st1             {v0.4h-v3.4h}, [x0]
+        ret
+endfunc
+.endm
+
 .macro sum_sub out, in, c, op, p
   .ifc \op, +
         smlal\p         \out, \in, \c
@@ -578,6 +615,9 @@ function ff_hevc_idct_16x16_\bitdepth\()_neon, export=1
 endfunc
 .endm
 
+idct_4x4 8
+idct_4x4 10
+
 idct_8x8 8
 idct_8x8 10
 
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 88a797f393..1deefca0a2 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -49,6 +49,8 @@ void ff_hevc_add_residual_32x32_10_neon(uint8_t *_dst, const int16_t *coeffs,
                                         ptrdiff_t stride);
 void ff_hevc_add_residual_32x32_12_neon(uint8_t *_dst, const int16_t *coeffs,
                                         ptrdiff_t stride);
+void ff_hevc_idct_4x4_8_neon(int16_t *coeffs, int col_limit);
+void ff_hevc_idct_4x4_10_neon(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_8x8_8_neon(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_8x8_10_neon(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_16x16_8_neon(int16_t *coeffs, int col_limit);
@@ -119,6 +121,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->add_residual[1]             = ff_hevc_add_residual_8x8_8_neon;
         c->add_residual[2]             = ff_hevc_add_residual_16x16_8_neon;
         c->add_residual[3]             = ff_hevc_add_residual_32x32_8_neon;
+        c->idct[0]                     = ff_hevc_idct_4x4_8_neon;
         c->idct[1]                     = ff_hevc_idct_8x8_8_neon;
         c->idct[2]                     = ff_hevc_idct_16x16_8_neon;
         c->idct_dc[0]                  = ff_hevc_idct_4x4_dc_8_neon;
@@ -168,6 +171,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->add_residual[1]             = ff_hevc_add_residual_8x8_10_neon;
         c->add_residual[2]             = ff_hevc_add_residual_16x16_10_neon;
         c->add_residual[3]             = ff_hevc_add_residual_32x32_10_neon;
+        c->idct[0]                     = ff_hevc_idct_4x4_10_neon;
         c->idct[1]                     = ff_hevc_idct_8x8_10_neon;
         c->idct[2]                     = ff_hevc_idct_16x16_10_neon;
         c->idct_dc[0]                  = ff_hevc_idct_4x4_dc_10_neon;
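
For review purposes, the arithmetic of the tr_4x4 macro can be modelled in scalar C. The sketch below is not part of the patch. It assumes that the first four entries of the trans table are 64, 83, 64, 36 (the table itself is not shown in this diff) and that the block is stored row-major as 16 int16_t values. The names clip_int16, tr_4_ref and idct_4x4_ref are hypothetical helpers made up for this illustration. Each pass computes the even terms e0/e1 and the odd terms o0/o1, then rounds, shifts and saturates the way sqrshrn does; the first pass runs down the columns with shift 7, the second along the rows with shift 20 - bitdepth.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a 32-bit value to int16_t, like the narrowing step of sqrshrn. */
static int16_t clip_int16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One 4-point pass, in place, over src[0], src[step], src[2*step], src[3*step].
 * Coefficients 64/83/36 are assumed to match trans[0..3] = 64, 83, 64, 36. */
static void tr_4_ref(int16_t *src, int step, int shift)
{
    const int32_t add = 1 << (shift - 1);                 /* rounding term */
    const int32_t e0  = 64 * src[0] + 64 * src[2 * step]; /* even part */
    const int32_t e1  = 64 * src[0] - 64 * src[2 * step];
    const int32_t o0  = 83 * src[step] + 36 * src[3 * step]; /* odd part */
    const int32_t o1  = 36 * src[step] - 83 * src[3 * step];

    /* Assumes arithmetic right shift for negative values, as on AArch64. */
    src[0]        = clip_int16((e0 + o0 + add) >> shift);
    src[step]     = clip_int16((e1 + o1 + add) >> shift);
    src[2 * step] = clip_int16((e1 - o1 + add) >> shift);
    src[3 * step] = clip_int16((e0 - o0 + add) >> shift);
}

/* Hypothetical scalar model of the two-pass 4x4 inverse transform. */
static void idct_4x4_ref(int16_t *coeffs, int bit_depth)
{
    assert(bit_depth >= 8 && bit_depth <= 12); /* keeps the second shift positive */

    for (int i = 0; i < 4; i++)                /* first pass: columns, shift 7 */
        tr_4_ref(coeffs + i, 4, 7);
    for (int i = 0; i < 4; i++)                /* second pass: rows */
        tr_4_ref(coeffs + 4 * i, 1, 20 - bit_depth);
}
```

With this model, a block whose only nonzero coefficient is a DC value of 64 comes out as sixteen ones at 8 bit, which gives a quick sanity check when comparing against the NEON output.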