From patchwork Sat Apr 8 08:59:18 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: 徐福隆 <839789740@qq.com>
X-Patchwork-Id: 41026
From: xufuji456 <839789740@qq.com>
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 8 Apr 2023 16:59:18 +0800
Subject: [FFmpeg-devel] [PATCH] codec/hevc: add transform_luma_neon and checkasm

Cuts the run time of transform_4x4_luma by about 56% (about 2.3x
faster), measured with run_count=1000 on a Cortex-A53:

transform_4x4_luma_neon: 45
transform_4x4_luma_c: 103
---
 libavcodec/aarch64/hevcdsp_idct_neon.S    | 51 ++++++++++++++++++++++-
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  2 +
 tests/checkasm/hevc_idct.c                | 28 +++++++++++++
 3 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 74a96957bf..f302ed9773 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -6,6 +6,7 @@
  * Ported from arm/hevcdsp_idct_neon.S by
  * Copyright (c) 2020 Reimar Döffinger
  * Copyright (c) 2023 J. Dekker
+ * Copyright (c) 2023 xu fulong <839789740@qq.com>
  *
  * This file is part of FFmpeg.
  *
@@ -656,4 +657,52 @@ idct_dc 16, 8
 idct_dc 16, 10
 
 idct_dc 32, 8
-idct_dc 32, 10
\ No newline at end of file
+idct_dc 32, 10
+
+.macro tr4_luma_shift r0, r1, r2, r3, shift
+        saddl           v0.4s, \r0, \r2         // c0 = src0 + src2
+        saddl           v1.4s, \r2, \r3         // c1 = src2 + src3
+        ssubl           v2.4s, \r0, \r3         // c2 = src0 - src3
+        smull           v3.4s, \r1, v21.4h      // c3 = 74 * src1
+
+        saddl           v7.4s, \r0, \r3         // src0 + src3
+        ssubw           v7.4s, v7.4s, \r2       // src0 - src2 + src3
+        mul             v7.4s, v7.4s, v18.4s    // dst2 = 74 * (src0 - src2 + src3)
+
+        mul             v5.4s, v0.4s, v19.4s    // 29 * c0
+        mul             v6.4s, v1.4s, v20.4s    // 55 * c1
+        add             v5.4s, v5.4s, v6.4s     // 29 * c0 + 55 * c1
+        add             v5.4s, v5.4s, v3.4s     // dst0 = 29 * c0 + 55 * c1 + c3
+
+        mul             v1.4s, v1.4s, v19.4s    // 29 * c1
+        mul             v6.4s, v2.4s, v20.4s    // 55 * c2
+        sub             v6.4s, v6.4s, v1.4s     // 55 * c2 - 29 * c1
+        add             v6.4s, v6.4s, v3.4s     // dst1 = 55 * c2 - 29 * c1 + c3
+
+        mul             v0.4s, v0.4s, v20.4s    // 55 * c0
+        mul             v2.4s, v2.4s, v19.4s    // 29 * c2
+        add             v0.4s, v0.4s, v2.4s     // 55 * c0 + 29 * c2
+        sub             v0.4s, v0.4s, v3.4s     // dst3 = 55 * c0 + 29 * c2 - c3
+
+        sqrshrn         \r0, v5.4s, \shift
+        sqrshrn         \r1, v6.4s, \shift
+        sqrshrn         \r2, v7.4s, \shift
+        sqrshrn         \r3, v0.4s, \shift
+.endm
+
+function ff_hevc_transform_luma_4x4_neon_8, export=1
+        ld1             {v28.4h-v31.4h}, [x0]
+        movi            v18.4s, #74
+        movi            v19.4s, #29
+        movi            v20.4s, #55
+        movi            v21.4h, #74
+
+        tr4_luma_shift  v28.4h, v29.4h, v30.4h, v31.4h, #7
+        transpose_4x4H  v28, v29, v30, v31, v22, v23, v24, v25
+
+        tr4_luma_shift  v28.4h, v29.4h, v30.4h, v31.4h, #12
+        transpose_4x4H  v28, v29, v30, v31, v22, v23, v24, v25
+
+        st1             {v28.4h-v31.4h}, [x0]
+        ret
+endfunc
\ No newline at end of file
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 1deefca0a2..10e7f2318e 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -63,6 +63,7 @@ void ff_hevc_idct_4x4_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_8x8_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_16x16_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_32x32_dc_10_neon(int16_t *coeffs);
+void ff_hevc_transform_luma_4x4_neon_8(int16_t *coeffs);
 void ff_hevc_sao_band_filter_8x8_8_neon(uint8_t *_dst, const uint8_t *_src,
                                         ptrdiff_t stride_dst, ptrdiff_t stride_src,
                                         const int16_t *sao_offset_val, int sao_left_class,
@@ -128,6 +129,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->idct_dc[1] = ff_hevc_idct_8x8_dc_8_neon;
         c->idct_dc[2] = ff_hevc_idct_16x16_dc_8_neon;
         c->idct_dc[3] = ff_hevc_idct_32x32_dc_8_neon;
+        c->transform_4x4_luma = ff_hevc_transform_luma_4x4_neon_8;
         c->sao_band_filter[0] =
         c->sao_band_filter[1] =
         c->sao_band_filter[2] =
diff --git a/tests/checkasm/hevc_idct.c b/tests/checkasm/hevc_idct.c
index 338b8a23e4..1d246364ca 100644
--- a/tests/checkasm/hevc_idct.c
+++ b/tests/checkasm/hevc_idct.c
@@ -84,6 +84,27 @@ static void check_idct_dc(HEVCDSPContext h, int bit_depth)
     }
 }
 
+static void check_transform_luma(HEVCDSPContext h)
+{
+    LOCAL_ALIGNED(32, int16_t, coeffs0, [32 * 32]);
+    LOCAL_ALIGNED(32, int16_t, coeffs1, [32 * 32]);
+
+    int block_size = 4;
+    int size = block_size * block_size;
+    declare_func_emms(AV_CPU_FLAG_MMXEXT, void, int16_t *coeffs);
+
+    randomize_buffers(coeffs0, size);
+    memcpy(coeffs1, coeffs0, sizeof(*coeffs0) * size);
+
+    if (check_func(h.transform_4x4_luma, "transform_luma_%dx%d", block_size, block_size)) {
+        call_ref(coeffs0);
+        call_new(coeffs1);
+        if (memcmp(coeffs0, coeffs1, sizeof(*coeffs0) * size))
+            fail();
+        bench_new(coeffs1);
+    }
+}
+
 void checkasm_check_hevc_idct(void)
 {
     int bit_depth;
@@ -103,4 +124,11 @@ void checkasm_check_hevc_idct(void)
         check_idct(h, bit_depth);
     }
     report("idct");
+
+    bit_depth = 8;
+    HEVCDSPContext h;
+
+    ff_hevc_dsp_init(&h, bit_depth);
+    check_transform_luma(h);
+    report("transform_luma");
 }
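
For review, the data flow of ff_hevc_transform_luma_4x4_neon_8 can be checked against a scalar model. After ld1, each of v28-v31 holds one row of coefficients, so lane j of tr4_luma_shift sees column j. The first pass is therefore the column pass of the C reference (the TR_4x4_LUMA macro used by transform_4x4_luma in libavcodec/hevcdsp_template.c), and transpose_4x4H turns the second pass into the row pass. The standalone C sketch below is not FFmpeg code; the helper names (round_shift, tr4_luma, ref_transform, tr4_luma_shift_lanes, neon_style_transform, count_mismatches) are hypothetical. It assumes 8-bit content (second-pass shift 20 - 8 = 12), models SQRSHRN as a rounding right shift followed by saturation to int16_t, and relies on >> being an arithmetic shift for negative values, as FFmpeg itself does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: rounding right shift, then saturation to int16_t.
 * Scalar model of SQRSHRN (and of SCALE() in the C template); assumes >> on
 * a negative int32_t is an arithmetic shift, as on GCC and Clang. */
static int16_t round_shift(int32_t x, int shift)
{
    assert(shift >= 1 && shift <= 16);
    x = (x + (1 << (shift - 1))) >> shift;
    return (int16_t)(x < INT16_MIN ? INT16_MIN : x > INT16_MAX ? INT16_MAX : x);
}

/* 4-point butterfly with the same formulas as the tr4_luma_shift macro. */
static void tr4_luma(int32_t dst[4], int32_t s0, int32_t s1, int32_t s2, int32_t s3)
{
    int32_t c0 = s0 + s2;
    int32_t c1 = s2 + s3;
    int32_t c2 = s0 - s3;
    int32_t c3 = 74 * s1;

    dst[0] = 29 * c0 + 55 * c1 + c3;
    dst[1] = 55 * c2 - 29 * c1 + c3;
    dst[2] = 74 * (s0 - s2 + s3);
    dst[3] = 55 * c0 + 29 * c2 - c3;
}

/* Scalar reference order: column pass (shift 7), then row pass (shift 12). */
static void ref_transform(int16_t c[16])
{
    int32_t t[4];
    for (int i = 0; i < 4; i++) {
        tr4_luma(t, c[i], c[4 + i], c[8 + i], c[12 + i]);
        for (int k = 0; k < 4; k++)
            c[4 * k + i] = round_shift(t[k], 7);
    }
    for (int i = 0; i < 4; i++) {
        tr4_luma(t, c[4 * i], c[4 * i + 1], c[4 * i + 2], c[4 * i + 3]);
        for (int k = 0; k < 4; k++)
            c[4 * i + k] = round_shift(t[k], 12);
    }
}

/* One tr4_luma_shift invocation: r[0..3] stand for v28..v31 (one row each)
 * and every lane is processed independently of the others. */
static void tr4_luma_shift_lanes(int16_t r[4][4], int shift)
{
    int32_t t[4];
    for (int lane = 0; lane < 4; lane++) {
        tr4_luma(t, r[0][lane], r[1][lane], r[2][lane], r[3][lane]);
        for (int k = 0; k < 4; k++)
            r[k][lane] = round_shift(t[k], shift);
    }
}

static void transpose4x4(int16_t r[4][4])
{
    for (int i = 0; i < 4; i++)
        for (int j = i + 1; j < 4; j++) {
            int16_t tmp = r[i][j];
            r[i][j] = r[j][i];
            r[j][i] = tmp;
        }
}

/* Same order of operations as the NEON function in the patch. */
static void neon_style_transform(int16_t c[16])
{
    int16_t r[4][4];
    memcpy(r, c, sizeof(r));          /* ld1 {v28.4h-v31.4h}, [x0] */
    tr4_luma_shift_lanes(r, 7);
    transpose4x4(r);
    tr4_luma_shift_lanes(r, 12);
    transpose4x4(r);
    memcpy(c, r, sizeof(r));          /* st1 {v28.4h-v31.4h}, [x0] */
}

/* Runs both versions on random full-range blocks; returns the mismatch count. */
static int count_mismatches(unsigned seed, int iterations)
{
    int mismatches = 0;
    srand(seed);
    for (int it = 0; it < iterations; it++) {
        int16_t a[16], b[16];
        for (int i = 0; i < 16; i++) {
            int v = ((rand() & 0xff) << 8) | (rand() & 0xff); /* 0..65535 */
            a[i] = (int16_t)(v - 32768);
        }
        memcpy(b, a, sizeof(a));
        ref_transform(a);
        neon_style_transform(b);
        if (memcmp(a, b, sizeof(a)) != 0)
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches(1234, 10000) from a small main() should return 0. For a DC-only block with coeffs[0] = 1000, both versions produce the rows {2, 3, 4, 5}, {3, 6, 8, 9}, {4, 8, 10, 12}, {5, 9, 12, 13}. This only shows that the vectorized order of operations matches the scalar one; checkasm, which compares against the real C function, remains the actual test.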