From patchwork Sun Apr 9 03:52:44 2023
X-Patchwork-Id: 41030
From: xufuji456 <839789740@qq.com>
To: ffmpeg-devel@ffmpeg.org
Cc: xufuji456 <839789740@qq.com>
Date: Sun, 9 Apr 2023 11:52:44 +0800
Subject: [FFmpeg-devel] [PATCH] codec/aarch64/hevc:add transform_luma_neon and checkasm
X-OQ-MSGID: <20230409035244.55649-1-839789740@qq.com>
X-Mailer: git-send-email 2.32.0 (Apple Git-132)

Got a 56% speed-up (run_count=1000, CPU=Cortex A53):
transform_4x4_luma_neon: 45
transform_4x4_luma_c: 103
---
 libavcodec/aarch64/hevcdsp_idct_neon.S    | 51 ++++++++++++++++++++++-
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  2 +
 tests/checkasm/hevc_idct.c                | 28 +++++++++++++
 3 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 74a96957bf..f302ed9773 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -6,6 +6,7 @@
  * Ported from arm/hevcdsp_idct_neon.S by
  * Copyright (c) 2020 Reimar Döffinger
  * Copyright (c) 2023 J. Dekker
+ * Copyright (c) 2023 xu fulong <839789740@qq.com>
  *
  * This file is part of FFmpeg.
  *
@@ -656,4 +657,52 @@ idct_dc 16, 8
 idct_dc 16, 10
 
 idct_dc 32, 8
-idct_dc 32, 10
\ No newline at end of file
+idct_dc 32, 10
+
+.macro tr4_luma_shift r0, r1, r2, r3, shift
+        saddl           v0.4s, \r0, \r2         // c0 = src0 + src2
+        saddl           v1.4s, \r2, \r3         // c1 = src2 + src3
+        ssubl           v2.4s, \r0, \r3         // c2 = src0 - src3
+        smull           v3.4s, \r1, v21.4h      // c3 = 74 * src1
+
+        saddl           v7.4s, \r0, \r3         // src0 + src3
+        ssubw           v7.4s, v7.4s, \r2       // src0 - src2 + src3
+        mul             v7.4s, v7.4s, v18.4s    // dst2 = 74 * (src0 - src2 + src3)
+
+        mul             v5.4s, v0.4s, v19.4s    // 29 * c0
+        mul             v6.4s, v1.4s, v20.4s    // 55 * c1
+        add             v5.4s, v5.4s, v6.4s     // 29 * c0 + 55 * c1
+        add             v5.4s, v5.4s, v3.4s     // dst0 = 29 * c0 + 55 * c1 + c3
+
+        mul             v1.4s, v1.4s, v19.4s    // 29 * c1
+        mul             v6.4s, v2.4s, v20.4s    // 55 * c2
+        sub             v6.4s, v6.4s, v1.4s     // 55 * c2 - 29 * c1
+        add             v6.4s, v6.4s, v3.4s     // dst1 = 55 * c2 - 29 * c1 + c3
+
+        mul             v0.4s, v0.4s, v20.4s    // 55 * c0
+        mul             v2.4s, v2.4s, v19.4s    // 29 * c2
+        add             v0.4s, v0.4s, v2.4s     // 55 * c0 + 29 * c2
+        sub             v0.4s, v0.4s, v3.4s     // dst3 = 55 * c0 + 29 * c2 - c3
+
+        sqrshrn         \r0, v5.4s, \shift
+        sqrshrn         \r1, v6.4s, \shift
+        sqrshrn         \r2, v7.4s, \shift
+        sqrshrn         \r3, v0.4s, \shift
+.endm
+
+function ff_hevc_transform_luma_4x4_neon_8, export=1
+        ld1             {v28.4h-v31.4h}, [x0]
+        movi            v18.4s, #74
+        movi            v19.4s, #29
+        movi            v20.4s, #55
+        movi            v21.4h, #74
+
+        tr4_luma_shift  v28.4h, v29.4h, v30.4h, v31.4h, #7
+        transpose_4x4H  v28, v29, v30, v31, v22, v23, v24, v25
+
+        tr4_luma_shift  v28.4h, v29.4h, v30.4h, v31.4h, #12
+        transpose_4x4H  v28, v29, v30, v31, v22, v23, v24, v25
+
+        st1             {v28.4h-v31.4h}, [x0]
+        ret
+endfunc
\ No newline at end of file
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index a923bae35c..6605a39973 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -75,6 +75,7 @@ void ff_hevc_idct_4x4_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_8x8_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_16x16_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_32x32_dc_10_neon(int16_t *coeffs);
+void ff_hevc_transform_luma_4x4_neon_8(int16_t *coeffs);
 void ff_hevc_sao_band_filter_8x8_8_neon(uint8_t *_dst, const uint8_t *_src,
                                   ptrdiff_t stride_dst, ptrdiff_t stride_src,
                                   const int16_t *sao_offset_val, int sao_left_class,
@@ -142,6 +143,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->idct_dc[1]                  = ff_hevc_idct_8x8_dc_8_neon;
         c->idct_dc[2]                  = ff_hevc_idct_16x16_dc_8_neon;
         c->idct_dc[3]                  = ff_hevc_idct_32x32_dc_8_neon;
+        c->transform_4x4_luma          = ff_hevc_transform_luma_4x4_neon_8;
         c->sao_band_filter[0]          =
         c->sao_band_filter[1]          =
         c->sao_band_filter[2]          =
diff --git a/tests/checkasm/hevc_idct.c b/tests/checkasm/hevc_idct.c
index 338b8a23e4..1c2b08d0f8 100644
--- a/tests/checkasm/hevc_idct.c
+++ b/tests/checkasm/hevc_idct.c
@@ -84,6 +84,27 @@ static void check_idct_dc(HEVCDSPContext h, int bit_depth)
     }
 }
 
+static void check_transform_luma(HEVCDSPContext h)
+{
+    LOCAL_ALIGNED(32, int16_t, coeffs0, [32 * 32]);
+    LOCAL_ALIGNED(32, int16_t, coeffs1, [32 * 32]);
+
+    int block_size = 4;
+    int size = block_size * block_size;
+    declare_func_emms(AV_CPU_FLAG_MMXEXT, void, int16_t *coeffs);
+
+    randomize_buffers(coeffs0, size);
+    memcpy(coeffs1, coeffs0, sizeof(*coeffs0) * size);
+
+    if (check_func(h.transform_4x4_luma, "hevc_transform_4x4_luma")) {
+        call_ref(coeffs0);
+        call_new(coeffs1);
+        if (memcmp(coeffs0, coeffs1, sizeof(*coeffs0) * size))
+            fail();
+        bench_new(coeffs1);
+    }
+}
+
 void checkasm_check_hevc_idct(void)
 {
     int bit_depth;
@@ -103,4 +124,11 @@ void checkasm_check_hevc_idct(void)
         check_idct(h, bit_depth);
     }
     report("idct");
+
+    bit_depth = 8;
+    HEVCDSPContext h;
+
+    ff_hevc_dsp_init(&h, bit_depth);
+    check_transform_luma(h);
+    report("transform_luma");
 }
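
For anyone checking the arithmetic in tr4_luma_shift by hand, here is a minimal scalar sketch in C of what the macro's comments describe. It assumes the 8-bit case used by the patch (shift 7 for the first pass, 12 for the second) and that sqrshrn behaves as "add half, arithmetic shift right, saturate to int16". The names clip_int16, tr4_luma_pass and transform_4x4_luma_ref are hypothetical and are not part of the patch or of FFmpeg.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: saturate to the int16_t range, standing in for
 * the saturating narrow performed by sqrshrn. */
static int16_t clip_int16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One in-place 4-point pass over src[0], src[step], src[2*step],
 * src[3*step], using the formulas from the macro's comments.
 * Assumes >> on negative ints is an arithmetic shift. */
static void tr4_luma_pass(int16_t *src, int step, int shift)
{
    assert(shift > 0);
    const int32_t add = 1 << (shift - 1);          /* rounding term */
    const int32_t s0 = src[0 * step], s1 = src[1 * step];
    const int32_t s2 = src[2 * step], s3 = src[3 * step];
    const int32_t c0 = s0 + s2;
    const int32_t c1 = s2 + s3;
    const int32_t c2 = s0 - s3;
    const int32_t c3 = 74 * s1;

    src[0 * step] = clip_int16((29 * c0 + 55 * c1 + c3 + add) >> shift);
    src[1 * step] = clip_int16((55 * c2 - 29 * c1 + c3 + add) >> shift);
    src[2 * step] = clip_int16((74 * (s0 - s2 + s3) + add) >> shift);
    src[3 * step] = clip_int16((55 * c0 + 29 * c2 - c3 + add) >> shift);
}

/* Hypothetical scalar reference for the 8-bit 4x4 luma transform:
 * columns first (shift 7), then rows (shift 12). */
void transform_4x4_luma_ref(int16_t *coeffs)
{
    for (int i = 0; i < 4; i++)
        tr4_luma_pass(coeffs + i, 4, 7);       /* column i, stride 4 */
    for (int i = 0; i < 4; i++)
        tr4_luma_pass(coeffs + 4 * i, 1, 12);  /* row i, stride 1 */
}
```

The NEON version gets the same column-then-row order by running the macro on four row vectors, transposing, and running it again. A scalar routine like this one could serve as a quick cross-check alongside the checkasm comparison against the C implementation.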