From patchwork Sat Mar 11 03:18:00 2023
X-Patchwork-Submitter: 徐福隆 <839789740@qq.com>
X-Patchwork-Id: 40645
From: xufuji456 <839789740@qq.com>
To: ffmpeg-devel@ffmpeg.org
Cc: xufuji456 <839789740@qq.com>
Date: Sat, 11 Mar 2023 11:18:00 +0800
Subject: [FFmpeg-devel] [PATCH] avcodec/aarch64/hevc: add transform_luma_4x4_neon

Benchmark on Cortex-A53 (run_count=1000; lower is better):

transform_4x4_luma_c:    103
transform_4x4_luma_neon:  45

i.e. the NEON version is roughly 2.3x faster than the C version.
---
 libavcodec/aarch64/hevcdsp_idct_neon.S    | 52 ++++++++++++++++++++++-
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  2 +
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index b11f56862a..00d9690466 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -7,6 +7,8 @@
  * Copyright (c) 2020 Reimar Döffinger
  * Copyright (c) 2020 J. Dekker
  *
+ * Copyright (c) 2023 xu fulong
+ *
  * This file is part of FFmpeg.
  *
  * FFmpeg is free software; you can redistribute it and/or
@@ -665,4 +667,52 @@
 idct_dc 16, 8
 idct_dc 16, 10
 idct_dc 32, 8
-idct_dc 32, 10
\ No newline at end of file
+idct_dc 32, 10
+
+.macro tr4_luma_shift r0, r1, r2, r3, shift
+        saddl           v0.4s, \r0, \r2         // c0 = src0 + src2
+        saddl           v1.4s, \r2, \r3         // c1 = src2 + src3
+        ssubl           v2.4s, \r0, \r3         // c2 = src0 - src3
+        smull           v3.4s, \r1, v21.4h      // c3 = 74 * src1
+
+        saddl           v7.4s, \r0, \r3         // src0 + src3
+        ssubw           v7.4s, v7.4s, \r2       // src0 - src2 + src3
+        mul             v7.4s, v7.4s, v18.4s    // dst2 = 74 * (src0 - src2 + src3)
+
+        mul             v5.4s, v0.4s, v19.4s    // 29 * c0
+        mul             v6.4s, v1.4s, v20.4s    // 55 * c1
+        add             v5.4s, v5.4s, v6.4s     // 29 * c0 + 55 * c1
+        add             v5.4s, v5.4s, v3.4s     // dst0 = 29 * c0 + 55 * c1 + c3
+
+        mul             v1.4s, v1.4s, v19.4s    // 29 * c1
+        mul             v6.4s, v2.4s, v20.4s    // 55 * c2
+        sub             v6.4s, v6.4s, v1.4s     // 55 * c2 - 29 * c1
+        add             v6.4s, v6.4s, v3.4s     // dst1 = 55 * c2 - 29 * c1 + c3
+
+        mul             v0.4s, v0.4s, v20.4s    // 55 * c0
+        mul             v2.4s, v2.4s, v19.4s    // 29 * c2
+        add             v0.4s, v0.4s, v2.4s     // 55 * c0 + 29 * c2
+        sub             v0.4s, v0.4s, v3.4s     // dst3 = 55 * c0 + 29 * c2 - c3
+
+        sqrshrn         \r0, v5.4s, \shift      // round, shift, saturate to int16
+        sqrshrn         \r1, v6.4s, \shift
+        sqrshrn         \r2, v7.4s, \shift
+        sqrshrn         \r3, v0.4s, \shift
+.endm
+
+function ff_hevc_transform_luma_4x4_neon_8, export=1
+        ld1             {v28.4h-v31.4h}, [x0]   // rows 0-3 of the 4x4 block
+        movi            v18.4s, #74
+        movi            v19.4s, #29
+        movi            v20.4s, #55
+        movi            v21.4h, #74             // 16-bit copy for smull
+
+        tr4_luma_shift  v28.4h, v29.4h, v30.4h, v31.4h, #7   // pass 1: columns
+        transpose_4x4H  v28, v29, v30, v31, v22, v23, v24, v25
+
+        tr4_luma_shift  v28.4h, v29.4h, v30.4h, v31.4h, #12  // pass 2: rows, shift 20 - 8
+        transpose_4x4H  v28, v29, v30, v31, v22, v23, v24, v25
+
+        st1             {v28.4h-v31.4h}, [x0]   // write back in place
+        ret
+endfunc
\ No newline at end of file
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 1deefca0a2..10e7f2318e 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -63,6 +63,7 @@ void ff_hevc_idct_4x4_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_8x8_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_16x16_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_32x32_dc_10_neon(int16_t *coeffs);
+void ff_hevc_transform_luma_4x4_neon_8(int16_t *coeffs);
 void ff_hevc_sao_band_filter_8x8_8_neon(uint8_t *_dst, const uint8_t *_src,
                                         ptrdiff_t stride_dst, ptrdiff_t stride_src,
                                         const int16_t *sao_offset_val, int sao_left_class,
@@ -128,6 +129,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->idct_dc[1]                  = ff_hevc_idct_8x8_dc_8_neon;
         c->idct_dc[2]                  = ff_hevc_idct_16x16_dc_8_neon;
         c->idct_dc[3]                  = ff_hevc_idct_32x32_dc_8_neon;
+        c->transform_4x4_luma          = ff_hevc_transform_luma_4x4_neon_8;
         c->sao_band_filter[0]          =
         c->sao_band_filter[1]          =
         c->sao_band_filter[2]          =