From patchwork Sat Sep 7 17:13:39 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51385
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Sun, 8 Sep 2024 01:13:39 +0800
X-OQ-MSGID: <20240907171340.55502-5-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240907171340.55502-1-quinkblack@foxmail.com>
References: <20240907171340.55502-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH 5/6] aarch64/vvc: Add put_qpel_hx i8mm
Cc: Zhao Zhili

From: Zhao Zhili

Benchmark on
Android pixel 8 with -fno-vectorize

put_luma_h_8_4x4_c:            0.2 ( 1.00x)
put_luma_h_8_4x4_neon:         0.2 ( 1.00x)
put_luma_h_8_4x4_i8mm:         0.0 ( 0.00x)
put_luma_h_8_8x8_c:            1.5 ( 1.00x)
put_luma_h_8_8x8_neon:         0.5 ( 3.00x)
put_luma_h_8_8x8_i8mm:         0.5 ( 3.00x)
put_luma_h_8_16x16_c:          6.2 ( 1.00x)
put_luma_h_8_16x16_neon:       2.0 ( 3.12x)
put_luma_h_8_16x16_i8mm:       1.5 ( 4.17x)
put_luma_h_8_32x32_c:         25.5 ( 1.00x)
put_luma_h_8_32x32_neon:       9.0 ( 2.83x)
put_luma_h_8_32x32_i8mm:       6.8 ( 3.78x)
put_luma_h_8_64x64_c:         99.8 ( 1.00x)
put_luma_h_8_64x64_neon:      35.2 ( 2.83x)
put_luma_h_8_64x64_i8mm:      27.2 ( 3.66x)
put_luma_h_8_128x128_c:      422.0 ( 1.00x)
put_luma_h_8_128x128_neon:   138.5 ( 3.05x)
put_luma_h_8_128x128_i8mm:   109.2 ( 3.86x)
---
 libavcodec/aarch64/h26x/dsp.h       |  4 ++
 libavcodec/aarch64/h26x/qpel_neon.S | 68 ++++++++++++++++++++++++++---
 libavcodec/aarch64/vvc/dsp_init.c   |  9 ++++
 3 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index 076d01b477..323a253257 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -270,4 +270,8 @@ NEON8_FNPROTO_PARTIAL_6(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride,
         int height, int denom, int wx, int ox,
         const int8_t *hf, const int8_t *vf, int width),);
 
+NEON8_FNPROTO_PARTIAL_6(qpel_h, (int16_t * dst,
+        const uint8_t *_src, ptrdiff_t _srcstride, int height,
+        const int8_t *hf, const int8_t *vf, int width), _i8mm);
+
 #endif
diff --git a/libavcodec/aarch64/h26x/qpel_neon.S b/libavcodec/aarch64/h26x/qpel_neon.S
index 0585f03de9..8a372a76be 100644
--- a/libavcodec/aarch64/h26x/qpel_neon.S
+++ b/libavcodec/aarch64/h26x/qpel_neon.S
@@ -3518,6 +3518,17 @@ endfunc
         sub             x1, x1, #3
 .endm
 
+.macro VVC_QPEL_H_HEADER
+        ld1r            {v31.2d}, [x4]
+        sub             x1, x1, #3
+.endm
+
+function ff_vvc_put_qpel_h4_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h4_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3574,6 +3585,12 @@ function ff_hevc_put_hevc_qpel_h6_8_neon_i8mm, export=1
         ret
 endfunc
 
+function ff_vvc_put_qpel_h8_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h8_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3658,6 +3675,12 @@ function ff_hevc_put_hevc_qpel_h12_8_neon_i8mm, export=1
         ret
 endfunc
 
+function ff_vvc_put_qpel_h16_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h16_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3748,6 +3771,13 @@ function ff_hevc_put_hevc_qpel_h24_8_neon_i8mm, export=1
         ret
 endfunc
 
+function ff_vvc_put_qpel_h32_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #VVC_MAX_PB_SIZE * 2
+        add             x15, x0, #32
+        b               1f
+endfunc
+
 function ff_hevc_put_hevc_qpel_h32_8_neon_i8mm, export=1
         QPEL_H_HEADER
         mov             x10, #HEVC_MAX_PB_SIZE * 2
@@ -3883,10 +3913,7 @@ function ff_hevc_put_hevc_qpel_h48_8_neon_i8mm, export=1
         ret
 endfunc
 
-function ff_hevc_put_hevc_qpel_h64_8_neon_i8mm, export=1
-        QPEL_H_HEADER
-        sub             x2, x2, #64
-1:
+.macro put_qpel_h64_8_neon_i8mm
         ld1             {v16.16b, v17.16b, v18.16b, v19.16b}, [x1], #64
         ext             v1.16b, v16.16b, v17.16b, #1
         ext             v2.16b, v16.16b, v17.16b, #2
@@ -3977,11 +4004,42 @@ function ff_hevc_put_hevc_qpel_h64_8_neon_i8mm, export=1
         sqxtn2          v20.8h, v26.4s
         sqxtn           v21.4h, v23.4s
         sqxtn2          v21.8h, v27.4s
-        stp             q20, q21, [x0], #32
+        stp             q20, q21, [x0]
+        add             x0, x0, x10
+.endm
+
+function ff_vvc_put_qpel_h64_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        mov             x10, #(VVC_MAX_PB_SIZE * 2 - 32 * 3)
+        sub             x2, x2, #64
+        b               1f
+endfunc
+
+function ff_hevc_put_hevc_qpel_h64_8_neon_i8mm, export=1
+        QPEL_H_HEADER
+        mov             x10, #32
+        sub             x2, x2, #64
+1:
+        put_qpel_h64_8_neon_i8mm
         subs            w3, w3, #1
         b.ne            1b
         ret
 endfunc
+
+function ff_vvc_put_qpel_h128_8_neon_i8mm, export=1
+        VVC_QPEL_H_HEADER
+        sub             x11, x2, #128
+        mov             x10, #32
+        mov             x2, #0
+1:
+        put_qpel_h64_8_neon_i8mm
+        put_qpel_h64_8_neon_i8mm
+        sub             w3, w3, #1
+        add             x1, x1, x11
+        cbnz            w3, 1b
+        ret
+endfunc
+
 
 DISABLE_I8MM
 #endif
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index 457be8c725..bcc7df8f6c 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -88,6 +88,15 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
             c->sao.edge_filter[i] = ff_vvc_sao_edge_filter_16x16_8_neon;
         c->alf.filter[LUMA] = alf_filter_luma_8_neon;
         c->alf.filter[CHROMA] = alf_filter_chroma_8_neon;
+
+        if (have_i8mm(cpu_flags)) {
+            c->inter.put[0][1][0][1] = ff_vvc_put_qpel_h4_8_neon_i8mm;
+            c->inter.put[0][2][0][1] = ff_vvc_put_qpel_h8_8_neon_i8mm;
+            c->inter.put[0][3][0][1] = ff_vvc_put_qpel_h16_8_neon_i8mm;
+            c->inter.put[0][4][0][1] = ff_vvc_put_qpel_h32_8_neon_i8mm;
+            c->inter.put[0][5][0][1] = ff_vvc_put_qpel_h64_8_neon_i8mm;
+            c->inter.put[0][6][0][1] = ff_vvc_put_qpel_h128_8_neon_i8mm;
+        }
     } else if (bd == 10) {
         c->alf.filter[LUMA] = alf_filter_luma_10_neon;
         c->alf.filter[CHROMA] = alf_filter_chroma_10_neon;
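
Reviewer note: the VVC entry points above reuse the HEVC loop bodies and differ only in the destination row advance held in x10. The scalar C sketch below illustrates what the 8-bit horizontal path is expected to compute and why the 64-wide constant is `VVC_MAX_PB_SIZE * 2 - 32 * 3`. It is not taken from the patch. Every `sketch_` name is hypothetical, `SKETCH_MAX_PB_SIZE` assumes VVC_MAX_PB_SIZE is 128, the output is assumed to need no shift at 8-bit depth, and the first three `stp` stores inside the macro (not visible in the hunk) are assumed to post-increment x0 by 32 bytes each, as the `- 32 * 3` term suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stands in for VVC_MAX_PB_SIZE from the assembly, taken to be 128. */
#define SKETCH_MAX_PB_SIZE 128

/* Scalar model of one USDOT lane (i8mm): four unsigned pixels times four signed taps. */
static int32_t sketch_usdot4(const uint8_t *px, const int8_t *taps)
{
    int32_t acc = 0;
    for (int i = 0; i < 4; i++)
        acc += (int32_t)px[i] * taps[i];
    return acc;
}

/* Hypothetical scalar reference for the 8-bit horizontal 8-tap filter.
 * dst rows are SKETCH_MAX_PB_SIZE int16_t elements apart, whatever the width. */
static void sketch_put_qpel_h(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride,
                              int height, const int8_t *hf, int width)
{
    assert(width > 0 && width <= SKETCH_MAX_PB_SIZE);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            /* Same 3-pixel left offset as "sub x1, x1, #3" in the header macros. */
            const uint8_t *p = src + x - 3;
            /* The assembly narrows with sqxtn; the sum is assumed to fit in int16_t. */
            dst[x] = (int16_t)(sketch_usdot4(p, hf) + sketch_usdot4(p + 4, hf + 4));
        }
        src += srcstride;
        dst += SKETCH_MAX_PB_SIZE;
    }
}

/* Bytes the destination advances per macro expansion: three assumed
 * post-incremented 32-byte stores, then the final "add x0, x0, x10". */
static int sketch_row_advance_bytes(int x10)
{
    return 3 * 32 + x10;
}
```

With x10 = 32 the macro advances 128 bytes, which is one 64-sample HEVC row and, expanded twice, one 128-sample VVC row. With x10 = 256 - 96 = 160 it advances 256 bytes, so the 64-wide VVC case skips the unused half of each destination row.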