From patchwork Wed Sep 11 18:06:18 2024
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51518
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 12 Sep 2024 02:06:18 +0800
In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 14/14] aarch64/vvc: Add avg

avg_8_2x2_c: 0.2 ( 1.00x)
avg_8_2x2_neon: 0.2 ( 1.00x)
avg_8_4x4_c: 0.2 ( 1.00x)
avg_8_4x4_neon: 0.2 ( 1.00x)
avg_8_8x8_c: 0.9 ( 1.00x)
avg_8_8x8_neon: 0.2 ( 5.29x)
avg_8_16x16_c: 3.7 ( 1.00x)
avg_8_16x16_neon: 0.7 ( 5.44x)
avg_8_32x32_c: 14.9 ( 1.00x)
avg_8_32x32_neon: 1.7 ( 8.91x)
avg_8_64x64_c: 59.7 ( 1.00x)
avg_8_64x64_neon: 6.9 ( 8.62x)
avg_8_128x128_c: 254.7 ( 1.00x)
avg_8_128x128_neon: 26.9 ( 9.46x)
avg_10_2x2_c: 0.2 ( 1.00x)
avg_10_2x2_neon: 0.2 ( 1.00x)
avg_10_4x4_c: 0.2 ( 1.00x)
avg_10_4x4_neon: 0.2 ( 1.00x)
avg_10_8x8_c: 0.9 ( 1.00x)
avg_10_8x8_neon: 0.2 ( 5.29x)
avg_10_16x16_c: 3.4 ( 1.00x)
avg_10_16x16_neon: 0.4 ( 8.06x)
avg_10_32x32_c: 13.9 ( 1.00x)
avg_10_32x32_neon: 1.9 ( 7.23x)
avg_10_64x64_c: 54.2 ( 1.00x)
avg_10_64x64_neon: 8.4 ( 6.43x)
avg_10_128x128_c: 232.4 ( 1.00x)
avg_10_128x128_neon: 30.9 ( 7.52x)
avg_12_2x2_c: 0.0 ( 0.00x)
avg_12_2x2_neon: 0.2 ( 0.00x)
avg_12_4x4_c: 0.4 ( 1.00x)
avg_12_4x4_neon: 0.2 ( 2.43x)
avg_12_8x8_c: 0.7 ( 1.00x)
avg_12_8x8_neon: 0.2 ( 3.86x)
avg_12_16x16_c: 3.7 ( 1.00x)
avg_12_16x16_neon: 0.4 ( 8.65x)
avg_12_32x32_c: 13.7 ( 1.00x)
avg_12_32x32_neon: 2.2 ( 6.29x)
avg_12_64x64_c: 53.9 ( 1.00x)
avg_12_64x64_neon: 7.7 ( 7.03x)
avg_12_128x128_c: 270.9 ( 1.00x)
avg_12_128x128_neon: 30.4 ( 8.90x)
---
 libavcodec/aarch64/vvc/Makefile   |   1 +
 libavcodec/aarch64/vvc/dsp_init.c |  16 +++
 libavcodec/aarch64/vvc/inter.S    | 163 ++++++++++++++++++++++++++++++
 3 files changed, 180 insertions(+)
 create mode 100644 libavcodec/aarch64/vvc/inter.S

diff --git a/libavcodec/aarch64/vvc/Makefile b/libavcodec/aarch64/vvc/Makefile
index 7ba13a2165..ed80338969 100644
--- a/libavcodec/aarch64/vvc/Makefile
+++ b/libavcodec/aarch64/vvc/Makefile
@@ -3,6 +3,7 @@ clean::
 
 OBJS-$(CONFIG_VVC_DECODER)              += aarch64/vvc/dsp_init.o
 NEON-OBJS-$(CONFIG_VVC_DECODER)         += aarch64/vvc/alf.o \
+                                           aarch64/vvc/inter.o \
                                            aarch64/vvc/sad.o \
                                            aarch64/h26x/epel_neon.o \
                                            aarch64/h26x/qpel_neon.o \
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index 4867491620..ad767d17e2 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -42,6 +42,16 @@ int ff_vvc_sad_neon(const int16_t *src0, const int16_t *src1, int dx, int dy,
                     const int block_w, const int block_h);
 
+void ff_vvc_avg_8_neon(uint8_t *dst, ptrdiff_t dst_stride,
+                       const int16_t *src0, const int16_t *src1, int width,
+                       int height);
+void ff_vvc_avg_10_neon(uint8_t *dst, ptrdiff_t dst_stride,
+                        const int16_t *src0, const int16_t *src1, int width,
+                        int height);
+void ff_vvc_avg_12_neon(uint8_t *dst, ptrdiff_t dst_stride,
+                        const int16_t *src0, const int16_t *src1, int width,
+                        int height);
+
 void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
 {
     int cpu_flags = av_get_cpu_flags();
 
@@ -112,6 +122,8 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         c->inter.put_uni_w[0][5][0][0] = ff_vvc_put_pel_uni_w_pixels64_8_neon;
         c->inter.put_uni_w[0][6][0][0] = ff_vvc_put_pel_uni_w_pixels128_8_neon;
 
+        c->inter.avg = ff_vvc_avg_8_neon;
+
         for (int i = 0; i < FF_ARRAY_ELEMS(c->sao.band_filter); i++)
             c->sao.band_filter[i] = ff_h26x_sao_band_filter_8x8_8_neon;
         c->sao.edge_filter[0] = ff_vvc_sao_edge_filter_8x8_8_neon;
@@ -150,9 +162,13 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
             c->inter.put[1][6][1][1] = ff_vvc_put_epel_hv128_8_neon_i8mm;
         }
     } else if (bd == 10) {
+        c->inter.avg = ff_vvc_avg_10_neon;
+
         c->alf.filter[LUMA] = alf_filter_luma_10_neon;
         c->alf.filter[CHROMA] = alf_filter_chroma_10_neon;
     } else if (bd == 12) {
+        c->inter.avg = ff_vvc_avg_12_neon;
+
         c->alf.filter[LUMA] = alf_filter_luma_12_neon;
         c->alf.filter[CHROMA] = alf_filter_chroma_12_neon;
     }
diff --git a/libavcodec/aarch64/vvc/inter.S b/libavcodec/aarch64/vvc/inter.S
new file mode 100644
index 0000000000..2f69274b86
--- /dev/null
+++ b/libavcodec/aarch64/vvc/inter.S
@@ -0,0 +1,163 @@
+/*
+ * Copyright (c) 2024 Zhao Zhili
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+#define VVC_MAX_PB_SIZE 128
+
+.macro vvc_avg, bit_depth
+
+.macro vvc_avg_\bit_depth\()_2_4, tap
+.if \tap == 2
+        ldr             s0, [src0]
+        ldr             s2, [src1]
+.else
+        ldr             d0, [src0]
+        ldr             d2, [src1]
+.endif
+        saddl           v4.4s, v0.4h, v2.4h
+        add             v4.4s, v4.4s, v16.4s
+        sqshrn          v4.4h, v4.4s, #(15 - \bit_depth)
+.if \bit_depth == 8
+        sqxtun          v4.8b, v4.8h
+.if \tap == 2
+        str             h4, [dst]
+.else   // tap == 4
+        str             s4, [dst]
+.endif
+
+.else   // bit_depth > 8
+        smin            v4.4h, v4.4h, v17.4h
+        smax            v4.4h, v4.4h, v18.4h
+.if \tap == 2
+        str             s4, [dst]
+.else
+        str             d4, [dst]
+.endif
+.endif
+        add             src0, src0, x10
+        add             src1, src1, x10
+        add             dst, dst, dst_stride
+.endm
+
+function ff_vvc_avg_\bit_depth\()_neon, export=1
+        dst             .req x0
+        dst_stride      .req x1
+        src0            .req x2
+        src1            .req x3
+        width           .req w4
+        height          .req w5
+
+        mov             x10, #(VVC_MAX_PB_SIZE * 2)
+        cmp             width, #8
+.if \bit_depth == 8
+        movi            v16.4s, #64
+.else
+.if \bit_depth == 10
+        mov             w6, #1023
+        movi            v16.4s, #16
+.else
+        mov             w6, #4095
+        movi            v16.4s, #4
+.endif
+        movi            v18.8h, #0
+        dup             v17.8h, w6
+.endif
+        b.eq            8f
+        b.hi            16f
+        cmp             width, #4
+        b.eq            4f
+2:      // width == 2
+        subs            height, height, #1
+        vvc_avg_\bit_depth\()_2_4 2
+        b.ne            2b
+        b               32f
+4:      // width == 4
+        subs            height, height, #1
+        vvc_avg_\bit_depth\()_2_4 4
+        b.ne            4b
+        b               32f
+8:      // width == 8
+        ld1             {v0.8h}, [src0], x10
+        ld1             {v2.8h}, [src1], x10
+        saddl           v4.4s, v0.4h, v2.4h
+        saddl2          v5.4s, v0.8h, v2.8h
+        add             v4.4s, v4.4s, v16.4s
+        add             v5.4s, v5.4s, v16.4s
+        sqshrn          v4.4h, v4.4s, #(15 - \bit_depth)
+        sqshrn2         v4.8h, v5.4s, #(15 - \bit_depth)
+        subs            height, height, #1
+.if \bit_depth == 8
+        sqxtun          v4.8b, v4.8h
+        st1             {v4.8b}, [dst], dst_stride
+.else
+        smin            v4.8h, v4.8h, v17.8h
+        smax            v4.8h, v4.8h, v18.8h
+        st1             {v4.8h}, [dst], dst_stride
+.endif
+        b.ne            8b
+        b               32f
+16:     // width >= 16
+        mov             w6, width
+        mov             x7, src0
+        mov             x8, src1
+        mov             x9, dst
+17:
+        ldp             q0, q1, [x7], #32
+        ldp             q2, q3, [x8], #32
+        saddl           v4.4s, v0.4h, v2.4h
+        saddl2          v5.4s, v0.8h, v2.8h
+        saddl           v6.4s, v1.4h, v3.4h
+        saddl2          v7.4s, v1.8h, v3.8h
+        add             v4.4s, v4.4s, v16.4s
+        add             v5.4s, v5.4s, v16.4s
+        add             v6.4s, v6.4s, v16.4s
+        add             v7.4s, v7.4s, v16.4s
+        sqshrn          v4.4h, v4.4s, #(15 - \bit_depth)
+        sqshrn2         v4.8h, v5.4s, #(15 - \bit_depth)
+        sqshrn          v6.4h, v6.4s, #(15 - \bit_depth)
+        sqshrn2         v6.8h, v7.4s, #(15 - \bit_depth)
+        subs            w6, w6, #16
+.if \bit_depth == 8
+        sqxtun          v4.8b, v4.8h
+        sqxtun2         v4.16b, v6.8h
+        str             q4, [x9], #16
+.else
+        smin            v4.8h, v4.8h, v17.8h
+        smin            v6.8h, v6.8h, v17.8h
+        smax            v4.8h, v4.8h, v18.8h
+        smax            v6.8h, v6.8h, v18.8h
+        stp             q4, q6, [x9], #32
+.endif
+        b.ne            17b
+
+        subs            height, height, #1
+        add             src0, src0, x10
+        add             src1, src1, x10
+        add             dst, dst, dst_stride
+        b.ne            16b
+32:
+        ret
+endfunc
+.endm
+
+vvc_avg 8
+vvc_avg 10
+vvc_avg 12
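
Per pixel, the three kernels compute (src0 + src1 + offset) >> shift with shift = 15 - bit_depth and offset = 1 << (shift - 1). These values match the sqshrn immediates and the 64/16/4 constants loaded into v16. The result is then clamped to the pixel range: sqxtun handles this for 8-bit, and smin/smax against v17/v18 handle it otherwise. The following minimal scalar C sketch models that arithmetic, for example to cross-check NEON output in a standalone test. It is reconstructed from the assembly rather than taken from FFmpeg's own C template. avg_ref and clip_pixel are hypothetical names. The source row stride of VVC_MAX_PB_SIZE int16_t elements and the byte-based dst_stride are assumptions read off inter.S.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the #define in inter.S: source rows are VVC_MAX_PB_SIZE int16_t apart. */
#define VVC_MAX_PB_SIZE 128

/* Clamp v to the unsigned pixel range [0, (1 << bit_depth) - 1]. */
static int clip_pixel(int v, int bit_depth)
{
    const int max = (1 << bit_depth) - 1;
    return v < 0 ? 0 : (v > max ? max : v);
}

/*
 * Hypothetical scalar model of ff_vvc_avg_<bd>_neon, reconstructed from the
 * assembly; it is not FFmpeg's own C implementation.
 * dst_stride is in bytes; for bit_depth > 8 each output pixel is a uint16_t,
 * so dst must be 2-byte aligned.
 */
static void avg_ref(uint8_t *dst, ptrdiff_t dst_stride,
                    const int16_t *src0, const int16_t *src1,
                    int width, int height, int bit_depth)
{
    const int shift  = 15 - bit_depth;   /* 7, 5 or 3: the sqshrn immediate */
    const int offset = 1 << (shift - 1); /* 64, 16 or 4: the v16 constant */

    assert(bit_depth == 8 || bit_depth == 10 || bit_depth == 12);
    assert(width > 0 && width <= VVC_MAX_PB_SIZE);

    for (int y = 0; y < height; y++) {
        const int16_t *a = src0 + (ptrdiff_t)y * VVC_MAX_PB_SIZE;
        const int16_t *b = src1 + (ptrdiff_t)y * VVC_MAX_PB_SIZE;
        uint8_t *row = dst + (ptrdiff_t)y * dst_stride;

        for (int x = 0; x < width; x++) {
            /* Widen to int (like saddl), add the rounding offset, then shift.
             * >> on a negative int is arithmetic on mainstream compilers,
             * matching the signed sqshrn. */
            const int v = clip_pixel((a[x] + b[x] + offset) >> shift, bit_depth);

            if (bit_depth == 8)
                row[x] = (uint8_t)v;                /* sqxtun path */
            else
                ((uint16_t *)row)[x] = (uint16_t)v; /* smin/smax path */
        }
    }
}
```

For example, with 8-bit depth and src0[x] = src1[x] = 8192, the result is (16384 + 64) >> 7 = 128. Even the extreme sum (2 * 32767 + 64) >> 7 = 512 fits in int16, so the saturating narrow in sqshrn should never clip on its own. A single final clamp in the scalar model is therefore expected to match the two-step NEON saturation.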