From patchwork Wed Sep 11 18:06:14 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51516
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Cc: Zhao Zhili
Date: Thu, 12 Sep 2024 02:06:14 +0800
X-OQ-MSGID: <20240911180618.28921-11-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240911180618.28921-1-quinkblack@foxmail.com>
References: <20240911180618.28921-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH v2 10/14] aarch64/vvc: Add sad
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

From: Zhao Zhili

sad_8x16_c:      0.8 ( 1.00x)
sad_8x16_neon:   0.2 ( 3.00x)
sad_16x8_c:      0.5 ( 1.00x)
sad_16x8_neon:   0.2 ( 2.00x)
sad_16x16_c:     1.5 ( 1.00x)
sad_16x16_neon:  0.2 ( 6.00x)
---
 libavcodec/aarch64/vvc/Makefile   |  1 +
 libavcodec/aarch64/vvc/dsp_init.c |  5 +++
 libavcodec/aarch64/vvc/sad.S      | 75 +++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+)
 create mode 100644 libavcodec/aarch64/vvc/sad.S

diff --git a/libavcodec/aarch64/vvc/Makefile b/libavcodec/aarch64/vvc/Makefile
index a1c1f03e27..7ba13a2165 100644
--- a/libavcodec/aarch64/vvc/Makefile
+++ b/libavcodec/aarch64/vvc/Makefile
@@ -3,6 +3,7 @@ clean::
 
 OBJS-$(CONFIG_VVC_DECODER)              += aarch64/vvc/dsp_init.o
 NEON-OBJS-$(CONFIG_VVC_DECODER)         += aarch64/vvc/alf.o \
+                                           aarch64/vvc/sad.o \
                                            aarch64/h26x/epel_neon.o \
                                            aarch64/h26x/qpel_neon.o \
                                            aarch64/h26x/sao_neon.o
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index 934d918ffd..714d642634 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -39,6 +39,9 @@
 #include "alf_template.c"
 #undef BIT_DEPTH
 
+int ff_vvc_sad_neon(const int16_t *src0, const int16_t *src1, int dx, int dy,
+                    const int block_w, const int block_h);
+
 void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
 {
     int cpu_flags = av_get_cpu_flags();
@@ -125,4 +128,6 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         c->alf.filter[LUMA] = alf_filter_luma_12_neon;
         c->alf.filter[CHROMA] = alf_filter_chroma_12_neon;
     }
+
+    c->inter.sad = ff_vvc_sad_neon;
 }
diff --git a/libavcodec/aarch64/vvc/sad.S b/libavcodec/aarch64/vvc/sad.S
new file mode 100644
index 0000000000..beca876faf
--- /dev/null
+++ b/libavcodec/aarch64/vvc/sad.S
@@ -0,0 +1,75 @@
+/*
+ * Copyright (c) 2024 Zhao Zhili
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+#define VVC_MAX_PB_SIZE 128
+
+function ff_vvc_sad_neon, export=1
+        src0            .req x0
+        src1            .req x1
+        dx              .req w2
+        dy              .req w3
+        block_w         .req w4
+        block_h         .req w5
+
+        sub             w7, dx, #4
+        sub             w8, dy, #4
+        add             w6, dx, dy, lsl #7
+        add             w7, w7, w8, lsl #7
+        sxtw            x6, w6
+        sxtw            x7, w7
+        add             src0, src0, x6, lsl #1
+        sub             src1, src1, x7, lsl #1
+
+        cmp             block_w, #16
+        movi            v16.4s, #0
+        b.ge            2f
+1:
+        // block_w == 8
+        ldr             q0, [src0]
+        ldr             q2, [src1]
+        subs            block_h, block_h, #2
+        sabal           v16.4s, v0.4h, v2.4h
+        sabal2          v16.4s, v0.8h, v2.8h
+
+        add             src0, src0, #(2 * VVC_MAX_PB_SIZE * 2)
+        add             src1, src1, #(2 * VVC_MAX_PB_SIZE * 2)
+        b.ne            1b
+        b               4f
+2:
+        // block_w == 16, no block_w > 16 according to the spec
+        movi            v17.4s, #0
+3:
+        ldp             q0, q1, [src0], #(2 * VVC_MAX_PB_SIZE * 2)
+        ldp             q2, q3, [src1], #(2 * VVC_MAX_PB_SIZE * 2)
+        subs            block_h, block_h, #2
+        sabal           v16.4s, v0.4h, v2.4h
+        sabal2          v16.4s, v0.8h, v2.8h
+        sabal           v17.4s, v1.4h, v3.4h
+        sabal2          v17.4s, v1.8h, v3.8h
+
+        b.ne            3b
+        add             v16.4s, v16.4s, v17.4s
+4:
+        addv            s16, v16.4s
+        mov             w0, v16.s[0]
+        ret
+endfunc
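
For review purposes, the pointer arithmetic and row stepping in ff_vvc_sad_neon can be read as the scalar C sketch below. It is derived only from the assembly in this patch, not from the decoder's C implementation, so treat it as an illustration. The name sad_ref and the two assert preconditions are assumptions added here: the assembly handles only widths of 8 and 16, and its loops count block_h down by 2 until it reaches zero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VVC_MAX_PB_SIZE 128

/* Hypothetical scalar reading of ff_vvc_sad_neon; sad_ref is an
 * illustrative name and is not part of the patch. */
static int sad_ref(const int16_t *src0, const int16_t *src1, int dx, int dy,
                   int block_w, int block_h)
{
    int sad = 0;

    /* Assumed preconditions, mirroring what the assembly can handle. */
    assert(block_w == 8 || block_w == 16);
    assert(block_h > 0 && block_h % 2 == 0);

    /* Prologue: src0 moves forward by (dx, dy), src1 moves backward by
     * (dx - 4, dy - 4), with a row stride of VVC_MAX_PB_SIZE elements. */
    src0 += dy * VVC_MAX_PB_SIZE + dx;
    src1 -= (dy - 4) * VVC_MAX_PB_SIZE + (dx - 4);

    /* Only every second row is sampled: the counter drops by 2 while both
     * pointers advance by two rows per iteration. */
    for (int y = 0; y < block_h; y += 2) {
        for (int x = 0; x < block_w; x++)
            sad += abs(src0[x] - src1[x]);
        src0 += 2 * VVC_MAX_PB_SIZE;
        src1 += 2 * VVC_MAX_PB_SIZE;
    }
    return sad;
}
```

Because alternate rows are skipped, a 16x16 block costs the same eight row iterations as an 8x16 block in the NEON path; only the number of loads per row differs.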