From patchwork Wed Aug 28 15:21:00 2024
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51204
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Cc: Zhao Zhili
Date: Wed, 28 Aug 2024 23:21:00 +0800
X-OQ-MSGID: <20240828152101.91510-1-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
Subject: [FFmpeg-devel] [PATCH 1/2] aarch64/hevc: Move sao to h26x directory

From: Zhao Zhili

So vvc can reuse the implementation.
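Note for readers of the series: the SAO edge filters appear to reach neighbouring samples through a fixed source stride, and patch 2/2 derives that stride separately for HEVC and VVC from the maximum prediction block size. The following is a minimal C sketch of that arithmetic, not code from either patch. The constants are the ones defined in sao_neon.S by 2/2; `sao_stride()` and `sao_edge_pos()` are hypothetical helper names used only for illustration.

```c
#include <stddef.h>

/* Constants as defined in h26x/sao_neon.S by patch 2/2. */
#define HEVC_MAX_PB_SIZE              64
#define VVC_MAX_PB_SIZE               128
#define AV_INPUT_BUFFER_PADDING_SIZE  64

/* Hypothetical helper: source stride for a given maximum block size. */
static ptrdiff_t sao_stride(int max_pb_size)
{
    return 2 * max_pb_size + AV_INPUT_BUFFER_PADDING_SIZE;
}

/* Hypothetical helper: fills the four per-direction offsets in the same
 * order as the .word tables in the assembly. */
static void sao_edge_pos(ptrdiff_t stride, ptrdiff_t pos[4])
{
    pos[0] = 1;          /* horizontal */
    pos[1] = stride;     /* vertical   */
    pos[2] = stride + 1; /* 45 degree  */
    pos[3] = stride - 1; /* 135 degree */
}
```

With these constants the HEVC stride is 192 and the VVC stride is 320, so the two codecs need different offset tables even though the filter body itself can be shared.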
---
 libavcodec/aarch64/Makefile                   |  2 +-
 libavcodec/aarch64/h26x/dsp.h                 | 36 +++++++++++++++++++
 .../{hevcdsp_sao_neon.S => h26x/sao_neon.S}   |  2 +-
 libavcodec/aarch64/hevcdsp_init_aarch64.c     |  9 +----
 4 files changed, 39 insertions(+), 10 deletions(-)
 create mode 100644 libavcodec/aarch64/h26x/dsp.h
 rename libavcodec/aarch64/{hevcdsp_sao_neon.S => h26x/sao_neon.S} (99%)

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index de0653ebbc..a01e665b55 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -73,4 +73,4 @@ NEON-OBJS-$(CONFIG_HEVC_DECODER)        += aarch64/hevcdsp_deblock_neon.o \
                                            aarch64/hevcdsp_init_aarch64.o \
                                            aarch64/hevcdsp_qpel_neon.o \
                                            aarch64/hevcdsp_epel_neon.o \
-                                           aarch64/hevcdsp_sao_neon.o
+                                           aarch64/h26x/sao_neon.o
diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
new file mode 100644
index 0000000000..4dcaf0e6bb
--- /dev/null
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -0,0 +1,36 @@
+/*
+ * Copyright (C) 2024 Zhao Zhili
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_AARCH64_H26X_DSP_H
+#define AVCODEC_AARCH64_H26X_DSP_H
+
+#include
+#include
+
+void ff_hevc_sao_band_filter_8x8_8_neon(uint8_t *_dst, const uint8_t *_src,
+                                        ptrdiff_t stride_dst, ptrdiff_t stride_src,
+                                        const int16_t *sao_offset_val, int sao_left_class,
+                                        int width, int height);
+void ff_hevc_sao_edge_filter_16x16_8_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride_dst,
+                                          const int16_t *sao_offset_val, int eo, int width, int height);
+void ff_hevc_sao_edge_filter_8x8_8_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride_dst,
+                                        const int16_t *sao_offset_val, int eo, int width, int height);
+
+#endif
diff --git a/libavcodec/aarch64/hevcdsp_sao_neon.S b/libavcodec/aarch64/h26x/sao_neon.S
similarity index 99%
rename from libavcodec/aarch64/hevcdsp_sao_neon.S
rename to libavcodec/aarch64/h26x/sao_neon.S
index 30e83dda5d..dc407484de 100644
--- a/libavcodec/aarch64/hevcdsp_sao_neon.S
+++ b/libavcodec/aarch64/h26x/sao_neon.S
@@ -1,7 +1,7 @@
 /* -*-arm64-*-
  * vim: syntax=arm64asm
  *
- * AArch64 NEON optimised SAO functions for HEVC decoding
+ * AArch64 NEON optimised SAO functions for h26x decoding
  *
  * Copyright (c) 2022 J. Dekker
  *
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index e8c911deb4..7efae0f740 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -24,6 +24,7 @@
 #include "libavutil/attributes.h"
 #include "libavutil/cpu.h"
 #include "libavutil/aarch64/cpu.h"
+#include "libavcodec/aarch64/h26x/dsp.h"
 #include "libavcodec/hevc/dsp.h"
 
 void ff_hevc_v_loop_filter_chroma_8_neon(uint8_t *_pix, ptrdiff_t _stride,
@@ -91,14 +92,6 @@ void ff_hevc_idct_8x8_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_16x16_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_32x32_dc_10_neon(int16_t *coeffs);
 void ff_hevc_transform_luma_4x4_neon_8(int16_t *coeffs);
-void ff_hevc_sao_band_filter_8x8_8_neon(uint8_t *_dst, const uint8_t *_src,
-                                        ptrdiff_t stride_dst, ptrdiff_t stride_src,
-                                        const int16_t *sao_offset_val, int sao_left_class,
-                                        int width, int height);
-void ff_hevc_sao_edge_filter_16x16_8_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride_dst,
-                                          const int16_t *sao_offset_val, int eo, int width, int height);
-void ff_hevc_sao_edge_filter_8x8_8_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride_dst,
-                                        const int16_t *sao_offset_val, int eo, int width, int height);
 void ff_hevc_put_hevc_qpel_h4_8_neon(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
                                      intptr_t mx, intptr_t my, int width);
 void ff_hevc_put_hevc_qpel_h6_8_neon(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,

From patchwork Wed Aug 28 15:21:01 2024
X-Patchwork-Submitter: Zhao Zhili
X-Patchwork-Id: 51203
From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Cc: Zhao Zhili
Date: Wed, 28 Aug 2024 23:21:01 +0800
X-OQ-MSGID: <20240828152101.91510-2-quinkblack@foxmail.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240828152101.91510-1-quinkblack@foxmail.com>
References: <20240828152101.91510-1-quinkblack@foxmail.com>
Subject: [FFmpeg-devel] [PATCH 2/2] aarch64/vvc: Bind h26x/sao filter implementation to vvc

From: Zhao Zhili

---
 libavcodec/aarch64/h26x/dsp.h             |  6 +++-
 libavcodec/aarch64/h26x/sao_neon.S        | 44 +++++++++++++++++------
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  2 +-
 libavcodec/aarch64/vvc/Makefile           |  5 +--
 libavcodec/aarch64/vvc/dsp_init.c         |  6 ++++
 5 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index 4dcaf0e6bb..d3f7a4dfe3 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -24,7 +24,7 @@
 #include
 #include
 
-void ff_hevc_sao_band_filter_8x8_8_neon(uint8_t *_dst, const uint8_t *_src,
+void ff_h26x_sao_band_filter_8x8_8_neon(uint8_t *_dst, const uint8_t *_src,
                                         ptrdiff_t stride_dst, ptrdiff_t stride_src,
                                         const int16_t *sao_offset_val, int sao_left_class,
                                         int width, int height);
@@ -33,4 +33,8 @@ void ff_hevc_sao_edge_filter_16x16_8_neon(uint8_t *dst, const uint8_t *src, ptrd
 void ff_hevc_sao_edge_filter_8x8_8_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride_dst,
                                         const int16_t *sao_offset_val, int eo, int width, int height);
 
+void ff_vvc_sao_edge_filter_16x16_8_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride_dst,
+                                         const int16_t *sao_offset_val, int eo, int width, int height);
+void ff_vvc_sao_edge_filter_8x8_8_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride_dst,
+                                       const int16_t *sao_offset_val, int eo, int width, int height);
 #endif
diff --git a/libavcodec/aarch64/h26x/sao_neon.S b/libavcodec/aarch64/h26x/sao_neon.S
index dc407484de..c43820135e 100644
--- a/libavcodec/aarch64/h26x/sao_neon.S
+++ b/libavcodec/aarch64/h26x/sao_neon.S
@@ -24,15 +24,17 @@
 
 #include "libavutil/aarch64/asm.S"
 
-#define MAX_PB_SIZE 64
+#define HEVC_MAX_PB_SIZE 64
+#define VVC_MAX_PB_SIZE 128
 #define AV_INPUT_BUFFER_PADDING_SIZE 64
-#define SAO_STRIDE (2*MAX_PB_SIZE + AV_INPUT_BUFFER_PADDING_SIZE)
+#define HEVC_SAO_STRIDE (2 * HEVC_MAX_PB_SIZE + AV_INPUT_BUFFER_PADDING_SIZE)
+#define VVC_SAO_STRIDE (2 * VVC_MAX_PB_SIZE + AV_INPUT_BUFFER_PADDING_SIZE)
 
 // void sao_band_filter(uint8_t *_dst, uint8_t *_src,
 //                      ptrdiff_t stride_dst, ptrdiff_t stride_src,
 //                      int16_t *sao_offset_val, int sao_left_class,
 //                      int width, int height)
-function ff_hevc_sao_band_filter_8x8_8_neon, export=1
+function ff_h26x_sao_band_filter_8x8_8_neon, export=1
         stp             xzr, xzr, [sp, #-64]!
         stp             xzr, xzr, [sp, #16]
         stp             xzr, xzr, [sp, #32]
@@ -79,16 +81,30 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         ret
 endfunc
 
-.Lsao_edge_pos:
+.Lhevc_sao_edge_pos:
 .word 1 // horizontal
-.word SAO_STRIDE // vertical
-.word SAO_STRIDE + 1 // 45 degree
-.word SAO_STRIDE - 1 // 135 degree
+.word HEVC_SAO_STRIDE // vertical
+.word HEVC_SAO_STRIDE + 1 // 45 degree
+.word HEVC_SAO_STRIDE - 1 // 135 degree
+
+.Lvvc_sao_edge_pos:
+.word 1 // horizontal
+.word VVC_SAO_STRIDE // vertical
+.word VVC_SAO_STRIDE + 1 // 45 degree
+.word VVC_SAO_STRIDE - 1 // 135 degree
+
+function ff_vvc_sao_edge_filter_16x16_8_neon, export=1
+        adr             x7, .Lvvc_sao_edge_pos
+        mov             x15, #VVC_SAO_STRIDE
+        b               1f
+endfunc
 
 // ff_hevc_sao_edge_filter_16x16_8_neon(char *dst, char *src, ptrdiff stride_dst,
 //                                      int16 *sao_offset_val, int eo, int width, int height)
 function ff_hevc_sao_edge_filter_16x16_8_neon, export=1
-        adr             x7, .Lsao_edge_pos
+        adr             x7, .Lhevc_sao_edge_pos
+        mov             x15, #HEVC_SAO_STRIDE
+1:
         ld1             {v3.8h}, [x3]              // load sao_offset_val
         add             w5, w5, #0xF
         bic             w5, w5, #0xF
@@ -101,7 +117,6 @@ function ff_hevc_sao_edge_filter_16x16_8_neon, export=1
         uzp2            v1.16b, v3.16b, v3.16b     // sao_offset_val -> upper
         uzp1            v0.16b, v3.16b, v3.16b     // sao_offset_val -> lower
         movi            v2.16b, #2
-        mov             x15, #SAO_STRIDE
         // strides between end of line and next src/dst
         sub             x15, x15, x5               // stride_src - width
         sub             x16, x2, x5                // stride_dst - width
@@ -145,10 +160,18 @@ function ff_hevc_sao_edge_filter_16x16_8_neon, export=1
         ret
 endfunc
 
+function ff_vvc_sao_edge_filter_8x8_8_neon, export=1
+        adr             x7, .Lvvc_sao_edge_pos
+        mov             x15, #VVC_SAO_STRIDE
+        b               1f
+endfunc
+
 // ff_hevc_sao_edge_filter_8x8_8_neon(char *dst, char *src, ptrdiff stride_dst,
 //                                    int16 *sao_offset_val, int eo, int width, int height)
 function ff_hevc_sao_edge_filter_8x8_8_neon, export=1
-        adr             x7, .Lsao_edge_pos
+        adr             x7, .Lhevc_sao_edge_pos
+        mov             x15, #HEVC_SAO_STRIDE
+1:
         ldr             w4, [x7, w4, uxtw #2]
         ld1             {v3.8h}, [x3]
         mov             v3.h[7], v3.h[0]
@@ -160,7 +183,6 @@ function ff_hevc_sao_edge_filter_8x8_8_neon, export=1
         movi            v2.16b, #2
         add             x16, x0, x2
         lsl             x2, x2, #1
-        mov             x15, #SAO_STRIDE
         mov             x8, x1
         sub             x9, x1, x4
         add             x10, x1, x4
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 7efae0f740..a90da0246e 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -384,7 +384,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->sao_band_filter[1] =
         c->sao_band_filter[2] =
         c->sao_band_filter[3] =
-        c->sao_band_filter[4] = ff_hevc_sao_band_filter_8x8_8_neon;
+        c->sao_band_filter[4] = ff_h26x_sao_band_filter_8x8_8_neon;
         c->sao_edge_filter[0] = ff_hevc_sao_edge_filter_8x8_8_neon;
         c->sao_edge_filter[1] =
         c->sao_edge_filter[2] =
diff --git a/libavcodec/aarch64/vvc/Makefile b/libavcodec/aarch64/vvc/Makefile
index 58398d6e3d..54c49fea92 100644
--- a/libavcodec/aarch64/vvc/Makefile
+++ b/libavcodec/aarch64/vvc/Makefile
@@ -1,5 +1,6 @@
 clean::
 	$(RM) $(CLEANSUFFIXES:%=libavcodec/aarch64/vvc/%)
 
-OBJS-$(CONFIG_VVC_DECODER) += aarch64/vvc/dsp_init.o
-NEON-OBJS-$(CONFIG_VVC_DECODER) += aarch64/vvc/alf.o
+OBJS-$(CONFIG_VVC_DECODER)      += aarch64/vvc/dsp_init.o
+NEON-OBJS-$(CONFIG_VVC_DECODER) += aarch64/vvc/alf.o \
+                                   aarch64/h26x/sao_neon.o
diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index 2a9f25911f..0aac140a8f 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -22,6 +22,7 @@
 
 #include "libavutil/cpu.h"
 #include "libavutil/aarch64/cpu.h"
+#include "libavcodec/aarch64/h26x/dsp.h"
 #include "libavcodec/vvc/dsp.h"
 #include "libavcodec/vvc/dec.h"
 #include "libavcodec/vvc/ctu.h"
@@ -45,6 +46,11 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
         return;
 
     if (bd == 8) {
+        for (int i = 0; i < FF_ARRAY_ELEMS(c->sao.band_filter); i++)
+            c->sao.band_filter[i] = ff_h26x_sao_band_filter_8x8_8_neon;
+        c->sao.edge_filter[0] = ff_vvc_sao_edge_filter_8x8_8_neon;
+        for (int i = 1; i < FF_ARRAY_ELEMS(c->sao.edge_filter); i++)
+            c->sao.edge_filter[i] = ff_vvc_sao_edge_filter_16x16_8_neon;
         c->alf.filter[LUMA] = alf_filter_luma_8_neon;
         c->alf.filter[CHROMA] = alf_filter_chroma_8_neon;
     } else if (bd == 10) {