From patchwork Wed Jan 5 08:31:04 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 33080
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 5 Jan 2022 10:31:04 +0200
Message-Id: <20220105083107.1930899-1-martin@martin.st>
X-Mailer: git-send-email 2.25.1
Subject: [FFmpeg-devel] [PATCH 1/4] Revert "lavc/aarch64: add hevc sao band 8x8 tiling"
List-Id: FFmpeg development discussions and patches

This reverts commit f63f9be37c799ddc835af358034630d31fb7db02, as it
breaks fate-hevc.
---
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  6 +-----
 libavcodec/aarch64/hevcdsp_sao_neon.S     | 11 ++++-------
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 2002530266..b93cec9e44 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -77,11 +77,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->idct_dc[1] = ff_hevc_idct_8x8_dc_8_neon;
         c->idct_dc[2] = ff_hevc_idct_16x16_dc_8_neon;
         c->idct_dc[3] = ff_hevc_idct_32x32_dc_8_neon;
-        c->sao_band_filter[0] =
-        c->sao_band_filter[1] =
-        c->sao_band_filter[2] =
-        c->sao_band_filter[3] =
-        c->sao_band_filter[4] = ff_hevc_sao_band_filter_8x8_8_neon;
+        c->sao_band_filter[0] = ff_hevc_sao_band_filter_8x8_8_neon;
         c->sao_edge_filter[0] = ff_hevc_sao_edge_filter_8x8_8_neon;
         c->sao_edge_filter[1] =
         c->sao_edge_filter[2] =
diff --git a/libavcodec/aarch64/hevcdsp_sao_neon.S b/libavcodec/aarch64/hevcdsp_sao_neon.S
index d524323fe8..73b0b3b056 100644
--- a/libavcodec/aarch64/hevcdsp_sao_neon.S
+++ b/libavcodec/aarch64/hevcdsp_sao_neon.S
@@ -3,7 +3,7 @@
  *
  * AArch64 NEON optimised SAO functions for HEVC decoding
  *
- * Copyright (c) 2020-2021 J. Dekker
+ * Copyright (c) 2020 Josh Dekker
  *
  * This file is part of FFmpeg.
  *
@@ -35,7 +35,6 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         stp xzr, xzr, [sp, #32]
         stp xzr, xzr, [sp, #48]
         mov w8, #4
-        sxtw x6, w6
 0:      ldrsh x9, [x4, x8, lsl #1] // sao_offset_val[k+1]
         subs w8, w8, #1
         add w10, w8, w5 // k + sao_left_class
@@ -44,9 +43,7 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         bne 0b
         ld1 {v16.16b-v19.16b}, [sp], #64
         movi v20.8h, #1
-        sub x2, x2, x6 // stride_dst - width
-        sub x3, x3, x6 // stride_src - width
-1:      mov x8, x6 // beginning of line
+1:      mov w8, w6 // beginning of line
 2:      // Simple layout for accessing 16bit values
         // with 8bit LUT.
         //
@@ -55,7 +52,7 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         // |xDE#xAD|xCA#xFE|xBE#xEF|xFE#xED|....
         // +----------------------------------->
         // i-0 i-1 i-2 i-3
-        ld1 {v2.8b}, [x1], #8 // dst[x] = av_clip_pixel(src[x] + offset_table[src[x] >> shift]);
+        ld1 {v2.8b}, [x1] // dst[x] = av_clip_pixel(src[x] + offset_table[src[x] >> shift]);
         uxtl v0.8h, v2.8b // load src[x]
         ushr v2.8h, v0.8h, #3 // >> BIT_DEPTH - 3
         shl v1.8h, v2.8h, #1 // low (x2, accessing short)
@@ -64,7 +61,7 @@ function ff_hevc_sao_band_filter_8x8_8_neon, export=1
         tbx v2.16b, {v16.16b-v19.16b}, v1.16b // table
         add v1.8h, v0.8h, v2.8h // src[x] + table
         sqxtun v4.8b, v1.8h // clip + narrow
-        st1 {v4.8b}, [x0], #8 // store
+        st1 {v4.8b}, [x0] // store
         subs w8, w8, #8 // done 8 pixels
         bne 2b
         subs w7, w7, #1 // finished line, prep. new