From patchwork Fri Jul 16 17:56:38 2021
X-Patchwork-Submitter: Mikhail Nitenko
X-Patchwork-Id: 28943
From: Mikhail Nitenko
To: ffmpeg-devel@ffmpeg.org
Cc: Mikhail Nitenko
Date: Fri, 16 Jul 2021 20:56:38 +0300
Message-Id: <20210716175639.313513-1-mnitenko@gmail.com>
Subject: [FFmpeg-devel] [PATCH 1/2] lavc/aarch64: move transpose_4x8H to neon.S

transpose_4x8H was defined in vp9lpf_16bpp_neon.S; however, this macro is
not specific to VP9 and could be used elsewhere.

Signed-off-by: Mikhail Nitenko
---
 libavcodec/aarch64/neon.S              | 13 +++++++++++++
 libavcodec/aarch64/vp9lpf_16bpp_neon.S | 12 ------------
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/libavcodec/aarch64/neon.S b/libavcodec/aarch64/neon.S
index 0fddbecae3..1ad32c359d 100644
--- a/libavcodec/aarch64/neon.S
+++ b/libavcodec/aarch64/neon.S
@@ -109,12 +109,25 @@
         trn2            \r5\().4H, \r0\().4H, \r1\().4H
         trn1            \r6\().4H, \r2\().4H, \r3\().4H
         trn2            \r7\().4H, \r2\().4H, \r3\().4H
+
         trn1            \r0\().2S, \r4\().2S, \r6\().2S
         trn2            \r2\().2S, \r4\().2S, \r6\().2S
         trn1            \r1\().2S, \r5\().2S, \r7\().2S
         trn2            \r3\().2S, \r5\().2S, \r7\().2S
 .endm
 
+.macro transpose_4x8H r0, r1, r2, r3, t4, t5, t6, t7
+        trn1            \t4\().8H, \r0\().8H, \r1\().8H
+        trn2            \t5\().8H, \r0\().8H, \r1\().8H
+        trn1            \t6\().8H, \r2\().8H, \r3\().8H
+        trn2            \t7\().8H, \r2\().8H, \r3\().8H
+
+        trn1            \r0\().4S, \t4\().4S, \t6\().4S
+        trn2            \r2\().4S, \t4\().4S, \t6\().4S
+        trn1            \r1\().4S, \t5\().4S, \t7\().4S
+        trn2            \r3\().4S, \t5\().4S, \t7\().4S
+.endm
+
 .macro transpose_8x8H r0, r1, r2, r3, r4, r5, r6, r7, r8, r9
         trn1            \r8\().8H, \r0\().8H, \r1\().8H
         trn2            \r9\().8H, \r0\().8H, \r1\().8H
diff --git a/libavcodec/aarch64/vp9lpf_16bpp_neon.S b/libavcodec/aarch64/vp9lpf_16bpp_neon.S
index 9075f3d406..9869614a29 100644
--- a/libavcodec/aarch64/vp9lpf_16bpp_neon.S
+++ b/libavcodec/aarch64/vp9lpf_16bpp_neon.S
@@ -22,18 +22,6 @@
 
 #include "neon.S"
 
-.macro transpose_4x8H r0, r1, r2, r3, t4, t5, t6, t7
-        trn1            \t4\().8h, \r0\().8h, \r1\().8h
-        trn2            \t5\().8h, \r0\().8h, \r1\().8h
-        trn1            \t6\().8h, \r2\().8h, \r3\().8h
-        trn2            \t7\().8h, \r2\().8h, \r3\().8h
-
-        trn1            \r0\().4s, \t4\().4s, \t6\().4s
-        trn2            \r2\().4s, \t4\().4s, \t6\().4s
-        trn1            \r1\().4s, \t5\().4s, \t7\().4s
-        trn2            \r3\().4s, \t5\().4s, \t7\().4s
-.endm
-
 // The input to and output from this macro is in the registers v16-v31,
 // and v0-v7 are used as scratch registers.
 // p7 = v16 .. p3 = v20, p0 = v23, q0 = v24, q3 = v27, q7 = v31

From patchwork Fri Jul 16 17:56:39 2021
X-Patchwork-Submitter: Mikhail Nitenko
X-Patchwork-Id: 28945
From: Mikhail Nitenko
To: ffmpeg-devel@ffmpeg.org
Cc: Mikhail Nitenko
Date: Fri, 16 Jul 2021 20:56:39 +0300
Message-Id: <20210716175639.313513-2-mnitenko@gmail.com>
In-Reply-To: <20210716175639.313513-1-mnitenko@gmail.com>
References: <20210716175639.313513-1-mnitenko@gmail.com>
Subject: [FFmpeg-devel] [PATCH 2/2] lavc/aarch64: h264, add chroma loop filters for 10bit

Benchmarks:                                             A53     A72
h264_h_loop_filter_chroma422_10bpp_c:                 293.0   116.7
h264_h_loop_filter_chroma422_10bpp_neon:              283.7   126.2
h264_h_loop_filter_chroma_10bpp_c:                    165.2    58.5
h264_h_loop_filter_chroma_10bpp_neon:                  74.7    87.2
h264_h_loop_filter_chroma_intra422_10bpp_c:           246.2   124.5
h264_h_loop_filter_chroma_intra422_10bpp_neon:        178.7    70.0
h264_h_loop_filter_chroma_intra_10bpp_c:              121.0    40.5
h264_h_loop_filter_chroma_intra_10bpp_neon:            73.7    59.2
h264_h_loop_filter_chroma_mbaff422_10bpp_c:           145.7    72.7
h264_h_loop_filter_chroma_mbaff422_10bpp_neon:        151.7    87.2
h264_h_loop_filter_chroma_mbaff_intra422_10bpp_c:     117.5    48.0
h264_h_loop_filter_chroma_mbaff_intra422_10bpp_neon:   73.7    37.7
h264_h_loop_filter_chroma_mbaff_intra_10bpp_c:         57.0    27.7
h264_h_loop_filter_chroma_mbaff_intra_10bpp_neon:      81.7    50.7
h264_h_loop_filter_luma_intra_8bpp_c:                 242.7   134.0
h264_h_loop_filter_luma_intra_8bpp_neon:              100.7    53.5
h264_v_loop_filter_chroma_10bpp_c:                    257.2   138.5
h264_v_loop_filter_chroma_10bpp_neon:                  98.2    67.5
h264_v_loop_filter_chroma_intra_10bpp_c:              158.0    76.2
h264_v_loop_filter_chroma_intra_10bpp_neon:            62.7    36.5

Signed-off-by: Mikhail Nitenko
---

this code is a bit slow, particularly the horizontal versions, so any
suggestions are greatly appreciated!

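For readers following the NEON code, the per-position arithmetic that the
h264_loop_filter_chroma_10 macro appears to implement can be modelled in
scalar C. This is only a minimal sketch and not FFmpeg's C reference: the
names clip_int and chroma_edge_model are hypothetical, the bit_depth
parameter is an assumption added for illustration, and the tc scaling is
inferred from the sub/shl/add/smax sequence in the macro.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: clamp v to the range [lo, hi]. */
static int clip_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Hypothetical scalar model of one edge position of the normal (non-intra)
 * chroma filter. p1, p0 sit on one side of the edge and q0, q1 on the other.
 * alpha, beta and tc0 are given at 8-bit scale, as passed to the NEON
 * functions. Returns 1 if *p0 and *q0 were modified, 0 otherwise.
 */
static int chroma_edge_model(int p1, int *p0, int *q0, int q1,
                             int alpha, int beta, int tc0, int bit_depth)
{
    assert(bit_depth >= 8 && bit_depth <= 14);

    const int scale   = 1 << (bit_depth - 8);   /* 4 for 10-bit ("lsl #2") */
    const int max_pix = (1 << bit_depth) - 1;   /* 1023 for 10-bit */

    alpha *= scale;
    beta  *= scale;

    /* Mirrors "sub 1, shl 2, add 1" in the macro; non-positive means skip. */
    const int tc = (tc0 - 1) * scale + 1;
    if (tc <= 0)
        return 0;

    if (abs(*p0 - *q0) >= alpha ||
        abs(p1 - *p0)  >= beta  ||
        abs(q1 - *q0)  >= beta)
        return 0;

    /* Rounded (x + 4) >> 3, like rshrn #3. This relies on arithmetic right
     * shift of negative values, which mainstream compilers provide. */
    const int delta = clip_int(((*q0 - *p0) * 4 + (p1 - q1) + 4) >> 3, -tc, tc);

    *p0 = clip_int(*p0 + delta, 0, max_pix);
    *q0 = clip_int(*q0 - delta, 0, max_pix);
    return 1;
}
```

As a worked case under these assumptions: with 10-bit samples p1 = 400,
p0 = 410, q0 = 430, q1 = 440 and alpha = 20, beta = 10, tc0 = 2, the model
gives tc = 5 and delta = 5, so p0 becomes 415 and q0 becomes 425.
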
 libavcodec/aarch64/h264dsp_init_aarch64.c |  29 +++
 libavcodec/aarch64/h264dsp_neon.S         | 299 ++++++++++++++++++++++
 2 files changed, 328 insertions(+)

diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
index d5baccf235..9ee9c11e15 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
@@ -83,6 +83,21 @@ void ff_h264_idct8_add4_neon(uint8_t *dst, const int *block_offset,
                              int16_t *block, int stride,
                              const uint8_t nnzc[6*8]);
 
+void ff_h264_v_loop_filter_chroma_neon_10(uint8_t *pix, ptrdiff_t stride, int alpha,
+                                          int beta, int8_t *tc0);
+void ff_h264_h_loop_filter_chroma_neon_10(uint8_t *pix, ptrdiff_t stride, int alpha,
+                                          int beta, int8_t *tc0);
+void ff_h264_h_loop_filter_chroma422_neon_10(uint8_t *pix, ptrdiff_t stride, int alpha,
+                                             int beta, int8_t *tc0);
+void ff_h264_v_loop_filter_chroma_intra_neon_10(uint8_t *pix, ptrdiff_t stride,
+                                                int alpha, int beta);
+void ff_h264_h_loop_filter_chroma_intra_neon_10(uint8_t *pix, ptrdiff_t stride,
+                                                int alpha, int beta);
+void ff_h264_h_loop_filter_chroma422_intra_neon_10(uint8_t *pix, ptrdiff_t stride,
+                                                   int alpha, int beta);
+void ff_h264_h_loop_filter_chroma_mbaff_intra_neon_10(uint8_t *pix, ptrdiff_t stride,
+                                                      int alpha, int beta);
+
 av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
                                      const int chroma_format_idc)
 {
@@ -125,5 +140,19 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
         c->h264_idct8_add = ff_h264_idct8_add_neon;
         c->h264_idct8_dc_add = ff_h264_idct8_dc_add_neon;
         c->h264_idct8_add4 = ff_h264_idct8_add4_neon;
+    } else if (have_neon(cpu_flags) && bit_depth == 10) {
+        c->h264_v_loop_filter_chroma = ff_h264_v_loop_filter_chroma_neon_10;
+        c->h264_v_loop_filter_chroma_intra = ff_h264_v_loop_filter_chroma_intra_neon_10;
+
+        if (chroma_format_idc <= 1) {
+            c->h264_h_loop_filter_chroma = ff_h264_h_loop_filter_chroma_neon_10;
+            c->h264_h_loop_filter_chroma_intra = ff_h264_h_loop_filter_chroma_intra_neon_10;
+            c->h264_h_loop_filter_chroma_mbaff_intra = ff_h264_h_loop_filter_chroma_mbaff_intra_neon_10;
+        } else {
+            c->h264_h_loop_filter_chroma = ff_h264_h_loop_filter_chroma422_neon_10;
+            c->h264_h_loop_filter_chroma_mbaff = ff_h264_h_loop_filter_chroma_neon_10;
+            c->h264_h_loop_filter_chroma_intra = ff_h264_h_loop_filter_chroma422_intra_neon_10;
+            c->h264_h_loop_filter_chroma_mbaff_intra = ff_h264_h_loop_filter_chroma_intra_neon_10;
+        }
     }
 }
diff --git a/libavcodec/aarch64/h264dsp_neon.S b/libavcodec/aarch64/h264dsp_neon.S
index fbb8ecc463..92e5afa524 100644
--- a/libavcodec/aarch64/h264dsp_neon.S
+++ b/libavcodec/aarch64/h264dsp_neon.S
@@ -827,3 +827,302 @@ endfunc
         weight_func     16
         weight_func     8
         weight_func     4
+
+.macro h264_loop_filter_start_10
+        cmp             w2, #0
+        ldr             w6, [x4]
+        ccmp            w3, #0, #0, ne
+        lsl             w2, w2, #2 // shift needed for 10bit
+        mov             v24.S[0], w6
+        lsl             w3, w3, #2
+        and             w8, w6, w6, lsl #16
+        b.eq            1f
+        cmp             w6, #0
+        b.eq            1f
+        ands            w8, w8, w8, lsl #8
+        b.ge            2f
+1:
+        ret
+2:
+.endm
+
+.macro h264_loop_filter_start_intra_10
+        orr             w4, w2, w3
+        cbnz            w4, 1f
+        ret
+1:
+        sxtw            x1, w1
+        lsl             w2, w2, #2 // shift needed for 10bit
+        lsl             w3, w3, #2 // shift needed for 10bit
+        dup             v30.8h, w2 // alpha
+        dup             v31.8h, w3 // beta
+.endm
+
+.macro h264_loop_filter_chroma_10
+        dup             v22.8h, w2 // alpha
+        dup             v23.8h, w3 // beta
+        uxtl            v24.8h, v24.8b // tc0
+
+        uabd            v26.8h, v16.8h, v0.8h // abs(p0 - q0)
+        uabd            v28.8h, v18.8h, v16.8h // abs(p1 - p0)
+        uabd            v30.8h, v2.8h, v0.8h // abs(q1 - q0)
+
+        cmhi            v26.8h, v22.8h, v26.8h // < alpha
+        cmhi            v28.8h, v23.8h, v28.8h // < beta
+        cmhi            v30.8h, v23.8h, v30.8h // < beta
+
+        uxtl            v4.4s, v0.4h
+        uxtl2           v5.4s, v0.8h
+
+        and             v26.16b, v26.16b, v28.16b
+
+        usubw           v4.4s, v4.4s, v16.4h
+        usubw2          v5.4s, v5.4s, v16.8h
+
+        and             v26.16b, v26.16b, v30.16b
+
+        shl             v4.4s, v4.4s, #2
+        shl             v5.4s, v5.4s, #2
+
+        mov             x8, v26.d[0]
+        mov             x9, v26.d[1]
+        orr             x8, x8, x9
+
+        sli             v24.8H, v24.8H, #8
+        uxtl            v24.8H, v24.8B
+        uaddw           v4.4s, v4.4s, v18.4h
+        uaddw2          v5.4s, v5.4s, v18.8h // add p1
+
+        cbz             x8, 9f
+
+        usubw           v4.4s, v4.4s, v2.4h
+        usubw2          v5.4s, v5.4s, v2.8h // sub q1
+        rshrn           v4.4h, v4.4s, #3
+        rshrn2          v4.8h, v5.4s, #3
+
+        mov             w8, #1
+        dup             v31.8h, w8 // this is actually important for higher depths, but not needed in 8 bit
+        sub             v24.8h, v24.8h, v31.8h
+        shl             v24.8h, v24.8h, #2
+        add             v24.8h, v24.8h, v31.8h
+        mov             w8, #0
+        dup             v31.8h, w8
+        smax            v24.8h, v24.8h, v31.8h // this all feels like a huge hack (needed to exclude neg values)
+
+        smin            v4.8h, v4.8h, v24.8h
+        neg             v25.8h, v24.8h
+        smax            v4.8h, v4.8h, v25.8h
+
+        uxtl            v22.4s, v0.4h
+        uxtl2           v23.4s, v0.8h
+
+        and             v4.16B, v4.16B, v26.16B
+
+        uxtl            v28.4s, v16.4h
+        uxtl2           v29.4s, v16.8h
+
+        saddw           v28.4s, v28.4s, v4.4h
+        saddw2          v29.4s, v29.4s, v4.8h
+
+        ssubw           v22.4s, v22.4s, v4.4h
+        ssubw2          v23.4s, v23.4s, v4.8h
+
+        sqxtun          v16.4h, v28.4s
+        sqxtun2         v16.8h, v29.4s
+
+        sqxtun          v0.4h, v22.4s
+        sqxtun2         v0.8h, v23.4s
+
+        mov             w2, #1023 // for clipping
+        dup             v4.8h, w2
+        smin            v0.8h, v0.8h, v4.8h
+        smin            v16.8h, v16.8h, v4.8h
+.endm
+
+function ff_h264_v_loop_filter_chroma_neon_10, export=1
+        h264_loop_filter_start_10
+        sxtw            x1, w1
+
+        sub             x0, x0, x1, lsl #1
+        ld1             {v18.8h}, [x0], x1
+        ld1             {v16.8h}, [x0], x1
+        ld1             {v0.8h}, [x0], x1
+        ld1             {v2.8h}, [x0]
+
+        h264_loop_filter_chroma_10
+
+        sub             x0, x0, x1, lsl #1
+        st1             {v16.8h}, [x0], x1
+        st1             {v0.8h}, [x0], x1
+9:
+        ret
+endfunc
+
+function ff_h264_h_loop_filter_chroma_neon_10, export=1
+        h264_loop_filter_start_10
+        sxtw            x1, w1
+
+        sub             x0, x0, #4
+h_loop_filter_chroma420_10:
+        ld1             {v18.d}[0], [x0], x1
+        ld1             {v16.d}[0], [x0], x1
+        ld1             {v0.d}[0], [x0], x1
+        ld1             {v2.d}[0], [x0], x1
+        ld1             {v18.d}[1], [x0], x1
+        ld1             {v16.d}[1], [x0], x1
+        ld1             {v0.d}[1], [x0], x1
+        ld1             {v2.d}[1], [x0], x1
+
+        transpose_4x8H  v18, v16, v0, v2, v28, v29, v30, v31
+
+        h264_loop_filter_chroma_10
+
+        transpose_4x8H  v18, v16, v0, v2, v28, v29, v30, v31
+
+        sub             x0, x0, x1, lsl #3
+        st1             {v18.d}[0], [x0], x1
+        st1             {v16.d}[0], [x0], x1
+        st1             {v0.d}[0], [x0], x1
+        st1             {v2.d}[0], [x0], x1
+        st1             {v18.d}[1], [x0], x1
+        st1             {v16.d}[1], [x0], x1
+        st1             {v0.d}[1], [x0], x1
+        st1             {v2.d}[1], [x0], x1
+9:
+        ret
+endfunc
+
+function ff_h264_h_loop_filter_chroma422_neon_10, export=1
+        sxtw            x1, w1
+        h264_loop_filter_start_10
+        add             x5, x0, x1
+        sub             x0, x0, #4
+        add             x1, x1, x1
+        mov             x7, x30
+        bl              h_loop_filter_chroma420_10
+        mov             x30, x7
+        sub             x0, x5, #4
+        mov             v24.s[0], w6
+        b               h_loop_filter_chroma420_10
+endfunc
+
+.macro h264_loop_filter_chroma_intra_10
+        uabd            v26.8h, v16.8h, v17.8h // abs(p0 - q0)
+        uabd            v27.8h, v18.8h, v16.8h // abs(p1 - p0)
+        uabd            v28.8h, v19.8h, v17.8h // abs(q1 - q0)
+        cmhi            v26.8h, v30.8h, v26.8h // < alpha
+        cmhi            v27.8h, v31.8h, v27.8h // < beta
+        cmhi            v28.8h, v31.8h, v28.8h // < beta
+        and             v26.16b, v26.16b, v27.16b
+        and             v26.16b, v26.16b, v28.16b
+        mov             x2, v26.d[0]
+        mov             x3, v26.d[1]
+        orr             x2, x2, x3
+
+        ushll           v4.4s, v18.4h, #1
+        ushll2          v5.4s, v18.8h, #1
+        ushll           v6.4s, v19.4h, #1
+        ushll2          v7.4s, v19.8h, #1
+
+        cbz             x2, 9f
+
+        uaddl           v20.4s, v16.4h, v19.4h
+        uaddl2          v21.4s, v16.8h, v19.8h
+        uaddl           v22.4s, v17.4h, v18.4h
+        uaddl2          v23.4s, v17.8h, v18.8h
+        add             v20.4s, v20.4s, v4.4s
+        add             v21.4s, v21.4s, v5.4s
+        add             v22.4s, v22.4s, v6.4s
+        add             v23.4s, v23.4s, v7.4s
+        uqrshrn         v24.4h, v20.4s, #2
+        uqrshrn2        v24.8h, v21.4s, #2
+        uqrshrn         v25.4h, v22.4s, #2
+        uqrshrn2        v25.8h, v23.4s, #2
+        bit             v16.16b, v24.16b, v26.16b
+        bit             v17.16b, v25.16b, v26.16b
+.endm
+
+function ff_h264_v_loop_filter_chroma_intra_neon_10, export=1
+        h264_loop_filter_start_intra_10
+        sub             x0, x0, x1, lsl #1
+        ld1             {v18.8h}, [x0], x1
+        ld1             {v16.8h}, [x0], x1
+        ld1             {v17.8h}, [x0], x1
+        ld1             {v19.8h}, [x0]
+
+        h264_loop_filter_chroma_intra_10
+
+        sub             x0, x0, x1, lsl #1
+        st1             {v16.8h}, [x0], x1
+        st1             {v17.8h}, [x0], x1
+
+9:
+        ret
+endfunc
+
+function ff_h264_h_loop_filter_chroma_mbaff_intra_neon_10, export=1
+        h264_loop_filter_start_intra_10
+
+        sub             x4, x0, #4
+        sub             x0, x0, #2
+        ld1             {v18.8h}, [x4], x1
+        ld1             {v16.8h}, [x4], x1
+        ld1             {v17.8h}, [x4], x1
+        ld1             {v19.8h}, [x4], x1
+
+        transpose_4x8H  v18, v16, v17, v19, v26, v27, v28, v29
+
+        h264_loop_filter_chroma_intra_10
+
+        st2             {v16.h,v17.h}[0], [x0], x1
+        st2             {v16.h,v17.h}[1], [x0], x1
+        st2             {v16.h,v17.h}[2], [x0], x1
+        st2             {v16.h,v17.h}[3], [x0], x1
+
+9:
+        ret
+endfunc
+
+function ff_h264_h_loop_filter_chroma_intra_neon_10, export=1
+        h264_loop_filter_start_intra_10
+
+        sub             x4, x0, #4
+        sub             x0, x0, #2
+h_loop_filter_chroma420_intra_10:
+        ld1             {v18.8h}, [x4], x1
+        ld1             {v16.8h}, [x4], x1
+        ld1             {v17.8h}, [x4], x1
+        ld1             {v19.8h}, [x4], x1
+        ld1             {v18.d}[1], [x4], x1
+        ld1             {v16.d}[1], [x4], x1
+        ld1             {v17.d}[1], [x4], x1
+        ld1             {v19.d}[1], [x4], x1
+
+        transpose_4x8H  v18, v16, v17, v19, v26, v27, v28, v29
+
+        h264_loop_filter_chroma_intra_10
+
+        st2             {v16.h,v17.h}[0], [x0], x1
+        st2             {v16.h,v17.h}[1], [x0], x1
+        st2             {v16.h,v17.h}[2], [x0], x1
+        st2             {v16.h,v17.h}[3], [x0], x1
+        st2             {v16.h,v17.h}[4], [x0], x1
+        st2             {v16.h,v17.h}[5], [x0], x1
+        st2             {v16.h,v17.h}[6], [x0], x1
+        st2             {v16.h,v17.h}[7], [x0], x1
+
+9:
+        ret
+endfunc
+
+function ff_h264_h_loop_filter_chroma422_intra_neon_10, export=1
+        h264_loop_filter_start_intra_10
+        sub             x4, x0, #4
+        add             x5, x0, x1, lsl #3
+        sub             x0, x0, #2
+        mov             x7, x30
+        bl              h_loop_filter_chroma420_intra_10
+        sub             x0, x5, #2
+        mov             x30, x7
+        b               h_loop_filter_chroma420_intra_10
+endfunc
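
The intra variant is simpler to reason about in scalar form. The sketch
below models what the h264_loop_filter_chroma_intra_10 macro appears to
compute for a single edge position, read off the ushll/uaddl/add/uqrshrn
sequence. It is an illustration under stated assumptions, not FFmpeg's C
reference: the name chroma_intra_edge_model and the bit_depth parameter
are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical scalar model of one edge position of the intra chroma filter.
 * alpha and beta are given at 8-bit scale, as passed to the NEON functions.
 * Returns 1 if *p0 and *q0 were modified, 0 otherwise.
 */
static int chroma_intra_edge_model(int p1, int *p0, int *q0, int q1,
                                   int alpha, int beta, int bit_depth)
{
    assert(bit_depth >= 8 && bit_depth <= 14);

    const int scale = 1 << (bit_depth - 8);     /* 4 for 10-bit ("lsl #2") */

    alpha *= scale;
    beta  *= scale;

    if (abs(*p0 - *q0) >= alpha ||
        abs(p1 - *p0)  >= beta  ||
        abs(q1 - *q0)  >= beta)
        return 0;

    /* (2*p1 + p0 + q1 + 2) >> 2 and its mirror image, i.e. a rounded
     * shift by 2 like uqrshrn #2. All operands are non-negative here. */
    const int new_p0 = (2 * p1 + *p0 + q1 + 2) >> 2;
    const int new_q0 = (2 * q1 + *q0 + p1 + 2) >> 2;

    *p0 = new_p0;
    *q0 = new_q0;
    return 1;
}
```

For 10-bit samples p1 = 400, p0 = 410, q0 = 430, q1 = 440 with alpha = 20
and beta = 10, this model yields p0 = 413 and q0 = 428. Because each result
is a weighted average of in-range samples, no extra clip to 1023 is needed
in this variant.
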