From patchwork Mon Aug 16 09:45:44 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mikhail Nitenko
X-Patchwork-Id: 29562
Delivered-To: ffmpegpatchwork2@gmail.com
Received: by 2002:a05:6602:2a4a:0:0:0:0 with SMTP id k10csp1874827iov;
        Mon, 16 Aug 2021 02:46:05 -0700 (PDT)
X-Google-Smtp-Source: ABdhPJzmEtr9hcolpzmH+He18irJ3bzWqafz7da3izC/+IWkJMiDD9d8ri/N+Yrc+B1yum0jhB+Y
X-Received: by 2002:aa7:d504:: with SMTP id y4mr2595339edq.138.1629107165756;
        Mon, 16 Aug 2021 02:46:05 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1629107165; cv=none;
        d=google.com; s=arc-20160816;
        b=xxrD4a7hpBOl/fxYxkINPNzvzmNhlUboZ0cQ2dfzB3uptzkIRSNaLiM73AnkGqlQmH
         vme2TvLAlNyW2C0ZACdsiYU9zK7IhQORJlm3ienfTIKFGD4+XKwgaOuVZVm4d2PA0iMr
         O/dC2CssftKqOXIpMR4EgTH9wiJwmXHkkPcKmdNfMIym8nQCcpYwlL2G0H1C0NxgSm6L
         n+151I+ah5pOYyn0365MWVjdgt+xg1+fhbOM9HcrLjS9FMfNrizZp1e3hSZRUpGms9zZ
         02FijDbp+NPRH5d1aOvzSP4BSQX5CQ+th7cb27H+PWlz6wE9PSddeJdaClESTznMot9x
         ZhFA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
        h=sender:errors-to:content-transfer-encoding:cc:reply-to
         :list-subscribe:list-help:list-post:list-archive:list-unsubscribe
         :list-id:precedence:subject:mime-version:message-id:date:to:from
         :dkim-signature:delivered-to;
        bh=BnzWk55vxaM6sJ+VoQwhS4bcnZA0xWXEJwYRvCkbKcE=;
        b=jkoIQHZylla27LIaV6OduVx4Yry3c8GeJUs62nEWEbmXS+zZ9abyTm9IPOJZ9/XS93
         0Eyed7ilCE0W4QIpR0/p0OwUI0hweEh/CEHpeEymTnKGNH6Ob6g6zRCsXbdLpE6sGSkd
         UlDCBVdpxuIl04Kxk8e4Yg/28doAhd5rCTX3OVfLr4nO4+uhzZCCIvTDvsVrOEfbPm1t
         mPKfgwP6w18Xx/CRSAN52fjaicnd5kwPJzuRAzEKW1pB1zW7yORU6hKewO7VkrTJFnNG
         bFd+RfHJKkrbFoDZoqCNb3DVYXMK3vLx6SF0PEOhHIbuiIs8UOlhSs71J0a7+dqPWQx+
         DIcQ==
ARC-Authentication-Results: i=1; mx.google.com;
        dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20161025 header.b=oc+GhEAr;
        spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org;
        dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
        by mx.google.com with ESMTP id n4si9847396edy.3.2021.08.16.02.46.05;
        Mon, 16 Aug 2021 02:46:05 -0700 (PDT)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
        dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20161025 header.b=oc+GhEAr;
        spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org;
        dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
Received: from [127.0.1.1] (localhost [127.0.0.1])
        by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id AD25668A42C;
        Mon, 16 Aug 2021 12:46:01 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mail-lf1-f49.google.com (mail-lf1-f49.google.com [209.85.167.49])
        by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id A3FAB689BC9
        for ; Mon, 16 Aug 2021 12:45:55 +0300 (EEST)
Received: by mail-lf1-f49.google.com with SMTP id i9so12430783lfg.10
        for ; Mon, 16 Aug 2021 02:45:55 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=from:to:cc:subject:date:message-id:mime-version
         :content-transfer-encoding;
        bh=l0jbFd16MpiqLc+Si7Jp+Mj5AThrZ+ehoEHftFz1v0o=;
        b=oc+GhEArA+OcWjUkGLIwach+QrPhGC5boxr2AGS/pYjresTSvjEvmZRfB4Op3ZBMKz
         TqW6iExqUhUrxe0IjBJmtmYWADXCn3UFL4+ZvpXofoPDUTyU8yqOKk3qCs/oAZ5B2CoG
         5QehopDqktNfklnoHK6Ozva1axjP3mRWD0NxBSOxPIxwLSA0FL4onfIMlBL5WbMYSnpl
         sgEn7Ngh3N2uGEPpWFdp9V7K0H5l1M6R7sM90BRIj0U1H4W84V1fSFGaATAUlyxbV5cY
         HwAWwB0vIBRn3fG07tpN2D4GYVYmzLhf0hnVwr2a29RWyKyaRndMtIPLsgWdtCBTMVDj
         b26A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version
         :content-transfer-encoding;
        bh=l0jbFd16MpiqLc+Si7Jp+Mj5AThrZ+ehoEHftFz1v0o=;
        b=dTijfDlpDnXjtYajT4KU9MC2CfGO9u+g1vLBG6v64zT6D6NVeiA9ZUXspAl1/PVmPH
         ik6x1EhwfaR+biACdk7iNjbPLV7ImX4aM40HpU4Le468WOEPN8P4WK26zCW67xEGu7sv
         8wEIrTpzg0A01l81ScqWn48YMYIXhufDiFgUcWYDNjkba9Wj7xyKETNFgOaEkWbTwaHj
         kcSMuiFxnh230Cfz6UTHqqvLygH0RQICiv60heQ77vt0bpYN8Mi4c3UuSWAfJKwX6t7H
         6EjSG/8OgvruYuqCdPYswSl9RTS+ZGgb5VzRmjoWB7D68GkNRlO7Drai9UdCnDVk1aYg
         gVjw==
X-Gm-Message-State: AOAM532vujlX5ED/lSJ9Ys7RoMgTrO7mpzV6Xi7/UushPyWQvcKJwwEs
        GUVlGjChtwIb4nq3V8oNM/6KoxBvmpkLlw==
X-Received: by 2002:a05:6512:5ce:: with SMTP id o14mr296782lfo.252.1629107153850;
        Mon, 16 Aug 2021 02:45:53 -0700 (PDT)
Received: from localhost.localdomain ([109.195.102.12])
        by smtp.gmail.com with ESMTPSA id v16sm205995lfq.87.2021.08.16.02.45.53
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 16 Aug 2021 02:45:53 -0700 (PDT)
From: Mikhail Nitenko
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 16 Aug 2021 14:45:44 +0500
Message-Id: <20210816094545.448283-1-mnitenko@gmail.com>
X-Mailer: git-send-email 2.32.0
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH 1/2] lavc/aarch64: move transpose_4x8H to neon.S
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Cc: Mikhail Nitenko
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: rO5r/66iJlIh

transpose_4x8H was declared in vp9lpf_16bpp_neon, however this macro is
not unique to vp9 and could be used elsewhere.

Signed-off-by: Mikhail Nitenko
---
 libavcodec/aarch64/neon.S              | 13 +++++++++++++
 libavcodec/aarch64/vp9lpf_16bpp_neon.S | 12 ------------
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/libavcodec/aarch64/neon.S b/libavcodec/aarch64/neon.S
index 0fddbecae3..1ad32c359d 100644
--- a/libavcodec/aarch64/neon.S
+++ b/libavcodec/aarch64/neon.S
@@ -109,12 +109,25 @@
         trn2            \r5\().4H, \r0\().4H, \r1\().4H
         trn1            \r6\().4H, \r2\().4H, \r3\().4H
         trn2            \r7\().4H, \r2\().4H, \r3\().4H
+
         trn1            \r0\().2S, \r4\().2S, \r6\().2S
         trn2            \r2\().2S, \r4\().2S, \r6\().2S
         trn1            \r1\().2S, \r5\().2S, \r7\().2S
         trn2            \r3\().2S, \r5\().2S, \r7\().2S
 .endm
 
+.macro transpose_4x8H r0, r1, r2, r3, t4, t5, t6, t7
+        trn1            \t4\().8H, \r0\().8H, \r1\().8H
+        trn2            \t5\().8H, \r0\().8H, \r1\().8H
+        trn1            \t6\().8H, \r2\().8H, \r3\().8H
+        trn2            \t7\().8H, \r2\().8H, \r3\().8H
+
+        trn1            \r0\().4S, \t4\().4S, \t6\().4S
+        trn2            \r2\().4S, \t4\().4S, \t6\().4S
+        trn1            \r1\().4S, \t5\().4S, \t7\().4S
+        trn2            \r3\().4S, \t5\().4S, \t7\().4S
+.endm
+
 .macro transpose_8x8H r0, r1, r2, r3, r4, r5, r6, r7, r8, r9
         trn1            \r8\().8H, \r0\().8H, \r1\().8H
         trn2            \r9\().8H, \r0\().8H, \r1\().8H
diff --git a/libavcodec/aarch64/vp9lpf_16bpp_neon.S b/libavcodec/aarch64/vp9lpf_16bpp_neon.S
index 9075f3d406..9869614a29 100644
--- a/libavcodec/aarch64/vp9lpf_16bpp_neon.S
+++ b/libavcodec/aarch64/vp9lpf_16bpp_neon.S
@@ -22,18 +22,6 @@
 
 #include "neon.S"
 
-.macro transpose_4x8H r0, r1, r2, r3, t4, t5, t6, t7
-        trn1            \t4\().8h, \r0\().8h, \r1\().8h
-        trn2            \t5\().8h, \r0\().8h, \r1\().8h
-        trn1            \t6\().8h, \r2\().8h, \r3\().8h
-        trn2            \t7\().8h, \r2\().8h, \r3\().8h
-
-        trn1            \r0\().4s, \t4\().4s, \t6\().4s
-        trn2            \r2\().4s, \t4\().4s, \t6\().4s
-        trn1            \r1\().4s, \t5\().4s, \t7\().4s
-        trn2            \r3\().4s, \t5\().4s, \t7\().4s
-.endm
-
 // The input to and output from this macro is in the registers v16-v31,
 // and v0-v7 are used as scratch registers.
 // p7 = v16 .. p3 = v20, p0 = v23, q0 = v24, q3 = v27, q7 = v31

From patchwork Mon Aug 16 09:45:45 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mikhail Nitenko
X-Patchwork-Id: 29563
Delivered-To: ffmpegpatchwork2@gmail.com
Received: by 2002:a05:6602:2a4a:0:0:0:0 with SMTP id k10csp1874908iov;
        Mon, 16 Aug 2021 02:46:16 -0700 (PDT)
X-Google-Smtp-Source: ABdhPJw5No3DObE2mE+WqN0XIiFCJ5HrtQ5jHSX8ANiOMEqG7fOzVuR316Lj4Q27l3DVC4y7d+sD
X-Received: by 2002:aa7:c1c8:: with SMTP id d8mr19481848edp.20.1629107176346;
        Mon, 16 Aug 2021 02:46:16 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1629107176; cv=none;
        d=google.com; s=arc-20160816;
        b=0oR3aZqaMVpQdiaxs/0GSokx8K+PbbfNFFkocV29cxl9PeodQZbM16nRSu19oP6ZlL
         LvN/cDgheaP3WZa71Zb/7yJfakIMwSfeyPbk+UiYyr/i+Hm3079sfZ7BGghXm2S6sCEc
         8lGpqgc09gMqleUpd5+LJ32YKSA1gVpUhK4r7bqAkTJNpeA8QxF/QcJEnB0d+C3hEHCZ
         A71qHyi/OLhgp3K5UBhQRRvEer4rdkmMdCoQRkozyw+R6I73jQqlYD0C4gdoHLckXzMi
         ReXk4BC0gzhN5JWniQgVDSGOaw7BV2tX5+2cYZtnu6/18ZTHdz400n6mWgVu3kcZYLvS
         8mlg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
        h=sender:errors-to:content-transfer-encoding:cc:reply-to
         :list-subscribe:list-help:list-post:list-archive:list-unsubscribe
         :list-id:precedence:subject:mime-version:references:in-reply-to
         :message-id:date:to:from:dkim-signature:delivered-to;
        bh=GAe8nHli6ftPj3nhpU3HXEqI/GdUv6KpX8zzeZEhCs4=;
        b=G5mPvPUHiaYXI1zE4Tdpt7w4VNws18t13Fdhdngl5iXMdEjsD0dh/FexRl+2kLhF6U
         tT+t4VPY2oomB6Uaw0AEvviwL7RzynwReLaqLIXCjMMbJdR7h/JZbaynjL9OpRs4eEFe
         WX465E0AHlATlTGMgkKhdZwNtFwi9NAVw56gUsSc5r3Xy0oFxTW7Ba9g+pLaJvXltZP6
         hwaeR4RJxN5NoW6vLlAunes9aatL7qQWxVB0ZR9mRNz9Pom218bogrt1QVV/hjg9DmCU
         R46qypt4rNhdSADPXr1m8n9fS90AlrhXybviJXwXf/QVFoWTyZK4vAPqWDkzV3OWlLPB
         BrqQ==
ARC-Authentication-Results: i=1; mx.google.com;
        dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20161025 header.b=HlqlR1ml;
        spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org;
        dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100])
        by mx.google.com with ESMTP id h8si5887598ejj.422.2021.08.16.02.46.16;
        Mon, 16 Aug 2021 02:46:16 -0700 (PDT)
Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100;
Authentication-Results: mx.google.com;
        dkim=neutral (body hash did not verify) header.i=@gmail.com header.s=20161025 header.b=HlqlR1ml;
        spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org;
        dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
Received: from [127.0.1.1] (localhost [127.0.0.1])
        by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id CE16568A4A8;
        Mon, 16 Aug 2021 12:46:07 +0300 (EEST)
X-Original-To: ffmpeg-devel@ffmpeg.org
Delivered-To: ffmpeg-devel@ffmpeg.org
Received: from mail-lf1-f50.google.com (mail-lf1-f50.google.com [209.85.167.50])
        by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id B4F1A689BC9
        for ; Mon, 16 Aug 2021 12:46:01 +0300 (EEST)
Received: by mail-lf1-f50.google.com with SMTP id y34so33164430lfa.8
        for ; Mon, 16 Aug 2021 02:46:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=from:to:cc:subject:date:message-id:in-reply-to:references
         :mime-version:content-transfer-encoding;
        bh=FSmEUgST2sWTuMts2YCRhgzGE88y6DcdVjZj17ZQ6RU=;
        b=HlqlR1mlW5WecUmjhTcLYvluX2L6y5UxTdweDnXOHwOzoqM+i9LLGxd9paT1p5obIL
         Rw0J8V67ptCxB7LNDfv+20gSCm/4kPG/rW44oD0kfXY9Dpvy9P3x28w6vZ1vJTKVvWpL
         Mf+SS1J7ZUgFA/S6KcKvIlU0Mz9gckpsOhcFgUAzySdshBm0Zid3a/cFauzomrmP1u+J
         /vXY+V7x2Zs4vzKWCQf6o1BwKmGxMY00UVgYWNg9LiL1CNx/+G8HAaV9aY5gUtVIb2XV
         O19DAHxve5OZPFSUxNq+tqi+dAnAPKNjHsfIoebMQuCAKfz7l1yz5ujzS00rW1QBVgGN
         chEQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
         :references:mime-version:content-transfer-encoding;
        bh=FSmEUgST2sWTuMts2YCRhgzGE88y6DcdVjZj17ZQ6RU=;
        b=JxPPBZ7qunByRv2GZoiWvx1nX8nWugqPpk0viISF2TW5SBKwAFWKZIn67/cHfdJ3r1
         8DejzJlMpj8Y2fU5XCojKaGWkxYetoUeos/q1/3b2O/NHxPYG5Bm8Gan22w/sEv1SQua
         5AETYWYBVPtNYpSxktIa+hpleQoMLF7xIDkkruEKw11QhmNtfkBYe3M+KKWn+wCDDgFA
         7+TjmImVhb+z9wx61di+VM+0nJju/g+xLoHQpjMPN6qSbxyz4Du96wQQYH7M/oV/QdUT
         8WAyo5eWLJZ1P1Kp69gj1Gn+FcVuiTuQYFPLu+VP+2j7iQoQWqal7v83H2g2OySMmX60
         N5uw==
X-Gm-Message-State: AOAM530qkZkcrrgTVUm9Rhf/e1efE+LVEO8VUL3tFvZtLLal0LJr1vek
        QaSSy2TjCQvgBEfz2O345jzgkPMpHkPq3A==
X-Received: by 2002:ac2:46fb:: with SMTP id q27mr11240309lfo.466.1629107160180;
        Mon, 16 Aug 2021 02:46:00 -0700 (PDT)
Received: from localhost.localdomain ([109.195.102.12])
        by smtp.gmail.com with ESMTPSA id v16sm205995lfq.87.2021.08.16.02.45.59
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 16 Aug 2021 02:45:59 -0700 (PDT)
From: Mikhail Nitenko
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 16 Aug 2021 14:45:45 +0500
Message-Id: <20210816094545.448283-2-mnitenko@gmail.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20210816094545.448283-1-mnitenko@gmail.com>
References: <20210816094545.448283-1-mnitenko@gmail.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH v2 2/2] lavc/aarch64: h264, add chroma loop filters for 10bit
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Cc: Mikhail Nitenko
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
X-TUID: aupaG/yxGxP9

Benchmarks:                                             A53     A72
h264_h_loop_filter_chroma422_10bpp_c:                 277.5   114.2
h264_h_loop_filter_chroma422_10bpp_neon:              109.7    81.7
h264_h_loop_filter_chroma_10bpp_c:                    165.0    75.5
h264_h_loop_filter_chroma_10bpp_neon:                 121.2    74.7
h264_h_loop_filter_chroma_intra422_10bpp_c:           324.2   124.2
h264_h_loop_filter_chroma_intra422_10bpp_neon:        155.2    99.5
h264_h_loop_filter_chroma_intra_10bpp_c:              121.0    48.5
h264_h_loop_filter_chroma_intra_10bpp_neon:            79.5    52.7
h264_h_loop_filter_chroma_mbaff422_10bpp_c:           191.0    73.5
h264_h_loop_filter_chroma_mbaff422_10bpp_neon:        121.2    75.5
h264_h_loop_filter_chroma_mbaff_intra422_10bpp_c:     117.0    51.5
h264_h_loop_filter_chroma_mbaff_intra422_10bpp_neon:   79.5    53.7
h264_h_loop_filter_chroma_mbaff_intra_10bpp_c:         63.0    28.5
h264_h_loop_filter_chroma_mbaff_intra_10bpp_neon:      48.7    33.2
h264_v_loop_filter_chroma_10bpp_c:                    260.2   135.5
h264_v_loop_filter_chroma_10bpp_neon:                  72.2    49.2
h264_v_loop_filter_chroma_intra_10bpp_c:              158.0    70.7
h264_v_loop_filter_chroma_intra_10bpp_neon:            48.7    32.0

Signed-off-by: Mikhail Nitenko
---

removed leftover code, moved from 32bit and started loading with two
alternating registers, code became quite a bit faster!

 libavcodec/aarch64/h264dsp_init_aarch64.c |  37 ++++
 libavcodec/aarch64/h264dsp_neon.S         | 255 ++++++++++++++++++++++
 2 files changed, 292 insertions(+)

diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c
index cbaf8d31eb..6bf3ecb8a1 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
@@ -83,6 +83,29 @@ void ff_h264_idct8_add4_neon(uint8_t *dst, const int *block_offset,
                              int16_t *block, int stride,
                              const uint8_t nnzc[5 * 8]);
 
+void ff_h264_v_loop_filter_luma_neon_10(uint8_t *pix, ptrdiff_t stride, int alpha,
+                                        int beta, int8_t *tc0);
+void ff_h264_h_loop_filter_luma_neon_10(uint8_t *pix, ptrdiff_t stride, int alpha,
+                                        int beta, int8_t *tc0);
+void ff_h264_v_loop_filter_luma_intra_neon_10(uint8_t *pix, ptrdiff_t stride, int alpha,
+                                              int beta);
+void ff_h264_h_loop_filter_luma_intra_neon_10(uint8_t *pix, ptrdiff_t stride, int alpha,
+                                              int beta);
+void ff_h264_v_loop_filter_chroma_neon_10(uint8_t *pix, ptrdiff_t stride, int alpha,
+                                          int beta, int8_t *tc0);
+void ff_h264_h_loop_filter_chroma_neon_10(uint8_t *pix, ptrdiff_t stride, int alpha,
+                                          int beta, int8_t *tc0);
+void ff_h264_h_loop_filter_chroma422_neon_10(uint8_t *pix, ptrdiff_t stride, int alpha,
+                                             int beta, int8_t *tc0);
+void ff_h264_v_loop_filter_chroma_intra_neon_10(uint8_t *pix, ptrdiff_t stride,
+                                                int alpha, int beta);
+void ff_h264_h_loop_filter_chroma_intra_neon_10(uint8_t *pix, ptrdiff_t stride,
+                                                int alpha, int beta);
+void ff_h264_h_loop_filter_chroma422_intra_neon_10(uint8_t *pix, ptrdiff_t stride,
+                                                   int alpha, int beta);
+void ff_h264_h_loop_filter_chroma_mbaff_intra_neon_10(uint8_t *pix, ptrdiff_t stride,
+                                                      int alpha, int beta);
+
 av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
                                      const int chroma_format_idc)
 {
@@ -125,5 +148,19 @@ av_cold void ff_h264dsp_init_aarch64(H264DSPContext *c, const int bit_depth,
         c->h264_idct8_add = ff_h264_idct8_add_neon;
         c->h264_idct8_dc_add = ff_h264_idct8_dc_add_neon;
         c->h264_idct8_add4 = ff_h264_idct8_add4_neon;
+    } else if (have_neon(cpu_flags) && bit_depth == 10) {
+        c->h264_v_loop_filter_chroma = ff_h264_v_loop_filter_chroma_neon_10;
+        c->h264_v_loop_filter_chroma_intra = ff_h264_v_loop_filter_chroma_intra_neon_10;
+
+        if (chroma_format_idc <= 1) {
+            c->h264_h_loop_filter_chroma = ff_h264_h_loop_filter_chroma_neon_10;
+            c->h264_h_loop_filter_chroma_intra = ff_h264_h_loop_filter_chroma_intra_neon_10;
+            c->h264_h_loop_filter_chroma_mbaff_intra = ff_h264_h_loop_filter_chroma_mbaff_intra_neon_10;
+        } else {
+            c->h264_h_loop_filter_chroma = ff_h264_h_loop_filter_chroma422_neon_10;
+            c->h264_h_loop_filter_chroma_mbaff = ff_h264_h_loop_filter_chroma_neon_10;
+            c->h264_h_loop_filter_chroma_intra = ff_h264_h_loop_filter_chroma422_intra_neon_10;
+            c->h264_h_loop_filter_chroma_mbaff_intra = ff_h264_h_loop_filter_chroma_intra_neon_10;
+        }
     }
 }
diff --git a/libavcodec/aarch64/h264dsp_neon.S b/libavcodec/aarch64/h264dsp_neon.S
index 997082498f..80b7ed5ce1 100644
--- a/libavcodec/aarch64/h264dsp_neon.S
+++ b/libavcodec/aarch64/h264dsp_neon.S
@@ -819,3 +819,258 @@ endfunc
 weight_func 16
 weight_func 8
 weight_func 4
+
+.macro h264_loop_filter_start_10
+        cmp             w2, #0
+        ldr             w6, [x4]
+        ccmp            w3, #0, #0, ne
+        lsl             w2, w2, #2
+        mov             v24.S[0], w6
+        lsl             w3, w3, #2
+        and             w8, w6, w6, lsl #16
+        b.eq            1f
+        ands            w8, w8, w8, lsl #8
+        b.ge            2f
+1:
+        ret
+2:
+.endm
+
+.macro h264_loop_filter_start_intra_10
+        orr             w4, w2, w3
+        cbnz            w4, 1f
+        ret
+1:
+        lsl             w2, w2, #2
+        lsl             w3, w3, #2
+        dup             v30.8h, w2              // alpha
+        dup             v31.8h, w3              // beta
+.endm
+
+.macro h264_loop_filter_chroma_10
+        dup             v22.8h, w2              // alpha
+        dup             v23.8h, w3              // beta
+        uxtl            v24.8h, v24.8b          // tc0
+
+        uabd            v26.8h, v16.8h, v0.8h   // abs(p0 - q0)
+        uabd            v28.8h, v18.8h, v16.8h  // abs(p1 - p0)
+        uabd            v30.8h, v2.8h, v0.8h    // abs(q1 - q0)
+        cmhi            v26.8h, v22.8h, v26.8h  // < alpha
+        cmhi            v28.8h, v23.8h, v28.8h  // < beta
+        cmhi            v30.8h, v23.8h, v30.8h  // < beta
+
+        and             v26.16b, v26.16b, v28.16b
+        mov             v4.16b, v0.16b
+        sub             v4.8h, v4.8h, v16.8h
+        and             v26.16b, v26.16b, v30.16b
+        shl             v4.8h, v4.8h, #2
+        mov             x8, v26.d[0]
+        mov             x9, v26.d[1]
+        sli             v24.8H, v24.8H, #8
+        uxtl            v24.8H, v24.8B
+        add             v4.8h, v4.8h, v18.8h
+        shl             v24.8h, v24.8h, #2
+
+        adds            x8, x8, x9
+        b.eq            9f
+
+        movi            v31.8h, #3              // (tc0 - 1) << (BIT_DEPTH - 8)) + 1
+        uqsub           v24.8h, v24.8h, v31.8h
+        sub             v4.8h , v4.8h, v2.8h
+        srshr           v4.8h, v4.8h, #3
+        smin            v4.8h, v4.8h, v24.8h
+        neg             v25.8h, v24.8h
+        smax            v4.8h, v4.8h, v25.8h
+        and             v4.16B, v4.16B, v26.16B
+        add             v16.8h, v16.8h, v4.8h
+        sub             v0.8h, v0.8h, v4.8h
+
+        mvni            v4.8h, #0xFC, lsl #8    // 1023 for clipping
+        movi            v5.8h, #0
+        smin            v0.8h, v0.8h, v4.8h
+        smax            v16.8h, v16.8h, v5.8h
+        smax            v0.8h, v0.8h, v5.8h
+        smin            v16.8h, v16.8h, v4.8h
+.endm
+
+function ff_h264_v_loop_filter_chroma_neon_10, export=1
+        h264_loop_filter_start_10
+
+        mov             x10, x0
+        sub             x0, x0, x1, lsl #1
+        ld1             {v18.8h}, [x0 ], x1
+        ld1             {v0.8h}, [x10], x1
+        ld1             {v16.8h}, [x0 ], x1
+        ld1             {v2.8h}, [x10]
+
+        h264_loop_filter_chroma_10
+
+        sub             x0, x10, x1, lsl #1
+        st1             {v16.8h}, [x0], x1
+        st1             {v0.8h}, [x0], x1
+9:
+        ret
+endfunc
+
+function ff_h264_h_loop_filter_chroma_neon_10, export=1
+        h264_loop_filter_start_10
+
+        sub             x0, x0, #4              // access the 2nd left pixel
+h_loop_filter_chroma420_10:
+        add             x10, x0, x1, lsl #2
+        ld1             {v18.d}[0], [x0 ], x1
+        ld1             {v18.d}[1], [x10], x1
+        ld1             {v16.d}[0], [x0 ], x1
+        ld1             {v16.d}[1], [x10], x1
+        ld1             {v0.d}[0], [x0 ], x1
+        ld1             {v0.d}[1], [x10], x1
+        ld1             {v2.d}[0], [x0 ], x1
+        ld1             {v2.d}[1], [x10], x1
+
+        transpose_4x8H  v18, v16, v0, v2, v28, v29, v30, v31
+
+        h264_loop_filter_chroma_10
+
+        transpose_4x8H  v18, v16, v0, v2, v28, v29, v30, v31
+
+        sub             x0, x10, x1, lsl #3
+        st1             {v18.d}[0], [x0], x1
+        st1             {v16.d}[0], [x0], x1
+        st1             {v0.d}[0], [x0], x1
+        st1             {v2.d}[0], [x0], x1
+        st1             {v18.d}[1], [x0], x1
+        st1             {v16.d}[1], [x0], x1
+        st1             {v0.d}[1], [x0], x1
+        st1             {v2.d}[1], [x0], x1
+9:
+        ret
+endfunc
+
+function ff_h264_h_loop_filter_chroma422_neon_10, export=1
+        h264_loop_filter_start_10
+        add             x5, x0, x1
+        sub             x0, x0, #4
+        add             x1, x1, x1
+        mov             x7, x30
+        bl              h_loop_filter_chroma420_10
+        mov             x30, x7
+        sub             x0, x5, #4
+        mov             v24.s[0], w6
+        b               h_loop_filter_chroma420_10
+endfunc
+
+.macro h264_loop_filter_chroma_intra_10
+        uabd            v26.8h, v16.8h, v17.8h  // abs(p0 - q0)
+        uabd            v27.8h, v18.8h, v16.8h  // abs(p1 - p0)
+        uabd            v28.8h, v19.8h, v17.8h  // abs(q1 - q0)
+        cmhi            v26.8h, v30.8h, v26.8h  // < alpha
+        cmhi            v27.8h, v31.8h, v27.8h  // < beta
+        cmhi            v28.8h, v31.8h, v28.8h  // < beta
+        and             v26.16b, v26.16b, v27.16b
+        and             v26.16b, v26.16b, v28.16b
+        mov             x2, v26.d[0]
+        mov             x3, v26.d[1]
+
+        shl             v4.8h, v18.8h, #1
+        shl             v6.8h, v19.8h, #1
+
+        adds            x2, x2, x3
+        b.eq            9f
+
+        add             v20.8h, v16.8h, v19.8h
+        add             v22.8h, v17.8h, v18.8h
+        add             v20.8h, v20.8h, v4.8h
+        add             v22.8h, v22.8h, v6.8h
+        urshr           v24.8h, v20.8h, #2
+        urshr           v25.8h, v22.8h, #2
+        bit             v16.16b, v24.16b, v26.16b
+        bit             v17.16b, v25.16b, v26.16b
+.endm
+
+function ff_h264_v_loop_filter_chroma_intra_neon_10, export=1
+        h264_loop_filter_start_intra_10
+        mov             x9, x0
+        sub             x0, x0, x1, lsl #1
+        ld1             {v18.8h}, [x0], x1
+        ld1             {v17.8h}, [x9], x1
+        ld1             {v16.8h}, [x0], x1
+        ld1             {v19.8h}, [x9]
+
+        h264_loop_filter_chroma_intra_10
+
+        sub             x0, x9, x1, lsl #1
+        st1             {v16.8h}, [x0], x1
+        st1             {v17.8h}, [x0], x1
+
+9:
+        ret
+endfunc
+
+function ff_h264_h_loop_filter_chroma_mbaff_intra_neon_10, export=1
+        h264_loop_filter_start_intra_10
+
+        sub             x4, x0, #4
+        sub             x0, x0, #2
+        add             x9, x4, x1, lsl #1
+        ld1             {v18.8h}, [x4], x1
+        ld1             {v17.8h}, [x9], x1
+        ld1             {v16.8h}, [x4], x1
+        ld1             {v19.8h}, [x9], x1
+
+        transpose_4x8H  v18, v16, v17, v19, v26, v27, v28, v29
+
+        h264_loop_filter_chroma_intra_10
+
+        st2             {v16.h,v17.h}[0], [x0], x1
+        st2             {v16.h,v17.h}[1], [x0], x1
+        st2             {v16.h,v17.h}[2], [x0], x1
+        st2             {v16.h,v17.h}[3], [x0], x1
+
+9:
+        ret
+endfunc
+
+function ff_h264_h_loop_filter_chroma_intra_neon_10, export=1
+        h264_loop_filter_start_intra_10
+        sub             x4, x0, #4
+        sub             x0, x0, #2
+h_loop_filter_chroma420_intra_10:
+        add             x9, x4, x1, lsl #2
+        ld1             {v18.4h}, [x4], x1
+        ld1             {v18.d}[1], [x9], x1
+        ld1             {v16.4h}, [x4], x1
+        ld1             {v16.d}[1], [x9], x1
+        ld1             {v17.4h}, [x4], x1
+        ld1             {v17.d}[1], [x9], x1
+        ld1             {v19.4h}, [x4], x1
+        ld1             {v19.d}[1], [x9], x1
+
+        transpose_4x8H  v18, v16, v17, v19, v26, v27, v28, v29
+
+        h264_loop_filter_chroma_intra_10
+
+        st2             {v16.h,v17.h}[0], [x0], x1
+        st2             {v16.h,v17.h}[1], [x0], x1
+        st2             {v16.h,v17.h}[2], [x0], x1
+        st2             {v16.h,v17.h}[3], [x0], x1
+        st2             {v16.h,v17.h}[4], [x0], x1
+        st2             {v16.h,v17.h}[5], [x0], x1
+        st2             {v16.h,v17.h}[6], [x0], x1
+        st2             {v16.h,v17.h}[7], [x0], x1
+
+9:
+        ret
+endfunc
+
+function ff_h264_h_loop_filter_chroma422_intra_neon_10, export=1
+        h264_loop_filter_start_intra_10
+        sub             x4, x0, #4
+        add             x5, x0, x1, lsl #3
+        sub             x0, x0, #2
+        mov             x7, x30
+        bl              h_loop_filter_chroma420_intra_10
+        mov             x4, x9
+        sub             x0, x5, #2
+        mov             x30, x7
+        b               h_loop_filter_chroma420_intra_10
+endfunc
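
For readers following the NEON macros in patch 2/2, the per-sample arithmetic they vectorize can be sketched in scalar C. This is only an illustrative sketch inferred from the comments and instructions in the assembly above (the alpha/beta thresholds scaled by `<< 2`, the `((tc0 - 1) << (BIT_DEPTH - 8)) + 1` clamp, and the clip to 1023); it is not FFmpeg's reference code. The names `chroma_filter_sample_10`, `chroma_filter_sample_intra_10`, `clip3` and `floor_shift3` are hypothetical, and `pix` is assumed to point at q0 with `xstride` counted in 16-bit samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BIT_DEPTH 10
#define PIXEL_MAX ((1 << BIT_DEPTH) - 1)        /* 1023 */
#define DEPTH_SCALE (1 << (BIT_DEPTH - 8))      /* 4, same effect as "lsl #2" */

/* Clamp v to the closed range [lo, hi]. */
static int clip3(int v, int lo, int hi)
{
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

/* floor(v / 8) without relying on right-shifting a negative int. */
static int floor_shift3(int v)
{
    return v >= 0 ? v >> 3 : -((-v + 7) >> 3);
}

/* Hypothetical scalar model of the non-intra chroma filter for one sample
 * position. Layout across the edge: p1 p0 | q0 q1, with pix at q0. */
static void chroma_filter_sample_10(uint16_t *pix, ptrdiff_t xstride,
                                    int alpha, int beta, int tc0)
{
    const int a  = alpha * DEPTH_SCALE;
    const int b  = beta * DEPTH_SCALE;
    const int tc = (tc0 - 1) * DEPTH_SCALE + 1;
    const int p1 = pix[-2 * xstride], p0 = pix[-xstride];
    const int q0 = pix[0],            q1 = pix[xstride];

    if (tc <= 0)
        return;                                 /* nothing to filter */
    if (abs(p0 - q0) < a && abs(p1 - p0) < b && abs(q1 - q0) < b) {
        int delta = floor_shift3((q0 - p0) * 4 + (p1 - q1) + 4);
        delta = clip3(delta, -tc, tc);
        pix[-xstride] = (uint16_t)clip3(p0 + delta, 0, PIXEL_MAX);
        pix[0]        = (uint16_t)clip3(q0 - delta, 0, PIXEL_MAX);
    }
}

/* Hypothetical scalar model of the intra chroma filter: a fixed 3-tap
 * smoothing with no tc clamp, matching the add/add/urshr #2 sequence. */
static void chroma_filter_sample_intra_10(uint16_t *pix, ptrdiff_t xstride,
                                          int alpha, int beta)
{
    const int a  = alpha * DEPTH_SCALE;
    const int b  = beta * DEPTH_SCALE;
    const int p1 = pix[-2 * xstride], p0 = pix[-xstride];
    const int q0 = pix[0],            q1 = pix[xstride];

    if (abs(p0 - q0) < a && abs(p1 - p0) < b && abs(q1 - q0) < b) {
        pix[-xstride] = (uint16_t)((2 * p1 + p0 + q1 + 2) >> 2);
        pix[0]        = (uint16_t)((2 * q1 + q0 + p1 + 2) >> 2);
    }
}
```

Under these assumptions, a vertical-edge call would pass `xstride = 1` and step `pix` by the row stride, while a horizontal-edge call would pass the row stride (in samples) as `xstride`. The assembly reaches the same data layout for vertical edges by transposing with `transpose_4x8H` before and after the filter macro.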