From patchwork Mon Mar 25 15:02:22 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Martin Storsjö
X-Patchwork-Id: 35088
From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Cc: Logan Lyu, "J . Dekker"
Date: Mon, 25 Mar 2024 17:02:22 +0200
Message-Id: <20240325150243.59058-1-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 00/21] aarch64: hevc: Add missing hevc_pel NEON functions

Hi,

For some time now, we have had fairly complete AArch64 NEON coverage
for the HEVC decoder. However, some of these functions require the
I8MM instruction set extension, and many of them (but not all) lack
a plain NEON version.

This patchset adds a regular NEON version of every function for which
we have an I8MM version.

For context: the I8MM instruction set extension is a mandatory part
of armv8.6-a. For example, Apple M2 and AWS Graviton 3 have it, but
Apple M1 and Ampere Altra don't.

This patchset takes decoding of a 1080p HEVC clip from 402 fps to
649 fps on an Apple M1.

Patch #2 also fixes a subtle bug in the existing implementation: two
functions relied on the contents of the stack, below the stack
pointer, remaining untouched within a function. If a signal gets
delivered, those parts of the stack could be clobbered.
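To illustrate what "a plain NEON version of every I8MM function" means for
function selection, here is a minimal C sketch of runtime dispatch. It is
not the code in hevcdsp_init_aarch64.c: the flag bits, the DSPContext
struct, the epel_h member and the three stub functions are all hypothetical
names made up for this example, and the stubs only report which
implementation was picked.

```c
/* Hypothetical CPU feature bits, for illustration only. */
enum { CPU_NEON = 1 << 0, CPU_I8MM = 1 << 1 };

/* Tags so a caller can see which implementation was selected. */
enum impl { IMPL_C, IMPL_NEON, IMPL_I8MM };

typedef enum impl (*filter_fn)(void);

/* Stand-ins for the real filter functions (hypothetical). */
static enum impl filter_c(void)    { return IMPL_C; }
static enum impl filter_neon(void) { return IMPL_NEON; }
static enum impl filter_i8mm(void) { return IMPL_I8MM; }

/* Hypothetical DSP context holding one function pointer. */
typedef struct DSPContext {
    filter_fn epel_h;
} DSPContext;

/*
 * Pick the best available implementation:
 * C fallback, then plain NEON, then I8MM on top of NEON.
 * Without a plain NEON version, a NEON-only CPU would stay on the
 * C fallback for this function.
 */
static void dsp_init(DSPContext *c, unsigned cpu_flags)
{
    c->epel_h = filter_c;
    if (cpu_flags & CPU_NEON) {
        c->epel_h = filter_neon;
        if (cpu_flags & CPU_I8MM)
            c->epel_h = filter_i8mm;
    }
}
```

In this sketch, a CPU reporting only CPU_NEON (the Apple M1 case above) ends
up with filter_neon, while one reporting CPU_NEON | CPU_I8MM gets
filter_i8mm.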
// Martin

Martin Storsjö (21):
  aarch64: hevc: Reorder a misplaced function init line
  aarch64: hevc: Don't iterate with sp in ff_hevc_put_hevc_qpel_uni_w_hv32/64_8_neon_i8mm
  aarch64: hevc: Merge consecutive stores in put_hevc_\type\()_h16_8_neon
  aarch64: hevc: Specialize put_hevc_\type\()_h*_8_neon for horizontal looping
  aarch64: hevc: Use ld1r instead of ldr+dup in hevc_qpel_uni_w_h
  aarch64: hevc: Implement a neon version of put_hevc_epel_h*_8
  aarch64: hevc: Implement a neon version of hevc_epel_uni_w_h*_8
  aarch64: hevc: Split the epel_*_hv functions into two parts
  aarch64: hevc: Reorder epel_hv functions to prepare for templating
  aarch64: hevc: Produce epel_hv functions for both plain neon and i8mm
  aarch64: hevc: Produce epel_uni_hv functions for both neon and i8mm
  aarch64: hevc: Produce epel_uni_w_hv functions for both neon and i8mm
  aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm
  aarch64: hevc: Implement a neon version of hevc_qpel_uni_w_h*_8
  aarch64: hevc: Split the qpel_*_hv functions into two parts
  aarch64: hevc: Deduplicate the hevc_put_hevc_qpel_uni_w_hv*_8_end_neon functions
  aarch64: hevc: Reorder qpel_hv functions to prepare for templating
  aarch64: hevc: Produce plain neon versions of qpel_hv
  aarch64: hevc: Produce plain neon versions of qpel_uni_hv
  aarch64: hevc: Produce plain neon versions of qpel_uni_w_hv
  aarch64: hevc: Produce plain neon versions of qpel_bi_hv

 libavcodec/aarch64/hevcdsp_epel_neon.S    | 1529 +++++++++++------
 libavcodec/aarch64/hevcdsp_init_aarch64.c |   96 +-
 libavcodec/aarch64/hevcdsp_qpel_neon.S    | 1804 +++++++++++++--------
 3 files changed, 2291 insertions(+), 1138 deletions(-)