From patchwork Thu Jul 25 15:53:36 2024
X-Patchwork-Submitter: Rémi Denis-Courmont
X-Patchwork-Id: 50732
From: Rémi Denis-Courmont
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 25 Jul 2024 18:53:36 +0300
Message-ID: <20240725155336.37121-1-remi@remlab.net>
X-Mailer: git-send-email 2.45.2
Subject: [FFmpeg-devel] [PATCH] lavc/pixblockdsp: specialise aligned 16-bit get_pixels

The current code assumes that we have unaligned rows,
which hurts on platforms with slower unaligned accesses.

(Also, this leaves the unrolling to the compiler rather than doing it by
hand, which the compiler seems to do in practice.)
---
 libavcodec/pixblockdsp.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/libavcodec/pixblockdsp.c b/libavcodec/pixblockdsp.c
index bbbeca1618..1fff244511 100644
--- a/libavcodec/pixblockdsp.c
+++ b/libavcodec/pixblockdsp.c
@@ -26,6 +26,13 @@
 
 static void get_pixels_16_c(int16_t *restrict block, const uint8_t *pixels,
                             ptrdiff_t stride)
+{
+    for (int i = 0; i < 8; i++)
+        AV_COPY128(block + i * 8, pixels + i * stride);
+}
+
+static void get_pixels_unaligned_16_c(int16_t *restrict block,
+                                      const uint8_t *pixels, ptrdiff_t stride)
 {
     AV_COPY128U(block + 0 * 8, pixels + 0 * stride);
     AV_COPY128U(block + 1 * 8, pixels + 1 * stride);
@@ -90,7 +97,7 @@ av_cold void ff_pixblockdsp_init(PixblockDSPContext *c, AVCodecContext *avctx)
     case 10:
     case 12:
     case 14:
-        c->get_pixels_unaligned =
+        c->get_pixels_unaligned = get_pixels_unaligned_16_c;
         c->get_pixels           = get_pixels_16_c;
         break;
     default:
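
Illustration only, not part of the patch: a standalone sketch of the aligned versus unaligned distinction the commit message relies on. The helpers copy128_aligned() and copy128_unaligned() and the two *_sketch functions are hypothetical stand-ins for AV_COPY128, AV_COPY128U and the functions touched above; the real macros live in libavutil/intreadwrite.h and may be implemented differently per platform. __builtin_assume_aligned is a GCC/Clang extension, so this assumes one of those compilers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for AV_COPY128U: memcpy() makes no assumption
 * about the alignment of either pointer. */
static void copy128_unaligned(void *dst, const void *src)
{
    memcpy(dst, src, 16);
}

/* Hypothetical stand-in for AV_COPY128: promises the compiler that both
 * pointers are 16-byte aligned, so it may emit aligned 128-bit accesses.
 * Calling this with a misaligned pointer is undefined behaviour. */
static void copy128_aligned(void *dst, const void *src)
{
    memcpy(__builtin_assume_aligned(dst, 16),
           __builtin_assume_aligned(src, 16), 16);
}

/* Aligned variant: 8 rows of 8 16-bit pixels (16 bytes per row), written
 * as a loop so that the compiler decides whether to unroll it. */
static void get_pixels_16_sketch(int16_t *restrict block,
                                 const uint8_t *pixels, ptrdiff_t stride)
{
    for (int i = 0; i < 8; i++)
        copy128_aligned(block + i * 8, pixels + i * stride);
}

/* Unaligned variant: same copy, but safe for any row address. */
static void get_pixels_unaligned_16_sketch(int16_t *restrict block,
                                           const uint8_t *pixels,
                                           ptrdiff_t stride)
{
    for (int i = 0; i < 8; i++)
        copy128_unaligned(block + i * 8, pixels + i * stride);
}
```

The aligned variant is only valid when both the source base address and the stride are multiples of 16 bytes; callers that cannot guarantee this would go through the unaligned entry point, which is the split the patch makes between get_pixels and get_pixels_unaligned.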