From patchwork Wed Apr 17 18:01:37 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Ramiro Polla
X-Patchwork-Id: 48110
From: Ramiro Polla
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 17 Apr 2024 20:01:37 +0200
Message-Id: <20240417180138.21864-2-ramiro.polla@gmail.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20240417180138.21864-1-ramiro.polla@gmail.com>
References: <20240417180138.21864-1-ramiro.polla@gmail.com>
Subject: [FFmpeg-devel] [PATCH v3 1/2] checkasm: add test for fdct
X-BeenThere:
ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: 9LuQ7/Tl7hiE Reviewed-by: Martin Storsjö --- tests/checkasm/Makefile | 1 + tests/checkasm/checkasm.c | 3 ++ tests/checkasm/checkasm.h | 1 + tests/checkasm/fdctdsp.c | 68 +++++++++++++++++++++++++++++++++++++++ tests/fate/checkasm.mak | 1 + 5 files changed, 74 insertions(+) create mode 100644 tests/checkasm/fdctdsp.c diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile index 2673e1d098..70a6120c70 100644 --- a/tests/checkasm/Makefile +++ b/tests/checkasm/Makefile @@ -4,6 +4,7 @@ AVCODECOBJS-$(CONFIG_AC3DSP) += ac3dsp.o AVCODECOBJS-$(CONFIG_AUDIODSP) += audiodsp.o AVCODECOBJS-$(CONFIG_BLOCKDSP) += blockdsp.o AVCODECOBJS-$(CONFIG_BSWAPDSP) += bswapdsp.o +AVCODECOBJS-$(CONFIG_FDCTDSP) += fdctdsp.o AVCODECOBJS-$(CONFIG_FMTCONVERT) += fmtconvert.o AVCODECOBJS-$(CONFIG_G722DSP) += g722dsp.o AVCODECOBJS-$(CONFIG_H264CHROMA) += h264chroma.o diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 8be6cb0f55..92c3a30ad3 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -106,6 +106,9 @@ static const struct { #if CONFIG_EXR_DECODER { "exrdsp", checkasm_check_exrdsp }, #endif + #if CONFIG_FDCTDSP + { "fdctdsp", checkasm_check_fdctdsp }, + #endif #if CONFIG_FLAC_DECODER { "flacdsp", checkasm_check_flacdsp }, #endif diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h index f90920dee7..d3e8f9a37a 100644 --- a/tests/checkasm/checkasm.h +++ b/tests/checkasm/checkasm.h @@ -85,6 +85,7 @@ void checkasm_check_blockdsp(void); void checkasm_check_bswapdsp(void); void checkasm_check_colorspace(void); void checkasm_check_exrdsp(void); +void checkasm_check_fdctdsp(void); void checkasm_check_fixed_dsp(void); void 
checkasm_check_flacdsp(void); void checkasm_check_float_dsp(void); diff --git a/tests/checkasm/fdctdsp.c b/tests/checkasm/fdctdsp.c new file mode 100644 index 0000000000..68a9b5e435 --- /dev/null +++ b/tests/checkasm/fdctdsp.c @@ -0,0 +1,68 @@ +/* + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include + +#include "checkasm.h" + +#include "libavcodec/avcodec.h" +#include "libavcodec/fdctdsp.h" + +#include "libavutil/common.h" +#include "libavutil/internal.h" +#include "libavutil/mem_internal.h" + +static int int16_cmp_off_by_n(const int16_t *ref, const int16_t *test, size_t n, int accuracy) +{ + for (size_t i = 0; i < n; i++) { + if (abs(ref[i] - test[i]) > accuracy) + return 1; + } + return 0; +} + +static void check_fdct(void) +{ + LOCAL_ALIGNED_16(int16_t, block0, [64]); + LOCAL_ALIGNED_16(int16_t, block1, [64]); + + AVCodecContext avctx = { 0 }; + FDCTDSPContext h; + + ff_fdctdsp_init(&h, &avctx); + + if (check_func(h.fdct, "fdct")) { + declare_func(void, int16_t *); + for (int i = 0; i < 64; i++) { + uint8_t r = rnd(); + block0[i] = r; + block1[i] = r; + } + call_ref(block0); + call_new(block1); + if (int16_cmp_off_by_n(block0, block1, 64, 2)) + fail(); + bench_new(block1); + } +} + +void checkasm_check_fdctdsp(void) +{ + check_fdct(); + report("fdctdsp"); +} diff 
--git a/tests/fate/checkasm.mak b/tests/fate/checkasm.mak index 3b5b867a97..10a42f2f9d 100644 --- a/tests/fate/checkasm.mak +++ b/tests/fate/checkasm.mak @@ -8,6 +8,7 @@ FATE_CHECKASM = fate-checkasm-aacencdsp \ fate-checkasm-blockdsp \ fate-checkasm-bswapdsp \ fate-checkasm-exrdsp \ + fate-checkasm-fdctdsp \ fate-checkasm-fixed_dsp \ fate-checkasm-flacdsp \ fate-checkasm-float_dsp \

From patchwork Wed Apr 17 18:01:38 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Ramiro Polla
X-Patchwork-Id: 48111
From: Ramiro Polla
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 17
Apr 2024 20:01:38 +0200 Message-Id: <20240417180138.21864-3-ramiro.polla@gmail.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20240417180138.21864-1-ramiro.polla@gmail.com> References: <20240417180138.21864-1-ramiro.polla@gmail.com> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH v3 2/2] lavc/aarch64/fdct: add neon-optimized fdct for aarch64 X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: zDJmsWAecKg8 The code is imported from libjpeg-turbo-3.0.1. The neon registers used have been changed to avoid modifying v8-v15. Reviewed-by: Martin Storsjö --- libavcodec/aarch64/Makefile | 2 + libavcodec/aarch64/fdct.h | 26 ++ libavcodec/aarch64/fdctdsp_init_aarch64.c | 39 +++ libavcodec/aarch64/fdctdsp_neon.S | 368 ++++++++++++++++++++++ libavcodec/avcodec.h | 1 + libavcodec/fdctdsp.c | 4 +- libavcodec/fdctdsp.h | 2 + libavcodec/options_table.h | 1 + libavcodec/tests/aarch64/dct.c | 2 + 9 files changed, 444 insertions(+), 1 deletion(-) create mode 100644 libavcodec/aarch64/fdct.h create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c create mode 100644 libavcodec/aarch64/fdctdsp_neon.S diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile index 95ad4dd202..a3256bb1cc 100644 --- a/libavcodec/aarch64/Makefile +++ b/libavcodec/aarch64/Makefile @@ -1,5 +1,6 @@ # subsystems OBJS-$(CONFIG_AC3DSP) += aarch64/ac3dsp_init_aarch64.o +OBJS-$(CONFIG_FDCTDSP) += aarch64/fdctdsp_init_aarch64.o OBJS-$(CONFIG_FMTCONVERT) += aarch64/fmtconvert_init.o OBJS-$(CONFIG_H264CHROMA) += aarch64/h264chroma_init_aarch64.o OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_init_aarch64.o @@ -37,6 +38,7 @@ ARMV8-OBJS-$(CONFIG_VIDEODSP) += aarch64/videodsp.o # subsystems NEON-OBJS-$(CONFIG_AAC_DECODER) += 
aarch64/sbrdsp_neon.o NEON-OBJS-$(CONFIG_AC3DSP) += aarch64/ac3dsp_neon.o +NEON-OBJS-$(CONFIG_FDCTDSP) += aarch64/fdctdsp_neon.o NEON-OBJS-$(CONFIG_FMTCONVERT) += aarch64/fmtconvert_neon.o NEON-OBJS-$(CONFIG_H264CHROMA) += aarch64/h264cmc_neon.o NEON-OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_neon.o \ diff --git a/libavcodec/aarch64/fdct.h b/libavcodec/aarch64/fdct.h new file mode 100644 index 0000000000..0901b53a83 --- /dev/null +++ b/libavcodec/aarch64/fdct.h @@ -0,0 +1,26 @@ +/* + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef AVCODEC_AARCH64_FDCT_H +#define AVCODEC_AARCH64_FDCT_H + +#include <stdint.h> + +void ff_fdct_neon(int16_t *block); + +#endif /* AVCODEC_AARCH64_FDCT_H */ diff --git a/libavcodec/aarch64/fdctdsp_init_aarch64.c b/libavcodec/aarch64/fdctdsp_init_aarch64.c new file mode 100644 index 0000000000..59d91bc8fc --- /dev/null +++ b/libavcodec/aarch64/fdctdsp_init_aarch64.c @@ -0,0 +1,39 @@ +/* + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. 
+ * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/attributes.h" +#include "libavutil/cpu.h" +#include "libavutil/aarch64/cpu.h" +#include "libavcodec/avcodec.h" +#include "libavcodec/fdctdsp.h" +#include "fdct.h" + +av_cold void ff_fdctdsp_init_aarch64(FDCTDSPContext *c, AVCodecContext *avctx, + unsigned high_bit_depth) +{ + int cpu_flags = av_get_cpu_flags(); + + if (have_neon(cpu_flags)) { + if (!high_bit_depth) { + if (avctx->dct_algo == FF_DCT_AUTO || + avctx->dct_algo == FF_DCT_NEON) { + c->fdct = ff_fdct_neon; + } + } + } +} diff --git a/libavcodec/aarch64/fdctdsp_neon.S b/libavcodec/aarch64/fdctdsp_neon.S new file mode 100644 index 0000000000..53fa4debe5 --- /dev/null +++ b/libavcodec/aarch64/fdctdsp_neon.S @@ -0,0 +1,368 @@ +/* + * Armv8 Neon optimizations for libjpeg-turbo + * + * Copyright (C) 2009-2011, Nokia Corporation and/or its subsidiary(-ies). + * All Rights Reserved. + * Author: Siarhei Siamashka + * Copyright (C) 2013-2014, Linaro Limited. All Rights Reserved. + * Author: Ragesh Radhakrishnan + * Copyright (C) 2014-2016, 2020, D. R. Commander. All Rights Reserved. + * Copyright (C) 2015-2016, 2018, Matthieu Darbois. All Rights Reserved. + * Copyright (C) 2016, Siarhei Siamashka. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. 
+ * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. + */ + +#include "libavutil/aarch64/asm.S" +#include "neon.S" + +// #define EIGHT_BIT_SAMPLES + +/* Constants for jsimd_fdct_islow_neon() */ + +#define F_0_298 2446 /* FIX(0.298631336) */ +#define F_0_390 3196 /* FIX(0.390180644) */ +#define F_0_541 4433 /* FIX(0.541196100) */ +#define F_0_765 6270 /* FIX(0.765366865) */ +#define F_0_899 7373 /* FIX(0.899976223) */ +#define F_1_175 9633 /* FIX(1.175875602) */ +#define F_1_501 12299 /* FIX(1.501321110) */ +#define F_1_847 15137 /* FIX(1.847759065) */ +#define F_1_961 16069 /* FIX(1.961570560) */ +#define F_2_053 16819 /* FIX(2.053119869) */ +#define F_2_562 20995 /* FIX(2.562915447) */ +#define F_3_072 25172 /* FIX(3.072711026) */ + +const jsimd_fdct_islow_neon_consts, align=4 + .short F_0_298 + .short -F_0_390 + .short F_0_541 + .short F_0_765 + .short - F_0_899 + .short F_1_175 + .short F_1_501 + .short - F_1_847 + .short - F_1_961 + .short F_2_053 + .short - F_2_562 + .short F_3_072 + .short 0 /* padding */ + .short 0 + .short 0 + .short 0 +endconst + +#undef F_0_298 +#undef F_0_390 +#undef F_0_541 +#undef F_0_765 +#undef F_0_899 +#undef F_1_175 +#undef F_1_501 +#undef F_1_847 +#undef F_1_961 +#undef F_2_053 +#undef F_2_562 +#undef F_3_072 + +/*****************************************************************************/ + +/* + * 
jsimd_fdct_islow_neon + * + * This file contains a slower but more accurate integer implementation of the + * forward DCT (Discrete Cosine Transform). The following code is based + * directly on the IJG''s original jfdctint.c; see the jfdctint.c for + * more details. + */ + +#define CONST_BITS 13 +#ifdef EIGHT_BIT_SAMPLES +#define PASS1_BITS 2 +#else +#define PASS1_BITS 1 /* lose a little precision to avoid overflow */ +#endif + +#define DESCALE_P1 (CONST_BITS - PASS1_BITS) +#define DESCALE_P2 (CONST_BITS + PASS1_BITS) + +#define XFIX_P_0_298 v0.h[0] +#define XFIX_N_0_390 v0.h[1] +#define XFIX_P_0_541 v0.h[2] +#define XFIX_P_0_765 v0.h[3] +#define XFIX_N_0_899 v0.h[4] +#define XFIX_P_1_175 v0.h[5] +#define XFIX_P_1_501 v0.h[6] +#define XFIX_N_1_847 v0.h[7] +#define XFIX_N_1_961 v1.h[0] +#define XFIX_P_2_053 v1.h[1] +#define XFIX_N_2_562 v1.h[2] +#define XFIX_P_3_072 v1.h[3] + +function ff_fdct_neon, export=1 + + DATA .req x0 + TMP .req x9 + + /* Load constants */ + movrel TMP, jsimd_fdct_islow_neon_consts + ld1 {v0.8h, v1.8h}, [TMP] + + /* Load all DATA into Neon registers with the following allocation: + * 0 1 2 3 | 4 5 6 7 + * ---------+-------- + * 0 | d16 | d17 | v16.8h + * 1 | d18 | d19 | v17.8h + * 2 | d20 | d21 | v18.8h + * 3 | d22 | d23 | v19.8h + * 4 | d24 | d25 | v20.8h + * 5 | d26 | d27 | v21.8h + * 6 | d28 | d29 | v22.8h + * 7 | d30 | d31 | v23.8h + */ + + ld1 {v16.8h, v17.8h, v18.8h, v19.8h}, [DATA], 64 + ld1 {v20.8h, v21.8h, v22.8h, v23.8h}, [DATA] + sub DATA, DATA, #64 + + /* Transpose */ + transpose_8x8H v16, v17, v18, v19, v20, v21, v22, v23, v31, v2 + + /* 1-D FDCT */ + add v24.8h, v16.8h, v23.8h /* tmp0 = dataptr[0] + dataptr[7]; */ + sub v31.8h, v16.8h, v23.8h /* tmp7 = dataptr[0] - dataptr[7]; */ + add v25.8h, v17.8h, v22.8h /* tmp1 = dataptr[1] + dataptr[6]; */ + sub v30.8h, v17.8h, v22.8h /* tmp6 = dataptr[1] - dataptr[6]; */ + add v26.8h, v18.8h, v21.8h /* tmp2 = dataptr[2] + dataptr[5]; */ + sub v29.8h, v18.8h, v21.8h /* tmp5 = dataptr[2] - 
dataptr[5]; */ + add v27.8h, v19.8h, v20.8h /* tmp3 = dataptr[3] + dataptr[4]; */ + sub v28.8h, v19.8h, v20.8h /* tmp4 = dataptr[3] - dataptr[4]; */ + + /* Even part */ + add v4.8h, v24.8h, v27.8h /* tmp10 = tmp0 + tmp3; */ + sub v5.8h, v24.8h, v27.8h /* tmp13 = tmp0 - tmp3; */ + add v6.8h, v25.8h, v26.8h /* tmp11 = tmp1 + tmp2; */ + sub v7.8h, v25.8h, v26.8h /* tmp12 = tmp1 - tmp2; */ + + add v16.8h, v4.8h, v6.8h /* tmp10 + tmp11 */ + sub v20.8h, v4.8h, v6.8h /* tmp10 - tmp11 */ + + add v18.8h, v7.8h, v5.8h /* tmp12 + tmp13 */ + + shl v16.8h, v16.8h, #PASS1_BITS /* dataptr[0] = (DCTELEM)LEFT_SHIFT(tmp10 + tmp11, PASS1_BITS); */ + shl v20.8h, v20.8h, #PASS1_BITS /* dataptr[4] = (DCTELEM)LEFT_SHIFT(tmp10 - tmp11, PASS1_BITS); */ + + smull2 v24.4s, v18.8h, XFIX_P_0_541 /* z1 hi = MULTIPLY(tmp12 + tmp13, XFIX_P_0_541); */ + smull v18.4s, v18.4h, XFIX_P_0_541 /* z1 lo = MULTIPLY(tmp12 + tmp13, XFIX_P_0_541); */ + mov v22.16b, v18.16b + mov v25.16b, v24.16b + + smlal v18.4s, v5.4h, XFIX_P_0_765 /* lo z1 + MULTIPLY(tmp13, XFIX_P_0_765) */ + smlal2 v24.4s, v5.8h, XFIX_P_0_765 /* hi z1 + MULTIPLY(tmp13, XFIX_P_0_765) */ + smlal v22.4s, v7.4h, XFIX_N_1_847 /* lo z1 + MULTIPLY(tmp12, XFIX_N_1_847) */ + smlal2 v25.4s, v7.8h, XFIX_N_1_847 /* hi z1 + MULTIPLY(tmp12, XFIX_N_1_847) */ + + rshrn v18.4h, v18.4s, #DESCALE_P1 + rshrn v22.4h, v22.4s, #DESCALE_P1 + rshrn2 v18.8h, v24.4s, #DESCALE_P1 /* dataptr[2] = (DCTELEM)DESCALE(z1 + MULTIPLY(tmp13, XFIX_P_0_765), CONST_BITS-PASS1_BITS); */ + rshrn2 v22.8h, v25.4s, #DESCALE_P1 /* dataptr[6] = (DCTELEM)DESCALE(z1 + MULTIPLY(tmp12, XFIX_N_1_847), CONST_BITS-PASS1_BITS); */ + + /* Odd part */ + add v2.8h, v28.8h, v31.8h /* z1 = tmp4 + tmp7; */ + add v3.8h, v29.8h, v30.8h /* z2 = tmp5 + tmp6; */ + add v6.8h, v28.8h, v30.8h /* z3 = tmp4 + tmp6; */ + add v7.8h, v29.8h, v31.8h /* z4 = tmp5 + tmp7; */ + smull v4.4s, v6.4h, XFIX_P_1_175 /* z5 lo = z3 lo * XFIX_P_1_175 */ + smull2 v5.4s, v6.8h, XFIX_P_1_175 + smlal v4.4s, v7.4h, XFIX_P_1_175 
/* z5 = MULTIPLY(z3 + z4, FIX_1_175875602); */ + smlal2 v5.4s, v7.8h, XFIX_P_1_175 + + smull2 v24.4s, v28.8h, XFIX_P_0_298 + smull2 v25.4s, v29.8h, XFIX_P_2_053 + smull2 v26.4s, v30.8h, XFIX_P_3_072 + smull2 v27.4s, v31.8h, XFIX_P_1_501 + smull v23.4s, v28.4h, XFIX_P_0_298 /* tmp4 = MULTIPLY(tmp4, FIX_0_298631336); */ + smull v21.4s, v29.4h, XFIX_P_2_053 /* tmp5 = MULTIPLY(tmp5, FIX_2_053119869); */ + smull v19.4s, v30.4h, XFIX_P_3_072 /* tmp6 = MULTIPLY(tmp6, FIX_3_072711026); */ + smull v17.4s, v31.4h, XFIX_P_1_501 /* tmp7 = MULTIPLY(tmp7, FIX_1_501321110); */ + + smull2 v28.4s, v2.8h, XFIX_N_0_899 + smull2 v29.4s, v3.8h, XFIX_N_2_562 + smull2 v30.4s, v6.8h, XFIX_N_1_961 + smull2 v31.4s, v7.8h, XFIX_N_0_390 + smull v2.4s, v2.4h, XFIX_N_0_899 /* z1 = MULTIPLY(z1, -FIX_0_899976223); */ + smull v3.4s, v3.4h, XFIX_N_2_562 /* z2 = MULTIPLY(z2, -FIX_2_562915447); */ + smull v6.4s, v6.4h, XFIX_N_1_961 /* z3 = MULTIPLY(z3, -FIX_1_961570560); */ + smull v7.4s, v7.4h, XFIX_N_0_390 /* z4 = MULTIPLY(z4, -FIX_0_390180644); */ + + add v6.4s, v6.4s, v4.4s /* z3 += z5 */ + add v30.4s, v30.4s, v5.4s + add v7.4s, v7.4s, v4.4s /* z4 += z5 */ + add v31.4s, v31.4s, v5.4s + + add v23.4s, v23.4s, v2.4s /* tmp4 += z1 */ + add v24.4s, v24.4s, v28.4s + add v21.4s, v21.4s, v3.4s /* tmp5 += z2 */ + add v25.4s, v25.4s, v29.4s + add v19.4s, v19.4s, v6.4s /* tmp6 += z3 */ + add v26.4s, v26.4s, v30.4s + add v17.4s, v17.4s, v7.4s /* tmp7 += z4 */ + add v27.4s, v27.4s, v31.4s + + add v23.4s, v23.4s, v6.4s /* tmp4 += z3 */ + add v24.4s, v24.4s, v30.4s + add v21.4s, v21.4s, v7.4s /* tmp5 += z4 */ + add v25.4s, v25.4s, v31.4s + add v19.4s, v19.4s, v3.4s /* tmp6 += z2 */ + add v26.4s, v26.4s, v29.4s + add v17.4s, v17.4s, v2.4s /* tmp7 += z1 */ + add v27.4s, v27.4s, v28.4s + + rshrn v23.4h, v23.4s, #DESCALE_P1 + rshrn v21.4h, v21.4s, #DESCALE_P1 + rshrn v19.4h, v19.4s, #DESCALE_P1 + rshrn v17.4h, v17.4s, #DESCALE_P1 + rshrn2 v23.8h, v24.4s, #DESCALE_P1 /* dataptr[7] = (DCTELEM)DESCALE(tmp4 + z1 + z3, 
CONST_BITS-PASS1_BITS); */ + rshrn2 v21.8h, v25.4s, #DESCALE_P1 /* dataptr[5] = (DCTELEM)DESCALE(tmp5 + z2 + z4, CONST_BITS-PASS1_BITS); */ + rshrn2 v19.8h, v26.4s, #DESCALE_P1 /* dataptr[3] = (DCTELEM)DESCALE(tmp6 + z2 + z3, CONST_BITS-PASS1_BITS); */ + rshrn2 v17.8h, v27.4s, #DESCALE_P1 /* dataptr[1] = (DCTELEM)DESCALE(tmp7 + z1 + z4, CONST_BITS-PASS1_BITS); */ + + /* Transpose */ + transpose_8x8H v16, v17, v18, v19, v20, v21, v22, v23, v31, v2 + + /* 1-D FDCT */ + add v24.8h, v16.8h, v23.8h /* tmp0 = dataptr[0] + dataptr[7]; */ + sub v31.8h, v16.8h, v23.8h /* tmp7 = dataptr[0] - dataptr[7]; */ + add v25.8h, v17.8h, v22.8h /* tmp1 = dataptr[1] + dataptr[6]; */ + sub v30.8h, v17.8h, v22.8h /* tmp6 = dataptr[1] - dataptr[6]; */ + add v26.8h, v18.8h, v21.8h /* tmp2 = dataptr[2] + dataptr[5]; */ + sub v29.8h, v18.8h, v21.8h /* tmp5 = dataptr[2] - dataptr[5]; */ + add v27.8h, v19.8h, v20.8h /* tmp3 = dataptr[3] + dataptr[4]; */ + sub v28.8h, v19.8h, v20.8h /* tmp4 = dataptr[3] - dataptr[4]; */ + + /* Even part */ + add v4.8h, v24.8h, v27.8h /* tmp10 = tmp0 + tmp3; */ + sub v5.8h, v24.8h, v27.8h /* tmp13 = tmp0 - tmp3; */ + add v6.8h, v25.8h, v26.8h /* tmp11 = tmp1 + tmp2; */ + sub v7.8h, v25.8h, v26.8h /* tmp12 = tmp1 - tmp2; */ + + add v16.8h, v4.8h, v6.8h /* tmp10 + tmp11 */ + sub v20.8h, v4.8h, v6.8h /* tmp10 - tmp11 */ + + add v18.8h, v7.8h, v5.8h /* tmp12 + tmp13 */ + + srshr v16.8h, v16.8h, #PASS1_BITS /* dataptr[0] = (DCTELEM)DESCALE(tmp10 + tmp11, PASS1_BITS); */ + srshr v20.8h, v20.8h, #PASS1_BITS /* dataptr[4] = (DCTELEM)DESCALE(tmp10 - tmp11, PASS1_BITS); */ + + smull2 v24.4s, v18.8h, XFIX_P_0_541 /* z1 hi = MULTIPLY(tmp12 + tmp13, XFIX_P_0_541); */ + smull v18.4s, v18.4h, XFIX_P_0_541 /* z1 lo = MULTIPLY(tmp12 + tmp13, XFIX_P_0_541); */ + mov v22.16b, v18.16b + mov v25.16b, v24.16b + + smlal v18.4s, v5.4h, XFIX_P_0_765 /* lo z1 + MULTIPLY(tmp13, XFIX_P_0_765) */ + smlal2 v24.4s, v5.8h, XFIX_P_0_765 /* hi z1 + MULTIPLY(tmp13, XFIX_P_0_765) */ + smlal 
v22.4s, v7.4h, XFIX_N_1_847  /* lo z1 + MULTIPLY(tmp12, XFIX_N_1_847) */
+        smlal2          v25.4s, v7.8h, XFIX_N_1_847  /* hi z1 + MULTIPLY(tmp12, XFIX_N_1_847) */
+
+        rshrn           v18.4h, v18.4s, #DESCALE_P2
+        rshrn           v22.4h, v22.4s, #DESCALE_P2
+        rshrn2          v18.8h, v24.4s, #DESCALE_P2  /* dataptr[2] = (DCTELEM)DESCALE(z1 + MULTIPLY(tmp13, XFIX_P_0_765), CONST_BITS+PASS1_BITS); */
+        rshrn2          v22.8h, v25.4s, #DESCALE_P2  /* dataptr[6] = (DCTELEM)DESCALE(z1 + MULTIPLY(tmp12, XFIX_N_1_847), CONST_BITS+PASS1_BITS); */
+
+        /* Odd part */
+        add             v2.8h, v28.8h, v31.8h  /* z1 = tmp4 + tmp7; */
+        add             v3.8h, v29.8h, v30.8h  /* z2 = tmp5 + tmp6; */
+        add             v6.8h, v28.8h, v30.8h  /* z3 = tmp4 + tmp6; */
+        add             v7.8h, v29.8h, v31.8h  /* z4 = tmp5 + tmp7; */
+
+        smull           v4.4s, v6.4h, XFIX_P_1_175  /* z5 lo = z3 lo * XFIX_P_1_175 */
+        smull2          v5.4s, v6.8h, XFIX_P_1_175
+        smlal           v4.4s, v7.4h, XFIX_P_1_175  /* z5 = MULTIPLY(z3 + z4, FIX_1_175875602); */
+        smlal2          v5.4s, v7.8h, XFIX_P_1_175
+
+        smull2          v24.4s, v28.8h, XFIX_P_0_298
+        smull2          v25.4s, v29.8h, XFIX_P_2_053
+        smull2          v26.4s, v30.8h, XFIX_P_3_072
+        smull2          v27.4s, v31.8h, XFIX_P_1_501
+        smull           v23.4s, v28.4h, XFIX_P_0_298  /* tmp4 = MULTIPLY(tmp4, FIX_0_298631336); */
+        smull           v21.4s, v29.4h, XFIX_P_2_053  /* tmp5 = MULTIPLY(tmp5, FIX_2_053119869); */
+        smull           v19.4s, v30.4h, XFIX_P_3_072  /* tmp6 = MULTIPLY(tmp6, FIX_3_072711026); */
+        smull           v17.4s, v31.4h, XFIX_P_1_501  /* tmp7 = MULTIPLY(tmp7, FIX_1_501321110); */
+
+        smull2          v28.4s, v2.8h, XFIX_N_0_899
+        smull2          v29.4s, v3.8h, XFIX_N_2_562
+        smull2          v30.4s, v6.8h, XFIX_N_1_961
+        smull2          v31.4s, v7.8h, XFIX_N_0_390
+        smull           v2.4s, v2.4h, XFIX_N_0_899  /* z1 = MULTIPLY(z1, -FIX_0_899976223); */
+        smull           v3.4s, v3.4h, XFIX_N_2_562  /* z2 = MULTIPLY(z2, -FIX_2_562915447); */
+        smull           v6.4s, v6.4h, XFIX_N_1_961  /* z3 = MULTIPLY(z3, -FIX_1_961570560); */
+        smull           v7.4s, v7.4h, XFIX_N_0_390  /* z4 = MULTIPLY(z4, -FIX_0_390180644); */
+
+        add             v6.4s, v6.4s, v4.4s  /* z3 += z5 */
+        add             v30.4s, v30.4s, v5.4s
+        add             v7.4s, v7.4s, v4.4s  /* z4 += z5 */
+        add             v31.4s, v31.4s, v5.4s
+
+        add             v23.4s, v23.4s, v2.4s  /* tmp4 += z1 */
+        add             v24.4s, v24.4s, v28.4s
+        add             v21.4s, v21.4s, v3.4s  /* tmp5 += z2 */
+        add             v25.4s, v25.4s, v29.4s
+        add             v19.4s, v19.4s, v6.4s  /* tmp6 += z3 */
+        add             v26.4s, v26.4s, v30.4s
+        add             v17.4s, v17.4s, v7.4s  /* tmp7 += z4 */
+        add             v27.4s, v27.4s, v31.4s
+
+        add             v23.4s, v23.4s, v6.4s  /* tmp4 += z3 */
+        add             v24.4s, v24.4s, v30.4s
+        add             v21.4s, v21.4s, v7.4s  /* tmp5 += z4 */
+        add             v25.4s, v25.4s, v31.4s
+        add             v19.4s, v19.4s, v3.4s  /* tmp6 += z2 */
+        add             v26.4s, v26.4s, v29.4s
+        add             v17.4s, v17.4s, v2.4s  /* tmp7 += z1 */
+        add             v27.4s, v27.4s, v28.4s
+
+        rshrn           v23.4h, v23.4s, #DESCALE_P2
+        rshrn           v21.4h, v21.4s, #DESCALE_P2
+        rshrn           v19.4h, v19.4s, #DESCALE_P2
+        rshrn           v17.4h, v17.4s, #DESCALE_P2
+        rshrn2          v23.8h, v24.4s, #DESCALE_P2  /* dataptr[7] = (DCTELEM)DESCALE(tmp4 + z1 + z3, CONST_BITS+PASS1_BITS); */
+        rshrn2          v21.8h, v25.4s, #DESCALE_P2  /* dataptr[5] = (DCTELEM)DESCALE(tmp5 + z2 + z4, CONST_BITS+PASS1_BITS); */
+        rshrn2          v19.8h, v26.4s, #DESCALE_P2  /* dataptr[3] = (DCTELEM)DESCALE(tmp6 + z2 + z3, CONST_BITS+PASS1_BITS); */
+        rshrn2          v17.8h, v27.4s, #DESCALE_P2  /* dataptr[1] = (DCTELEM)DESCALE(tmp7 + z1 + z4, CONST_BITS+PASS1_BITS); */
+
+        /* Store results */
+        st1             {v16.8h, v17.8h, v18.8h, v19.8h}, [DATA], 64
+        st1             {v20.8h, v21.8h, v22.8h, v23.8h}, [DATA]
+
+        ret
+
+        .unreq          DATA
+        .unreq          TMP
+endfunc
+
+#undef XFIX_P_0_298
+#undef XFIX_N_0_390
+#undef XFIX_P_0_541
+#undef XFIX_P_0_765
+#undef XFIX_N_0_899
+#undef XFIX_P_1_175
+#undef XFIX_P_1_501
+#undef XFIX_N_1_847
+#undef XFIX_N_1_961
+#undef XFIX_P_2_053
+#undef XFIX_N_2_562
+#undef XFIX_P_3_072
diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index 968009a192..2da63c87ea 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -1538,6 +1538,7 @@ typedef struct AVCodecContext {
 #define FF_DCT_MMX 3
 #define FF_DCT_ALTIVEC 5
 #define FF_DCT_FAAN 6
+#define FF_DCT_NEON 7
 
     /**
      * IDCT algorithm, see FF_IDCT_* below.
diff --git a/libavcodec/fdctdsp.c b/libavcodec/fdctdsp.c
index f8ba17426c..d20558ce88 100644
--- a/libavcodec/fdctdsp.c
+++ b/libavcodec/fdctdsp.c
@@ -42,7 +42,9 @@ av_cold void ff_fdctdsp_init(FDCTDSPContext *c, AVCodecContext *avctx)
         c->fdct248 = ff_fdct248_islow_8;
     }
 
-#if ARCH_PPC
+#if ARCH_AARCH64
+    ff_fdctdsp_init_aarch64(c, avctx, high_bit_depth);
+#elif ARCH_PPC
     ff_fdctdsp_init_ppc(c, avctx, high_bit_depth);
 #elif ARCH_X86
     ff_fdctdsp_init_x86(c, avctx, high_bit_depth);
diff --git a/libavcodec/fdctdsp.h b/libavcodec/fdctdsp.h
index 7378eab870..cad99ed7ca 100644
--- a/libavcodec/fdctdsp.h
+++ b/libavcodec/fdctdsp.h
@@ -32,6 +32,8 @@ typedef struct FDCTDSPContext {
 
 FF_VISIBILITY_PUSH_HIDDEN
 void ff_fdctdsp_init(FDCTDSPContext *c, struct AVCodecContext *avctx);
+void ff_fdctdsp_init_aarch64(FDCTDSPContext *c, struct AVCodecContext *avctx,
+                             unsigned high_bit_depth);
 void ff_fdctdsp_init_ppc(FDCTDSPContext *c, struct AVCodecContext *avctx,
                          unsigned high_bit_depth);
 void ff_fdctdsp_init_x86(FDCTDSPContext *c, struct AVCodecContext *avctx,
diff --git a/libavcodec/options_table.h b/libavcodec/options_table.h
index 7a70fa7b6c..33f1bce887 100644
--- a/libavcodec/options_table.h
+++ b/libavcodec/options_table.h
@@ -158,6 +158,7 @@ static const AVOption avcodec_options[] = {
 {"mmx", NULL, 0, AV_OPT_TYPE_CONST, {.i64 = FF_DCT_MMX }, INT_MIN, INT_MAX, V|E, .unit = "dct"},
 {"altivec", NULL, 0, AV_OPT_TYPE_CONST, {.i64 = FF_DCT_ALTIVEC }, INT_MIN, INT_MAX, V|E, .unit = "dct"},
 {"faan", "floating point AAN DCT", 0, AV_OPT_TYPE_CONST, {.i64 = FF_DCT_FAAN }, INT_MIN, INT_MAX, V|E, .unit = "dct"},
+{"neon", NULL, 0, AV_OPT_TYPE_CONST, {.i64 = FF_DCT_NEON }, INT_MIN, INT_MAX, V|E, .unit = "dct"},
 {"lumi_mask", "compresses bright areas stronger than medium ones", OFFSET(lumi_masking), AV_OPT_TYPE_FLOAT, {.dbl = 0 }, -FLT_MAX, FLT_MAX, V|E},
 {"tcplx_mask", "temporal complexity masking", OFFSET(temporal_cplx_masking), AV_OPT_TYPE_FLOAT, {.dbl = 0 }, -FLT_MAX, FLT_MAX, V|E},
 {"scplx_mask", "spatial complexity masking", OFFSET(spatial_cplx_masking), AV_OPT_TYPE_FLOAT, {.dbl = 0 }, -FLT_MAX, FLT_MAX, V|E},
diff --git a/libavcodec/tests/aarch64/dct.c b/libavcodec/tests/aarch64/dct.c
index 9e477328d5..e98a887cd5 100644
--- a/libavcodec/tests/aarch64/dct.c
+++ b/libavcodec/tests/aarch64/dct.c
@@ -19,9 +19,11 @@
 #include "config.h"
 
 #include "libavutil/cpu.h"
+#include "libavcodec/aarch64/fdct.h"
 #include "libavcodec/aarch64/idct.h"
 
 static const struct algo fdct_tab_arch[] = {
+    { "neon", ff_fdct_neon, FF_IDCT_PERM_NONE, AV_CPU_FLAG_NEON },
     { 0 }
 };