From patchwork Thu Jun 23 18:04:05 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 36405
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 23 Jun 2022 20:04:05 +0200
Message-Id: <20220623180407.21081-1-jdek@itanimul.li>
X-Mailer: git-send-email 2.32.0 (Apple Git-132)
Subject: [FFmpeg-devel] [PATCH 1/3] checkasm/hevc_add_res: add 12bit test

Signed-off-by: J. Dekker
---
 tests/checkasm/hevc_add_res.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/tests/checkasm/hevc_add_res.c b/tests/checkasm/hevc_add_res.c
index 0c896adaca..f17d121939 100644
--- a/tests/checkasm/hevc_add_res.c
+++ b/tests/checkasm/hevc_add_res.c
@@ -36,14 +36,14 @@
         } \
     } while (0)
 
-#define randomize_buffers2(buf, size) \
+#define randomize_buffers2(buf, size, mask) \
     do { \
         int j; \
         for (j = 0; j < size; j++) \
-            AV_WN16A(buf + j * 2, rnd() & 0x3FF); \
+            AV_WN16A(buf + j * 2, rnd() & mask); \
     } while (0)
 
-static void compare_add_res(int size, ptrdiff_t stride, int overflow_test)
+static void compare_add_res(int size, ptrdiff_t stride, int overflow_test, int mask)
 {
     LOCAL_ALIGNED_32(int16_t, res0, [32 * 32]);
     LOCAL_ALIGNED_32(int16_t, res1, [32 * 32]);
@@ -53,7 +53,7 @@ static void compare_add_res(int size, ptrdiff_t stride, int overflow_test)
     declare_func_emms(AV_CPU_FLAG_MMX, void, uint8_t *dst, int16_t *res, ptrdiff_t stride);
 
     randomize_buffers(res0, size);
-    randomize_buffers2(dst0, size);
+    randomize_buffers2(dst0, size, mask);
     if (overflow_test)
         res0[0] = 0x8000;
     memcpy(res1, res0, sizeof(*res0) * size);
@@ -69,6 +69,7 @@ static void compare_add_res(int size, ptrdiff_t stride, int overflow_test)
 static void check_add_res(HEVCDSPContext h, int bit_depth)
 {
     int i;
+    int mask = bit_depth == 8 ? 0xFFFF : bit_depth == 10 ? 0x03FF : 0x07FF;
 
     for (i = 2; i <= 5; i++) {
         int block_size = 1 << i;
@@ -76,9 +77,9 @@ static void check_add_res(HEVCDSPContext h, int bit_depth)
         ptrdiff_t stride = block_size << (bit_depth > 8);
 
         if (check_func(h.add_residual[i - 2], "hevc_add_res_%dx%d_%d", block_size, block_size, bit_depth)) {
-            compare_add_res(size, stride, 0);
+            compare_add_res(size, stride, 0, mask);
             // overflow test for res = -32768
-            compare_add_res(size, stride, 1);
+            compare_add_res(size, stride, 1, mask);
         }
     }
 }
@@ -87,7 +88,7 @@ void checkasm_check_hevc_add_res(void)
 {
     int bit_depth;
 
-    for (bit_depth = 8; bit_depth <= 10; bit_depth++) {
+    for (bit_depth = 8; bit_depth <= 12; bit_depth++) {
         HEVCDSPContext h;
 
         ff_hevc_dsp_init(&h, bit_depth);

From patchwork Thu Jun 23 18:04:06 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 36406
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 23 Jun 2022 20:04:06 +0200
Message-Id: <20220623180407.21081-2-jdek@itanimul.li>
X-Mailer: git-send-email 2.32.0 (Apple Git-132)
In-Reply-To: <20220623180407.21081-1-jdek@itanimul.li>
References: <20220623180407.21081-1-jdek@itanimul.li>
Subject: [FFmpeg-devel] [PATCH 2/3] lavc/aarch64: reformat add_res funcs

Signed-off-by: J. Dekker
---
 libavcodec/aarch64/hevcdsp_idct_neon.S | 216 ++++++++++++-------------
 1 file changed, 108 insertions(+), 108 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 0869431294..484eea8437 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -27,21 +27,21 @@
 #include "libavutil/aarch64/asm.S"
 
 const trans, align=4
-        .short 64, 83, 64, 36
-        .short 89, 75, 50, 18
-        .short 90, 87, 80, 70
-        .short 57, 43, 25, 9
-        .short 90, 90, 88, 85
-        .short 82, 78, 73, 67
-        .short 61, 54, 46, 38
-        .short 31, 22, 13, 4
+        .short 64, 83, 64, 36
+        .short 89, 75, 50, 18
+        .short 90, 87, 80, 70
+        .short 57, 43, 25, 9
+        .short 90, 90, 88, 85
+        .short 82, 78, 73, 67
+        .short 61, 54, 46, 38
+        .short 31, 22, 13, 4
 endconst
 
 .macro clip10 in1, in2, c1, c2
-        smax \in1, \in1, \c1
-        smax \in2, \in2, \c1
-        smin \in1, \in1, \c2
-        smin \in2, \in2, \c2
+        smax \in1, \in1, \c1
+        smax \in2, \in2, \c1
+        smin \in1, \in1, \c2
+        smin \in2, \in2, \c2
 .endm
 
 function ff_hevc_add_residual_4x4_8_neon, export=1
@@ -50,13 +50,13 @@ function ff_hevc_add_residual_4x4_8_neon, export=1
         ld1 {v2.s}[1], [x0], x2
         ld1 {v2.s}[2], [x0], x2
         ld1 {v2.s}[3], [x0], x2
-        sub x0, x0, x2, lsl #2
-        uxtl v6.8h, v2.8b
-        uxtl2 v7.8h, v2.16b
-        sqadd v0.8h, v0.8h, v6.8h
-        sqadd v1.8h, v1.8h, v7.8h
-        sqxtun v0.8b, v0.8h
-        sqxtun2 v0.16b, v1.8h
+        sub x0, x0, x2, lsl #2
+        uxtl v6.8h, v2.8b
+        uxtl2 v7.8h, v2.16b
+        sqadd v0.8h, v0.8h, v6.8h
+        sqadd v1.8h, v1.8h, v7.8h
+        sqxtun v0.8b, v0.8h
+        sqxtun2 v0.16b, v1.8h
         st1 {v0.s}[0], [x0], x2
         st1 {v0.s}[1], [x0], x2
         st1 {v0.s}[2], [x0], x2
@@ -70,63 +70,63 @@ function ff_hevc_add_residual_4x4_10_neon, export=1
         ld1 {v2.d}[0], [x12], x2
         ld1 {v2.d}[1], [x12], x2
         ld1 {v3.d}[0], [x12], x2
-        sqadd v0.8h, v0.8h, v2.8h
+        sqadd v0.8h, v0.8h, v2.8h
         ld1 {v3.d}[1], [x12], x2
-        movi v4.8h, #0
-        sqadd v1.8h, v1.8h, v3.8h
-        mvni v5.8h, #0xFC, lsl #8 // movi #0x3FF
-        clip10 v0.8h, v1.8h, v4.8h, v5.8h
-        st1 {v0.d}[0], [x0], x2
-        st1 {v0.d}[1], [x0], x2
-        st1 {v1.d}[0], [x0], x2
-        st1 {v1.d}[1], [x0], x2
+        movi v4.8h, #0
+        sqadd v1.8h, v1.8h, v3.8h
+        mvni v5.8h, #0xFC, lsl #8 // movi #0x3FF
+        clip10 v0.8h, v1.8h, v4.8h, v5.8h
+        st1 {v0.d}[0], [x0], x2
+        st1 {v0.d}[1], [x0], x2
+        st1 {v1.d}[0], [x0], x2
+        st1 {v1.d}[1], [x0], x2
         ret
 endfunc
 
 function ff_hevc_add_residual_8x8_8_neon, export=1
-        add x12, x0, x2
-        add x2, x2, x2
-        mov x3, #8
-1:      subs x3, x3, #2
-        ld1 {v2.d}[0], [x0]
-        ld1 {v2.d}[1], [x12]
-        uxtl v3.8h, v2.8b
+        add x12, x0, x2
+        add x2, x2, x2
+        mov x3, #8
+1:      subs x3, x3, #2
+        ld1 {v2.d}[0], [x0]
+        ld1 {v2.d}[1], [x12]
+        uxtl v3.8h, v2.8b
         ld1 {v0.8h-v1.8h}, [x1], #32
-        uxtl2 v2.8h, v2.16b
-        sqadd v0.8h, v0.8h, v3.8h
-        sqadd v1.8h, v1.8h, v2.8h
-        sqxtun v0.8b, v0.8h
-        sqxtun2 v0.16b, v1.8h
-        st1 {v0.d}[0], [x0], x2
-        st1 {v0.d}[1], [x12], x2
-        bne 1b
+        uxtl2 v2.8h, v2.16b
+        sqadd v0.8h, v0.8h, v3.8h
+        sqadd v1.8h, v1.8h, v2.8h
+        sqxtun v0.8b, v0.8h
+        sqxtun2 v0.16b, v1.8h
+        st1 {v0.d}[0], [x0], x2
+        st1 {v0.d}[1], [x12], x2
+        bne 1b
         ret
 endfunc
 
 function ff_hevc_add_residual_8x8_10_neon, export=1
-        add x12, x0, x2
-        add x2, x2, x2
-        mov x3, #8
-        movi v4.8h, #0
-        mvni v5.8h, #0xFC, lsl #8 // movi #0x3FF
-1:      subs x3, x3, #2
+        add x12, x0, x2
+        add x2, x2, x2
+        mov x3, #8
+        movi v4.8h, #0
+        mvni v5.8h, #0xFC, lsl #8 // movi #0x3FF
+1:      subs x3, x3, #2
         ld1 {v0.8h-v1.8h}, [x1], #32
-        ld1 {v2.8h}, [x0]
-        sqadd v0.8h, v0.8h, v2.8h
-        ld1 {v3.8h}, [x12]
-        sqadd v1.8h, v1.8h, v3.8h
-        clip10 v0.8h, v1.8h, v4.8h, v5.8h
-        st1 {v0.8h}, [x0], x2
-        st1 {v1.8h}, [x12], x2
-        bne 1b
+        ld1 {v2.8h}, [x0]
+        sqadd v0.8h, v0.8h, v2.8h
+        ld1 {v3.8h}, [x12]
+        sqadd v1.8h, v1.8h, v3.8h
+        clip10 v0.8h, v1.8h, v4.8h, v5.8h
+        st1 {v0.8h}, [x0], x2
+        st1 {v1.8h}, [x12], x2
+        bne 1b
         ret
 endfunc
 
 function ff_hevc_add_residual_16x16_8_neon, export=1
-        mov x3, #16
+        mov x3, #16
         add x12, x0, x2
-        add x2, x2, x2
-1:      subs x3, x3, #2
+        add x2, x2, x2
+1:      subs x3, x3, #2
         ld1 {v16.16b}, [x0]
         ld1 {v0.8h-v3.8h}, [x1], #64
         ld1 {v19.16b}, [x12]
@@ -134,47 +134,47 @@ function ff_hevc_add_residual_16x16_8_neon, export=1
         uxtl2 v18.8h, v16.16b
         uxtl v20.8h, v19.8b
         uxtl2 v21.8h, v19.16b
-        sqadd v0.8h, v0.8h, v17.8h
-        sqadd v1.8h, v1.8h, v18.8h
-        sqadd v2.8h, v2.8h, v20.8h
-        sqadd v3.8h, v3.8h, v21.8h
-        sqxtun v0.8b, v0.8h
+        sqadd v0.8h, v0.8h, v17.8h
+        sqadd v1.8h, v1.8h, v18.8h
+        sqadd v2.8h, v2.8h, v20.8h
+        sqadd v3.8h, v3.8h, v21.8h
+        sqxtun v0.8b, v0.8h
         sqxtun2 v0.16b, v1.8h
-        sqxtun v1.8b, v2.8h
+        sqxtun v1.8b, v2.8h
         sqxtun2 v1.16b, v3.8h
         st1 {v0.16b}, [x0], x2
         st1 {v1.16b}, [x12], x2
-        bne 1b
+        bne 1b
         ret
 endfunc
 
 function ff_hevc_add_residual_16x16_10_neon, export=1
-        mov x3, #16
+        mov x3, #16
         movi v20.8h, #0
         mvni v21.8h, #0xFC, lsl #8 // movi #0x3FF
         add x12, x0, x2
-        add x2, x2, x2
-1:      subs x3, x3, #2
+        add x2, x2, x2
+1:      subs x3, x3, #2
         ld1 {v16.8h-v17.8h}, [x0]
-        ld1 {v0.8h-v3.8h}, [x1], #64
-        sqadd v0.8h, v0.8h, v16.8h
+        ld1 {v0.8h-v3.8h}, [x1], #64
+        sqadd v0.8h, v0.8h, v16.8h
         ld1 {v18.8h-v19.8h}, [x12]
-        sqadd v1.8h, v1.8h, v17.8h
-        sqadd v2.8h, v2.8h, v18.8h
-        sqadd v3.8h, v3.8h, v19.8h
-        clip10 v0.8h, v1.8h, v20.8h, v21.8h
-        clip10 v2.8h, v3.8h, v20.8h, v21.8h
-        st1 {v0.8h-v1.8h}, [x0], x2
-        st1 {v2.8h-v3.8h}, [x12], x2
-        bne 1b
+        sqadd v1.8h, v1.8h, v17.8h
+        sqadd v2.8h, v2.8h, v18.8h
+        sqadd v3.8h, v3.8h, v19.8h
+        clip10 v0.8h, v1.8h, v20.8h, v21.8h
+        clip10 v2.8h, v3.8h, v20.8h, v21.8h
+        st1 {v0.8h-v1.8h}, [x0], x2
+        st1 {v2.8h-v3.8h}, [x12], x2
+        bne 1b
         ret
 endfunc
 
 function ff_hevc_add_residual_32x32_8_neon, export=1
         add x12, x0, x2
-        add x2, x2, x2
-        mov x3, #32
-1:      subs x3, x3, #2
+        add x2, x2, x2
+        mov x3, #32
+1:      subs x3, x3, #2
         ld1 {v20.16b, v21.16b}, [x0]
         uxtl v16.8h, v20.8b
         uxtl2 v17.8h, v20.16b
@@ -187,43 +187,43 @@ function ff_hevc_add_residual_32x32_8_neon, export=1
         uxtl2 v21.8h, v22.16b
         uxtl v22.8h, v23.8b
         uxtl2 v23.8h, v23.16b
-        sqadd v0.8h, v0.8h, v16.8h
-        sqadd v1.8h, v1.8h, v17.8h
-        sqadd v2.8h, v2.8h, v18.8h
-        sqadd v3.8h, v3.8h, v19.8h
-        sqadd v4.8h, v4.8h, v20.8h
-        sqadd v5.8h, v5.8h, v21.8h
-        sqadd v6.8h, v6.8h, v22.8h
-        sqadd v7.8h, v7.8h, v23.8h
-        sqxtun v0.8b, v0.8h
+        sqadd v0.8h, v0.8h, v16.8h
+        sqadd v1.8h, v1.8h, v17.8h
+        sqadd v2.8h, v2.8h, v18.8h
+        sqadd v3.8h, v3.8h, v19.8h
+        sqadd v4.8h, v4.8h, v20.8h
+        sqadd v5.8h, v5.8h, v21.8h
+        sqadd v6.8h, v6.8h, v22.8h
+        sqadd v7.8h, v7.8h, v23.8h
+        sqxtun v0.8b, v0.8h
         sqxtun2 v0.16b, v1.8h
-        sqxtun v1.8b, v2.8h
+        sqxtun v1.8b, v2.8h
         sqxtun2 v1.16b, v3.8h
-        sqxtun v2.8b, v4.8h
+        sqxtun v2.8b, v4.8h
         sqxtun2 v2.16b, v5.8h
-        st1 {v0.16b, v1.16b}, [x0], x2
-        sqxtun v3.8b, v6.8h
+        st1 {v0.16b, v1.16b}, [x0], x2
+        sqxtun v3.8b, v6.8h
         sqxtun2 v3.16b, v7.8h
         st1 {v2.16b, v3.16b}, [x12], x2
-        bne 1b
+        bne 1b
         ret
 endfunc
 
 function ff_hevc_add_residual_32x32_10_neon, export=1
-        mov x3, #32
+        mov x3, #32
         movi v20.8h, #0
         mvni v21.8h, #0xFC, lsl #8 // movi #0x3FF
-1:      subs x3, x3, #1
-        ld1 {v0.8h-v3.8h}, [x1], #64
+1:      subs x3, x3, #1
+        ld1 {v0.8h -v3.8h}, [x1], #64
         ld1 {v16.8h-v19.8h}, [x0]
-        sqadd v0.8h, v0.8h, v16.8h
-        sqadd v1.8h, v1.8h, v17.8h
-        sqadd v2.8h, v2.8h, v18.8h
-        sqadd v3.8h, v3.8h, v19.8h
-        clip10 v0.8h, v1.8h, v20.8h, v21.8h
-        clip10 v2.8h, v3.8h, v20.8h, v21.8h
-        st1 {v0.8h-v3.8h}, [x0], x2
-        bne 1b
+        sqadd v0.8h, v0.8h, v16.8h
+        sqadd v1.8h, v1.8h, v17.8h
+        sqadd v2.8h, v2.8h, v18.8h
+        sqadd v3.8h, v3.8h, v19.8h
+        clip10 v0.8h, v1.8h, v20.8h, v21.8h
+        clip10 v2.8h, v3.8h, v20.8h, v21.8h
+        st1 {v0.8h-v3.8h}, [x0], x2
+        bne 1b
         ret
 endfunc
 

From patchwork Thu Jun 23 18:04:07 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 36407
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 23 Jun 2022 20:04:07 +0200
Message-Id: <20220623180407.21081-3-jdek@itanimul.li>
X-Mailer: git-send-email 2.32.0 (Apple Git-132)
In-Reply-To: <20220623180407.21081-1-jdek@itanimul.li>
References: <20220623180407.21081-1-jdek@itanimul.li>
Subject: [FFmpeg-devel] [PATCH 3/3] lavc/aarch64: hevc_add_res add 12bit variants

hevc_add_res_4x4_12_c: 46.0
hevc_add_res_4x4_12_neon: 18.7
hevc_add_res_8x8_12_c: 194.7
hevc_add_res_8x8_12_neon: 25.2
hevc_add_res_16x16_12_c: 716.0
hevc_add_res_16x16_12_neon: 69.7
hevc_add_res_32x32_12_c: 3820.7
hevc_add_res_32x32_12_neon: 261.0

Signed-off-by: J. Dekker
---
 libavcodec/aarch64/hevcdsp_idct_neon.S    | 148 ++++++++++++----------
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  34 ++---
 2 files changed, 97 insertions(+), 85 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 484eea8437..413e225218 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -37,11 +37,11 @@ const trans, align=4
         .short 31, 22, 13, 4
 endconst
 
-.macro clip10 in1, in2, c1, c2
-        smax \in1, \in1, \c1
-        smax \in2, \in2, \c1
-        smin \in1, \in1, \c2
-        smin \in2, \in2, \c2
+.macro clip2 in1, in2, min, max
+        smax \in1, \in1, \min
+        smax \in2, \in2, \min
+        smin \in1, \in1, \max
+        smin \in2, \in2, \max
 .endm
 
 function ff_hevc_add_residual_4x4_8_neon, export=1
@@ -64,25 +64,6 @@ function ff_hevc_add_residual_4x4_8_neon, export=1
         ret
 endfunc
 
-function ff_hevc_add_residual_4x4_10_neon, export=1
-        mov x12, x0
-        ld1 {v0.8h-v1.8h}, [x1]
-        ld1 {v2.d}[0], [x12], x2
-        ld1 {v2.d}[1], [x12], x2
-        ld1 {v3.d}[0], [x12], x2
-        sqadd v0.8h, v0.8h, v2.8h
-        ld1 {v3.d}[1], [x12], x2
-        movi v4.8h, #0
-        sqadd v1.8h, v1.8h, v3.8h
-        mvni v5.8h, #0xFC, lsl #8 // movi #0x3FF
-        clip10 v0.8h, v1.8h, v4.8h, v5.8h
-        st1 {v0.d}[0], [x0], x2
-        st1 {v0.d}[1], [x0], x2
-        st1 {v1.d}[0], [x0], x2
-        st1 {v1.d}[1], [x0], x2
-        ret
-endfunc
-
 function ff_hevc_add_residual_8x8_8_neon, export=1
         add x12, x0, x2
         add x2, x2, x2
@@ -103,25 +84,6 @@ function ff_hevc_add_residual_8x8_8_neon, export=1
         ret
 endfunc
 
-function ff_hevc_add_residual_8x8_10_neon, export=1
-        add x12, x0, x2
-        add x2, x2, x2
-        mov x3, #8
-        movi v4.8h, #0
-        mvni v5.8h, #0xFC, lsl #8 // movi #0x3FF
-1:      subs x3, x3, #2
-        ld1 {v0.8h-v1.8h}, [x1], #32
-        ld1 {v2.8h}, [x0]
-        sqadd v0.8h, v0.8h, v2.8h
-        ld1 {v3.8h}, [x12]
-        sqadd v1.8h, v1.8h, v3.8h
-        clip10 v0.8h, v1.8h, v4.8h, v5.8h
-        st1 {v0.8h}, [x0], x2
-        st1 {v1.8h}, [x12], x2
-        bne 1b
-        ret
-endfunc
-
 function ff_hevc_add_residual_16x16_8_neon, export=1
         mov x3, #16
         add x12, x0, x2
@@ -148,28 +110,6 @@ function ff_hevc_add_residual_16x16_8_neon, export=1
         ret
 endfunc
 
-function ff_hevc_add_residual_16x16_10_neon, export=1
-        mov x3, #16
-        movi v20.8h, #0
-        mvni v21.8h, #0xFC, lsl #8 // movi #0x3FF
-        add x12, x0, x2
-        add x2, x2, x2
-1:      subs x3, x3, #2
-        ld1 {v16.8h-v17.8h}, [x0]
-        ld1 {v0.8h-v3.8h}, [x1], #64
-        sqadd v0.8h, v0.8h, v16.8h
-        ld1 {v18.8h-v19.8h}, [x12]
-        sqadd v1.8h, v1.8h, v17.8h
-        sqadd v2.8h, v2.8h, v18.8h
-        sqadd v3.8h, v3.8h, v19.8h
-        clip10 v0.8h, v1.8h, v20.8h, v21.8h
-        clip10 v2.8h, v3.8h, v20.8h, v21.8h
-        st1 {v0.8h-v1.8h}, [x0], x2
-        st1 {v2.8h-v3.8h}, [x12], x2
-        bne 1b
-        ret
-endfunc
-
 function ff_hevc_add_residual_32x32_8_neon, export=1
         add x12, x0, x2
         add x2, x2, x2
@@ -209,10 +149,76 @@ function ff_hevc_add_residual_32x32_8_neon, export=1
         ret
 endfunc
 
-function ff_hevc_add_residual_32x32_10_neon, export=1
+.macro add_res bitdepth
+.if \bitdepth == 10
+.set mask, 0xFC
+.else
+.set mask, 0xF0
+.endif
+function ff_hevc_add_residual_4x4_\bitdepth\()_neon, export=1
+        mov x12, x0
+        ld1 {v0.8h-v1.8h}, [x1]
+        ld1 {v2.d}[0], [x12], x2
+        ld1 {v2.d}[1], [x12], x2
+        ld1 {v3.d}[0], [x12], x2
+        sqadd v0.8h, v0.8h, v2.8h
+        ld1 {v3.d}[1], [x12], x2
+        movi v4.8h, #0
+        sqadd v1.8h, v1.8h, v3.8h
+        mvni v5.8h, mask, lsl #8
+        clip2 v0.8h, v1.8h, v4.8h, v5.8h
+        st1 {v0.d}[0], [x0], x2
+        st1 {v0.d}[1], [x0], x2
+        st1 {v1.d}[0], [x0], x2
+        st1 {v1.d}[1], [x0], x2
+        ret
+endfunc
+
+function ff_hevc_add_residual_8x8_\bitdepth\()_neon, export=1
+        add x12, x0, x2
+        add x2, x2, x2
+        mov x3, #8
+        movi v4.8h, #0
+        mvni v5.8h, mask, lsl #8
+1:      subs x3, x3, #2
+        ld1 {v0.8h-v1.8h}, [x1], #32
+        ld1 {v2.8h}, [x0]
+        sqadd v0.8h, v0.8h, v2.8h
+        ld1 {v3.8h}, [x12]
+        sqadd v1.8h, v1.8h, v3.8h
+        clip2 v0.8h, v1.8h, v4.8h, v5.8h
+        st1 {v0.8h}, [x0], x2
+        st1 {v1.8h}, [x12], x2
+        bne 1b
+        ret
+endfunc
+
+function ff_hevc_add_residual_16x16_\bitdepth\()_neon, export=1
+        mov x3, #16
+        movi v20.8h, #0
+        mvni v21.8h, mask, lsl #8
+        add x12, x0, x2
+        add x2, x2, x2
+1:      subs x3, x3, #2
+        ld1 {v16.8h-v17.8h}, [x0]
+        ld1 {v0.8h-v3.8h}, [x1], #64
+        sqadd v0.8h, v0.8h, v16.8h
+        ld1 {v18.8h-v19.8h}, [x12]
+        sqadd v1.8h, v1.8h, v17.8h
+        sqadd v2.8h, v2.8h, v18.8h
+        sqadd v3.8h, v3.8h, v19.8h
+        clip2 v0.8h, v1.8h, v20.8h, v21.8h
+        clip2 v2.8h, v3.8h, v20.8h, v21.8h
+        st1 {v0.8h-v1.8h}, [x0], x2
+        st1 {v2.8h-v3.8h}, [x12], x2
+        bne 1b
+        ret
+endfunc
+
+function ff_hevc_add_residual_32x32_\bitdepth\()_neon, export=1
         mov x3, #32
         movi v20.8h, #0
-        mvni v21.8h, #0xFC, lsl #8 // movi #0x3FF
+        mvni v21.8h, mask, lsl #8
 1:      subs x3, x3, #1
         ld1 {v0.8h -v3.8h}, [x1], #64
         ld1 {v16.8h-v19.8h}, [x0]
@@ -220,12 +226,16 @@ function ff_hevc_add_residual_32x32_10_neon, export=1
         sqadd v1.8h, v1.8h, v17.8h
         sqadd v2.8h, v2.8h, v18.8h
         sqadd v3.8h, v3.8h, v19.8h
-        clip10 v0.8h, v1.8h, v20.8h, v21.8h
-        clip10 v2.8h, v3.8h, v20.8h, v21.8h
+        clip2 v0.8h, v1.8h, v20.8h, v21.8h
+        clip2 v2.8h, v3.8h, v20.8h, v21.8h
         st1 {v0.8h-v3.8h}, [x0], x2
         bne 1b
         ret
 endfunc
+.endm
+
+add_res 10
+add_res 12
 
 .macro sum_sub out, in, c, op, p
 .ifc \op, +
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 2002530266..f37e47121e 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -25,22 +25,18 @@
 #include "libavutil/aarch64/cpu.h"
 #include "libavcodec/hevcdsp.h"
 
-void ff_hevc_add_residual_4x4_8_neon(uint8_t *_dst, int16_t *coeffs,
-                                     ptrdiff_t stride);
-void ff_hevc_add_residual_4x4_10_neon(uint8_t *_dst, int16_t *coeffs,
-                                      ptrdiff_t stride);
-void ff_hevc_add_residual_8x8_8_neon(uint8_t *_dst, int16_t *coeffs,
-                                     ptrdiff_t stride);
-void ff_hevc_add_residual_8x8_10_neon(uint8_t *_dst, int16_t *coeffs,
-                                      ptrdiff_t stride);
-void ff_hevc_add_residual_16x16_8_neon(uint8_t *_dst, int16_t *coeffs,
-                                       ptrdiff_t stride);
-void ff_hevc_add_residual_16x16_10_neon(uint8_t *_dst, int16_t *coeffs,
-                                        ptrdiff_t stride);
-void ff_hevc_add_residual_32x32_8_neon(uint8_t *_dst, int16_t *coeffs,
-                                       ptrdiff_t stride);
-void ff_hevc_add_residual_32x32_10_neon(uint8_t *_dst, int16_t *coeffs,
-                                        ptrdiff_t stride);
+void ff_hevc_add_residual_4x4_8_neon(uint8_t *_dst, int16_t *coeffs, ptrdiff_t stride);
+void ff_hevc_add_residual_4x4_10_neon(uint8_t *_dst, int16_t *coeffs, ptrdiff_t stride);
+void ff_hevc_add_residual_4x4_12_neon(uint8_t *_dst, int16_t *coeffs, ptrdiff_t stride);
+void ff_hevc_add_residual_8x8_8_neon(uint8_t *_dst, int16_t *coeffs, ptrdiff_t stride);
+void ff_hevc_add_residual_8x8_10_neon(uint8_t *_dst, int16_t *coeffs, ptrdiff_t stride);
+void ff_hevc_add_residual_8x8_12_neon(uint8_t *_dst, int16_t *coeffs, ptrdiff_t stride);
+void ff_hevc_add_residual_16x16_8_neon(uint8_t *_dst, int16_t *coeffs, ptrdiff_t stride);
+void ff_hevc_add_residual_16x16_10_neon(uint8_t *_dst, int16_t *coeffs, ptrdiff_t stride);
+void ff_hevc_add_residual_16x16_12_neon(uint8_t *_dst, int16_t *coeffs, ptrdiff_t stride);
+void ff_hevc_add_residual_32x32_8_neon(uint8_t *_dst, int16_t *coeffs, ptrdiff_t stride);
+void ff_hevc_add_residual_32x32_10_neon(uint8_t *_dst, int16_t *coeffs, ptrdiff_t stride);
+void ff_hevc_add_residual_32x32_12_neon(uint8_t *_dst, int16_t *coeffs, ptrdiff_t stride);
 void ff_hevc_idct_8x8_8_neon(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_8x8_10_neon(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_16x16_8_neon(int16_t *coeffs, int col_limit);
@@ -100,4 +96,10 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->idct_dc[2] = ff_hevc_idct_16x16_dc_10_neon;
         c->idct_dc[3] = ff_hevc_idct_32x32_dc_10_neon;
     }
+    if (bit_depth == 12) {
+        c->add_residual[0] = ff_hevc_add_residual_4x4_12_neon;
+        c->add_residual[1] = ff_hevc_add_residual_8x8_12_neon;
+        c->add_residual[2] = ff_hevc_add_residual_16x16_12_neon;
+        c->add_residual[3] = ff_hevc_add_residual_32x32_12_neon;
+    }
 }