From patchwork Tue Aug 13 14:03:30 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 50996
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 13 Aug 2024 16:03:30 +0200
Message-ID: <20240813140338.143045-1-jdek@itanimul.li>
X-Mailer: git-send-email 2.44.1
Subject: [FFmpeg-devel] [PATCH 1/7] checkasm: add csv/tsv bench output

When collecting performance information from checkasm it is common to
parse the output for use in graphs to compare vs different
architectures.

Signed-off-by: J. Dekker
---
 tests/checkasm/checkasm.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 58597d3888..f82ee0864f 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -362,6 +362,8 @@ static struct {
     const char *cpu_flag_name;
     const char *test_name;
     int verbose;
+    int csv;
+    int tsv;
     volatile sig_atomic_t catch_signals;
 } state;
 
@@ -586,7 +588,14 @@ static void print_benchs(CheckasmFunc *f)
                 CheckasmPerf *p = &v->perf;
                 if (p->iterations) {
                     int decicycles = (10*p->cycles/p->iterations - state.nop_time) / 4;
-                    printf("%s_%s: %d.%d\n", f->name, cpu_suffix(v->cpu), decicycles/10, decicycles%10);
+                    if (state.csv) {
+                        const char sep = state.tsv ? '\t' : ',';
+                        printf("%s%c%s%c%d.%d\n", f->name, sep,
+                               cpu_suffix(v->cpu), sep,
+                               decicycles / 10, decicycles % 10);
+                    } else {
+                        printf("%s_%s: %d.%d\n", f->name, cpu_suffix(v->cpu), decicycles/10, decicycles%10);
+                    }
                 }
             } while ((v = v->next));
         }
@@ -829,7 +838,12 @@ static void bench_uninit(void)
 static int usage(const char *path)
 {
     fprintf(stderr,
-            "Usage: %s [--bench] [--runs=] [--test=] [--verbose] [seed]\n",
+            "Usage: %s [options...] [seed]\n"
+            " --test= Run specific test.\n"
+            " --bench Run benchmark.\n"
+            " --csv, --tsv Output benchmark results in CSV or TSV format.\n"
+            " --runs= Manual number of benchmark iterations to run 2**.\n"
+            " --verbose Increase verbosity.\n",
             path);
     return 1;
 }
@@ -877,6 +891,10 @@ int main(int argc, char *argv[])
                 state.bench_pattern = "";
         } else if (!strncmp(arg, "--test=", 7)) {
             state.test_name = arg + 7;
+        } else if (!strcmp(arg, "--csv")) {
+            state.csv = 1; state.tsv = 0;
+        } else if (!strcmp(arg, "--tsv")) {
+            state.csv = 1; state.tsv = 1;
         } else if (!strcmp(arg, "--verbose") || !strcmp(arg, "-v")) {
             state.verbose = 1;
         } else if (!strncmp(arg, "--runs=", 7)) {

From patchwork Tue Aug 13 14:03:31 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "J.
 Dekker"
X-Patchwork-Id: 50997
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 13 Aug 2024 16:03:31 +0200
Message-ID: <20240813140338.143045-2-jdek@itanimul.li>
X-Mailer: git-send-email 2.44.1
In-Reply-To: <20240813140338.143045-1-jdek@itanimul.li>
References: <20240813140338.143045-1-jdek@itanimul.li>
Subject: [FFmpeg-devel] [PATCH 2/7] checkasm: improve print format

Port dav1d's checkasm output format to FFmpeg's checkasm, includes
relative speedups and aligns results.

Signed-off-by: J. Dekker
---
 tests/checkasm/checkasm.c | 53 +++++++++++++++++++++++++++++++++++----
 1 file changed, 48 insertions(+), 5 deletions(-)

diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index f82ee0864f..0095758268 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -18,6 +18,31 @@
  * You should have received a copy of the GNU General Public License along
  * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
  * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * Copyright © 2018, VideoLAN and dav1d authors
+ * Copyright © 2018, Two Orioles, LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice, this
+ *    list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright notice,
+ *    this list of conditions and the following disclaimer in the documentation
+ *    and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
 #include "config.h"
@@ -575,6 +600,16 @@ static int measure_nop_time(void)
     return nop_sum / 500;
 }
 
+static inline double avg_cycles_per_call(const CheckasmPerf *const p)
+{
+    if (p->iterations) {
+        const double cycles = (double)(10 * p->cycles) / p->iterations - state.nop_time;
+        if (cycles > 0.0)
+            return cycles / 4.0; /* 4 calls per iteration */
+    }
+    return 0.0;
+}
+
 /* Print benchmark results */
 static void print_benchs(CheckasmFunc *f)
 {
@@ -584,17 +619,25 @@ static void print_benchs(CheckasmFunc *f)
         /* Only print functions with at least one assembly version */
         if (f->versions.cpu || f->versions.next) {
             CheckasmFuncVersion *v = &f->versions;
+            const CheckasmPerf *p = &v->perf;
+            const double baseline = avg_cycles_per_call(p);
+            double decicycles;
             do {
-                CheckasmPerf *p = &v->perf;
                 if (p->iterations) {
-                    int decicycles = (10*p->cycles/p->iterations - state.nop_time) / 4;
+                    p = &v->perf;
+                    decicycles = avg_cycles_per_call(p);
                     if (state.csv) {
                         const char sep = state.tsv ? '\t' : ',';
-                        printf("%s%c%s%c%d.%d\n", f->name, sep,
+                        printf("%s%c%s%c%.1f\n", f->name, sep,
                                cpu_suffix(v->cpu), sep,
-                               decicycles / 10, decicycles % 10);
+                               decicycles / 10.0);
                     } else {
-                        printf("%s_%s: %d.%d\n", f->name, cpu_suffix(v->cpu), decicycles/10, decicycles%10);
+                        const int pad_length = 10 + 50 -
+                            printf("%s_%s:", f->name, cpu_suffix(v->cpu));
+                        const double ratio = decicycles ?
+                            baseline / decicycles : 0.0;
+                        printf("%*.1f (%5.2fx)\n", FFMAX(pad_length, 0),
+                            decicycles / 10.0, ratio);
                     }
                 }
             } while ((v = v->next));

From patchwork Tue Aug 13 14:03:32 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "J.
 Dekker"
X-Patchwork-Id: 51001
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Date: Tue, 13 Aug 2024 16:03:32 +0200
Message-ID: <20240813140338.143045-3-jdek@itanimul.li>
X-Mailer: git-send-email 2.44.1
In-Reply-To: <20240813140338.143045-1-jdek@itanimul.li>
References: <20240813140338.143045-1-jdek@itanimul.li>
Subject: [FFmpeg-devel] [PATCH 3/7] checkasm: add wildcompares for test & functions

Added:
  --test=      Filter tests by glob style pattern.
  --bench[=]   Run benchmark and optionally filter functions by glob
               style pattern.

Example:
$ ./tests/checkasm/checkasm --bench=yuva*
[...]
yuva420p_bgr24_8_c:           34.5 ( 1.00x)
yuva420p_bgr24_8_ssse3:       31.1 ( 1.11x)
yuva420p_bgr24_128_c:        310.6 ( 1.00x)
yuva420p_bgr24_128_ssse3:    178.1 ( 1.74x)
yuva420p_bgr24_1080_c:      2509.6 ( 1.00x)
yuva420p_bgr24_1080_ssse3:  1471.5 ( 1.71x)
yuva420p_bgr24_1920_c:      4462.6 ( 1.00x)
yuva420p_bgr24_1920_ssse3:  2331.1 ( 1.91x)
[...]

Ported from dav1d.

Signed-off-by: J. Dekker
---
 tests/checkasm/checkasm.c | 37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 0095758268..79cf39c27f 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -385,7 +385,7 @@ static struct {
 
     int cpu_flag;
     const char *cpu_flag_name;
-    const char *test_name;
+    const char *test_pattern;
     int verbose;
     int csv;
     int tsv;
@@ -771,6 +771,22 @@ static void signal_handler(int s) {
 }
 #endif
 
+/* Compares a string with a wildcard pattern. */
+static int wildstrcmp(const char *str, const char *pattern)
+{
+    const char *wild = strchr(pattern, '*');
+    if (wild) {
+        const size_t len = wild - pattern;
+        if (strncmp(str, pattern, len)) return 1;
+        while (*++wild == '*');
+        if (!*wild) return 0;
+        str += len;
+        while (*str && wildstrcmp(str, wild)) str++;
+        return !*str;
+    }
+    return strcmp(str, pattern);
+}
+
 /* Perform tests and benchmarks for the specified cpu flag if supported by the host */
 static void check_cpu_flag(const char *name, int flag)
 {
@@ -786,7 +802,7 @@ static void check_cpu_flag(const char *name, int flag)
 
         state.cpu_flag_name = name;
         for (i = 0; tests[i].func; i++) {
-            if (state.test_name && strcmp(tests[i].name, state.test_name))
+            if (state.test_pattern && wildstrcmp(tests[i].name, state.test_pattern))
                 continue;
             state.current_test_name = tests[i].name;
             tests[i].func();
@@ -882,11 +898,12 @@ static int usage(const char *path)
 {
     fprintf(stderr,
             "Usage: %s [options...] [seed]\n"
-            " --test= Run specific test.\n"
-            " --bench Run benchmark.\n"
-            " --csv, --tsv Output benchmark results in CSV or TSV format.\n"
-            " --runs= Manual number of benchmark iterations to run 2**.\n"
-            " --verbose Increase verbosity.\n",
+            " --test= Filter tests by glob style pattern.\n"
+            " --bench[=] Run benchmark and optionally filter functions\n"
+            " by glob style pattern.\n"
+            " --csv, --tsv Print benchmark results in CSV or TSV format.\n"
+            " --runs= Manual number of benchmark iterations to run 2**.\n"
+            " --verbose Increase verbosity.\n",
             path);
     return 1;
 }
@@ -931,9 +948,9 @@ int main(int argc, char *argv[])
                 state.bench_pattern = arg + 8;
                 state.bench_pattern_len = strlen(state.bench_pattern);
             } else
-                state.bench_pattern = "";
+                state.bench_pattern = "*";
         } else if (!strncmp(arg, "--test=", 7)) {
-            state.test_name = arg + 7;
+            state.test_pattern = arg + 7;
         } else if (!strcmp(arg, "--csv")) {
             state.csv = 1; state.tsv = 0;
         } else if (!strcmp(arg, "--tsv")) {
@@ -1037,7 +1054,7 @@ void *checkasm_check_func(void *func, const char *name, ...)
 int checkasm_bench_func(void)
 {
     return !state.num_failed && state.bench_pattern &&
-           !strncmp(state.current_func->name, state.bench_pattern, state.bench_pattern_len);
+           !wildstrcmp(state.current_func->name, state.bench_pattern);
 }
 
 /* Indicate that the current test has failed */

From patchwork Tue Aug 13 14:03:33 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 51004
b=lvJ12CY+8kJu9QG4J9OLtXYzbWXwDjf6FrAm4HvXp7N8fiIInlK5cdokenGdDKaH39 cLwqm5vqP+DyWEf+CNOznjXS0/lsLZmdCb+4P3TEpkiEf+No2hgmekdfxTHH5MQ9SBS6 nycGRwh0n+P/ddjmplYUS+A4i1yuzVwzi+K+vQP1WqpP7LYEt66O2F+D8JOPv6R0NcVK jwm6xq+thSOttib/zQ4ct7T77bDLmTdFI+UlYI+g7MEWW9OmF7+lPf+t4g3+OzZTuGnf gPpRLBu8hdzqTWLr2hHoB7y5YmMo2hQxdY4MKCpbNao/75UAu19Qn5iBxUcI3088PQKY F+uw==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@itanimul.li header.s=fm1 header.b=juMg+KFN; dkim=neutral (body hash did not verify) header.i=@messagingengine.com header.s=fm3 header.b=leUl2ayT; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id 2adb3069b0e04-53200f055c3si2382955e87.324.2024.08.13.08.21.11; Tue, 13 Aug 2024 08:21:12 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@itanimul.li header.s=fm1 header.b=juMg+KFN; dkim=neutral (body hash did not verify) header.i=@messagingengine.com header.s=fm3 header.b=leUl2ayT; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 9B81368DA75; Tue, 13 Aug 2024 17:03:53 +0300 (EEST) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from fout6-smtp.messagingengine.com (fout6-smtp.messagingengine.com [103.168.172.149]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 1EA7D68DA36 for ; Tue, 13 Aug 2024 17:03:47 +0300 (EEST) Received: from 
phl-compute-01.internal (phl-compute-01.nyi.internal [10.202.2.41]) by mailfout.nyi.internal (Postfix) with ESMTP id 256F81389716; Tue, 13 Aug 2024 10:03:46 -0400 (EDT) Received: from phl-mailfrontend-01 ([10.202.2.162]) by phl-compute-01.internal (MEProxy); Tue, 13 Aug 2024 10:03:46 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=itanimul.li; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm1; t=1723557826; x= 1723644226; bh=wQvpUbJkcCRQ+ilfpPGEi6LjWrvpbJWQ/o0hYCrVscc=; b=j uMg+KFN7+JfwgPIYESjBmo7vez4Od2I8dN78cMAavhWXRO3H0FmueXRkW7NvgVmS 6qfJkUxL7ZHREHNxHa3agDNuG6S6EGzYGFmfYYjZwzwNNcSGVmmtuYyquYNofJmQ cKiFA9tD2TGkVOyU4VCiEjwNX8neyAfsECVUwhH5zicbEPxmAYz+ExyUaw/Bu0kU eu3SZ4UDBcX2jpC6UmyeBgoba4eGFaOLzoMS6BeuPEMzCCLlF3iLDDPGXrUZWzgh +0P40fYEeonYxfXwVB4TK5ANSKOeSFsSOkBlbFfn5dPjXq23x/US09kZz7g5wZ01 OILqb3CUt3N3baUkF9xbg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1723557826; x= 1723644226; bh=wQvpUbJkcCRQ+ilfpPGEi6LjWrvpbJWQ/o0hYCrVscc=; b=l eUl2ayT34fjdFBE/UUw1VWxlfeqfOL+G5z2HSFGNBRVAdd+EhRG273/QvWBXIYwP jcHu4kc+mmr3Ws9xi/SgdCL3h+D2PMEXw1vrmLrvipfv0yLTRYMxAxKTmQ93ATev DIYtexAhzlspEEyPOfWkCeJFyQ+wC0TEfL2Kz+yiBc1+2l8kExJHvnedDX2dB4VK YxG0R8vfDW3LokuUdxDJ+8dv5a1mr3ZmhxkHUhitSJQrsIp2axqdxPLemTZ0YlCF nDIMr5ox0iDFOH8Ct3/u6r4XsmMQoiq1En0Jz1eNA2Pb6HyUxAwgDaS9ZRLcmrPm xc5OqxrodabjyC9+QKOrg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeeftddruddtvddgjedtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggvpdfu rfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucenucfjughrpefhvfevuf 
ffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpedflfdrucffvghkkhgvrhdfuceo jhguvghksehithgrnhhimhhulhdrlhhiqeenucggtffrrghtthgvrhhnpeelffefkeevle efffehiefhvdefheefffejffejkefgjedufeduveejffeuffdtteenucffohhmrghinhep hhdvieegrgguughpgigprhhvvhdrshgspdhhvdeigehiuggtthgprhhvvhdrshgspdhsth grrhhttghouggvpghrvhgsrdhssgdprghsmhdrshgsnecuvehluhhsthgvrhfuihiivgep tdenucfrrghrrghmpehmrghilhhfrhhomhepjhguvghksehithgrnhhimhhulhdrlhhipd hnsggprhgtphhtthhopedvpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehffhhm phgvghdquggvvhgvlhesfhhfmhhpvghgrdhorhhgpdhrtghpthhtohepghhitheshhgrrg hsnhdruggvvh X-ME-Proxy: Feedback-ID: i84994747:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 13 Aug 2024 10:03:45 -0400 (EDT) From: "J. Dekker" To: ffmpeg-devel@ffmpeg.org Date: Tue, 13 Aug 2024 16:03:33 +0200 Message-ID: <20240813140338.143045-4-jdek@itanimul.li> X-Mailer: git-send-email 2.44.1 In-Reply-To: <20240813140338.143045-1-jdek@itanimul.li> References: <20240813140338.143045-1-jdek@itanimul.li> MIME-Version: 1.0 Subject: [FFmpeg-devel] [PATCH 4/7] avutil/riscv/asm: add stack pushing helpers X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Niklas Haas Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: gjo9QbbXGJen From: Niklas Haas Instead of duplicating these common macros in every file, add them to the shared utility file. Also add a base case for sanity. 
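Illustration only, not part of the patch: a minimal sketch of how a function in one of the touched files could spill and reload a callee-saved register through the shared `sx`/`lx` helpers once they live in libavutil/riscv/asm.S. The function name `ff_example_rvv` is hypothetical; `func`, `endfunc` and `lpad` are assumed to be the existing helpers from the same header, used the way the files in this series use them. The 16-byte stack adjustment assumes the standard RISC-V calling convention, which keeps sp 16-byte aligned.

```asm
#include "libavutil/riscv/asm.S"

/* Hypothetical function, shown only to illustrate sx/lx usage. */
func ff_example_rvv, zve32x
        lpad    0
        addi    sp, sp, -16     /* reserve one 16-byte aligned stack slot */
        sx      s0, 0(sp)       /* sw on RV32, sd on RV64 */
        /* ... body that clobbers s0 ... */
        lx      s0, 0(sp)       /* lw on RV32, ld on RV64 */
        addi    sp, sp, 16      /* release the slot */
        ret
endfunc
```

With the new base case, an XLEN other than 32, 64 or 128 stops the build with `#error` rather than falling through to the 128-bit `sq`/`lq` forms as the per-file copies did.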
---
 libavcodec/riscv/h264addpx_rvv.S | 10 ----------
 libavcodec/riscv/h264idct_rvv.S  | 10 ----------
 libavcodec/riscv/startcode_rvb.S | 10 ----------
 libavutil/riscv/asm.S            | 34 ++++++++++++++++++++++++++++++++
 4 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/libavcodec/riscv/h264addpx_rvv.S b/libavcodec/riscv/h264addpx_rvv.S
index 82739881d9..cf3b742294 100644
--- a/libavcodec/riscv/h264addpx_rvv.S
+++ b/libavcodec/riscv/h264addpx_rvv.S
@@ -26,16 +26,6 @@
 
 #include "libavutil/riscv/asm.S"
 
-        .macro  sx rd, addr
-#if (__riscv_xlen == 32)
-        sw      \rd, \addr
-#elif (__riscv_xlen == 64)
-        sd      \rd, \addr
-#else
-        sq      \rd, \addr
-#endif
-        .endm
-
 func ff_h264_add_pixels4_8_rvv, zve32x
         lpad    0
         vsetivli        zero, 4, e8, mf4, ta, ma
diff --git a/libavcodec/riscv/h264idct_rvv.S b/libavcodec/riscv/h264idct_rvv.S
index d2f77a5b47..076935a5d5 100644
--- a/libavcodec/riscv/h264idct_rvv.S
+++ b/libavcodec/riscv/h264idct_rvv.S
@@ -29,16 +29,6 @@
 
 #include "libavutil/riscv/asm.S"
 
-        .macro  sx rd, addr
-#if (__riscv_xlen == 32)
-        sw      \rd, \addr
-#elif (__riscv_xlen == 64)
-        sd      \rd, \addr
-#else
-        sq      \rd, \addr
-#endif
-        .endm
-
         .variant_cc ff_h264_idct4_rvv
 func ff_h264_idct4_rvv, zve32x
         vsra.vi v5, v1, 1
diff --git a/libavcodec/riscv/startcode_rvb.S b/libavcodec/riscv/startcode_rvb.S
index eec92d3340..c131ebdf59 100644
--- a/libavcodec/riscv/startcode_rvb.S
+++ b/libavcodec/riscv/startcode_rvb.S
@@ -26,16 +26,6 @@
 
 #include "libavutil/riscv/asm.S"
 
-        .macro  lx rd, addr
-#if (__riscv_xlen == 32)
-        lw      \rd, \addr
-#elif (__riscv_xlen == 64)
-        ld      \rd, \addr
-#else
-        lq      \rd, \addr
-#endif
-        .endm
-
 func ff_startcode_find_candidate_rvb, zbb
         lpad    0
         add     a1, a0, a1
diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S
index ec68a042d1..175f2a8672 100644
--- a/libavutil/riscv/asm.S
+++ b/libavutil/riscv/asm.S
@@ -237,3 +237,37 @@
         .macro  vntypei rd, rs, n=1
         vwtypei \rd, \rs, -(\n)
         .endm
+
+        /**
+         * Write an XLEN-sized register to an address.
+         * @param rs source register
+         * @param addr address to write to
+         */
+        .macro  sx rs, addr
+#if (__riscv_xlen == 32)
+        sw      \rs, \addr
+#elif (__riscv_xlen == 64)
+        sd      \rs, \addr
+#elif (__riscv_xlen == 128)
+        sq      \rs, \addr
+#else
+#error Unhandled value of XLEN
+#endif
+        .endm
+
+        /**
+         * Read an XLEN-sized register from an address.
+         * @param[out] rd destination register
+         * @param addr address to read from
+         */
+        .macro  lx rd, addr
+#if (__riscv_xlen == 32)
+        lw      \rd, \addr
+#elif (__riscv_xlen == 64)
+        ld      \rd, \addr
+#elif (__riscv_xlen == 128)
+        lq      \rd, \addr
+#else
+#error Unhandled value of XLEN
+#endif
+        .endm

From patchwork Tue Aug 13 14:03:34 2024
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 50998
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Cc: Niklas Haas
Date: Tue, 13 Aug 2024 16:03:34 +0200
Message-ID: <20240813140338.143045-5-jdek@itanimul.li>
In-Reply-To: <20240813140338.143045-1-jdek@itanimul.li>
References: <20240813140338.143045-1-jdek@itanimul.li>
Subject: [FFmpeg-devel] [PATCH 5/7] avutil/riscv/asm: add helper macro to count varargs

From: Niklas Haas

(Ab)using nested macros to get the number of arguments passed to a
variadic macro. Useful for stack manipulation.
---
 libavutil/riscv/asm.S | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S
index 175f2a8672..db190e99ca 100644
--- a/libavutil/riscv/asm.S
+++ b/libavutil/riscv/asm.S
@@ -271,3 +271,20 @@
 #error Unhandled value of XLEN
 #endif
         .endm
+
+        .macro  count_args_inner num, arg, args:vararg
+        .ifb \arg
+        .equ    num_args, \num
+        .else
+        count_args_inner \num + 1, \args
+        .endif
+        .endm
+
+        /**
+         * Helper macro to count the number of arguments to a macro. Assigns
+         * the count to the symbol `num_args`.
+         * @param args arguments to count
+         */
+        .macro  count_args args:vararg
+        count_args_inner 0, \args
+        .endm

From patchwork Tue Aug 13 14:03:35 2024
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 51002
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Cc: Niklas Haas
Date: Tue, 13 Aug 2024 16:03:35 +0200
Message-ID: <20240813140338.143045-6-jdek@itanimul.li>
In-Reply-To: <20240813140338.143045-1-jdek@itanimul.li>
References: <20240813140338.143045-1-jdek@itanimul.li>
Subject: [FFmpeg-devel] [PATCH 6/7] avutil/riscv/asm: add generic push/pop helpers

From: Niklas Haas

Generic helper macros to push/pop multiple registers at once. Expands to
a single `addi` plus a sequence of XLEN-sized stores/loads.
---
 libavutil/riscv/asm.S | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S
index db190e99ca..3955530e4e 100644
--- a/libavutil/riscv/asm.S
+++ b/libavutil/riscv/asm.S
@@ -288,3 +288,40 @@
         .macro  count_args args:vararg
         count_args_inner 0, \args
         .endm
+
+        /**
+         * Helper macro to iterate over constant sized elements in memory
+         * @param op operation to perform on each element (sized load/store)
+         * @param size size in bytes per element
+         * @param offset starting offset of first element
+         * @param addr base address to load/store
+         * @param regs registers to iterate over
+         */
+        .macro  for_mem op, size, offset, addr, reg, regs:vararg
+        .ifnb \reg
+        \op     \reg, \offset(\addr)
+        for_mem \op, \size, \offset + \size, \addr, \regs
+        .endif
+        .endm
+
+        /**
+         * Push a variable number of registers to the stack.
+         * @param n number of registers to push
+         * @param regs registers to push
+         */
+        .macro  push regs:vararg
+        count_args \regs
+        addi    sp, sp, -(num_args * (__riscv_xlen >> 3))
+        for_mem sx, __riscv_xlen >> 3, 0, sp, \regs
+        .endm
+
+        /**
+         * Pop a variable number of registers from the stack.
+         * @param n number of registers to pop
+         * @param[out] regs registers to pop
+         */
+        .macro  pop regs:vararg
+        count_args \regs
+        for_mem lx, __riscv_xlen >> 3, 0, sp, \regs
+        addi    sp, sp, num_args * (__riscv_xlen >> 3)
+        .endm

From patchwork Tue Aug 13 14:03:36 2024
X-Patchwork-Submitter: "J. Dekker"
X-Patchwork-Id: 51003
From: "J. Dekker"
To: ffmpeg-devel@ffmpeg.org
Cc: Niklas Haas
Date: Tue, 13 Aug 2024 16:03:36 +0200
Message-ID: <20240813140338.143045-7-jdek@itanimul.li>
In-Reply-To: <20240813140338.143045-1-jdek@itanimul.li>
References: <20240813140338.143045-1-jdek@itanimul.li>
Subject: [FFmpeg-devel] [PATCH 7/7] avcodec/riscv: add h264 qpel

From: Niklas Haas

checkasm: bench runs 131072 (1 << 17)
avg_h264_qpel_4_mc00_8_c:            37.6 ( 1.00x)
avg_h264_qpel_4_mc00_8_rvv_i32:      27.4 ( 1.37x)
avg_h264_qpel_4_mc01_8_c:           214.6 ( 1.00x)
avg_h264_qpel_4_mc01_8_rvv_i32:      79.3 ( 2.70x)
avg_h264_qpel_4_mc02_8_c:           214.8 ( 1.00x)
avg_h264_qpel_4_mc02_8_rvv_i32:      79.3 ( 2.71x)
avg_h264_qpel_4_mc03_8_c:           214.8 ( 1.00x)
avg_h264_qpel_4_mc03_8_rvv_i32:      79.3 ( 2.71x)
avg_h264_qpel_4_mc10_8_c:           173.1 ( 1.00x)
avg_h264_qpel_4_mc10_8_rvv_i32:     120.8 ( 1.43x)
avg_h264_qpel_4_mc11_8_c:           339.9 ( 1.00x)
avg_h264_qpel_4_mc11_8_rvv_i32:     183.3 ( 1.85x)
avg_h264_qpel_4_mc12_8_c:           537.6 ( 1.00x)
avg_h264_qpel_4_mc12_8_rvv_i32:     339.9 ( 1.58x)
avg_h264_qpel_4_mc13_8_c:           339.9 ( 1.00x)
avg_h264_qpel_4_mc13_8_rvv_i32:     194.1 ( 1.75x)
avg_h264_qpel_4_mc20_8_c:           141.8 ( 1.00x)
avg_h264_qpel_4_mc20_8_rvv_i32:     121.1 ( 1.17x)
avg_h264_qpel_4_mc21_8_c:           485.6 ( 1.00x)
avg_h264_qpel_4_mc21_8_rvv_i32:     381.4 ( 1.27x)
avg_h264_qpel_4_mc22_8_c:           350.1 ( 1.00x)
avg_h264_qpel_4_mc22_8_rvv_i32:     266.9 ( 1.31x)
avg_h264_qpel_4_mc23_8_c:           485.6 ( 1.00x)
avg_h264_qpel_4_mc23_8_rvv_i32:     381.6 ( 1.27x)
avg_h264_qpel_4_mc30_8_c:           173.1 ( 1.00x)
avg_h264_qpel_4_mc30_8_rvv_i32:     131.6 ( 1.32x)
avg_h264_qpel_4_mc31_8_c:           339.9 ( 1.00x)
avg_h264_qpel_4_mc31_8_rvv_i32:     183.3 ( 1.85x)
avg_h264_qpel_4_mc32_8_c:           537.9 ( 1.00x)
avg_h264_qpel_4_mc32_8_rvv_i32:     339.9 ( 1.58x)
avg_h264_qpel_4_mc33_8_c:           339.9 ( 1.00x)
avg_h264_qpel_4_mc33_8_rvv_i32:     193.8 ( 1.75x)
avg_h264_qpel_8_mc00_8_c:           110.6 ( 1.00x)
avg_h264_qpel_8_mc00_8_rvv_i32:      48.1 ( 2.30x)
avg_h264_qpel_8_mc01_8_c:           766.9 ( 1.00x)
avg_h264_qpel_8_mc01_8_rvv_i32:     152.1 ( 5.04x)
avg_h264_qpel_8_mc02_8_c:           766.9 ( 1.00x)
avg_h264_qpel_8_mc02_8_rvv_i32:     141.8 ( 5.41x)
avg_h264_qpel_8_mc03_8_c:           777.4 ( 1.00x)
avg_h264_qpel_8_mc03_8_rvv_i32:     152.3 ( 5.10x)
avg_h264_qpel_8_mc10_8_c:           620.9 ( 1.00x)
avg_h264_qpel_8_mc10_8_rvv_i32:     235.6 ( 2.64x)
avg_h264_qpel_8_mc11_8_c:          1204.6 ( 1.00x)
avg_h264_qpel_8_mc11_8_rvv_i32:     360.6 ( 3.34x)
avg_h264_qpel_8_mc12_8_c:          1912.6 ( 1.00x)
avg_h264_qpel_8_mc12_8_rvv_i32:     558.4 ( 3.43x)
avg_h264_qpel_8_mc13_8_c:          1214.6 ( 1.00x)
avg_h264_qpel_8_mc13_8_rvv_i32:     360.6 ( 3.37x)
avg_h264_qpel_8_mc20_8_c:           506.4 ( 1.00x)
avg_h264_qpel_8_mc20_8_rvv_i32:     225.1 ( 2.25x)
avg_h264_qpel_8_mc21_8_c:          1714.8 ( 1.00x)
avg_h264_qpel_8_mc21_8_rvv_i32:     631.6 ( 2.72x)
avg_h264_qpel_8_mc22_8_c:          1266.8 ( 1.00x)
avg_h264_qpel_8_mc22_8_rvv_i32:     423.1 ( 2.99x)
avg_h264_qpel_8_mc23_8_c:          1714.6 ( 1.00x)
avg_h264_qpel_8_mc23_8_rvv_i32:     631.4 ( 2.72x)
avg_h264_qpel_8_mc30_8_c:           610.6 ( 1.00x)
avg_h264_qpel_8_mc30_8_rvv_i32:     235.6 ( 2.59x)
avg_h264_qpel_8_mc31_8_c:          1214.6 ( 1.00x)
avg_h264_qpel_8_mc31_8_rvv_i32:     350.1 ( 3.47x)
avg_h264_qpel_8_mc32_8_c:          1902.3 ( 1.00x)
avg_h264_qpel_8_mc32_8_rvv_i32:     558.6 ( 3.41x)
avg_h264_qpel_8_mc33_8_c:          1214.8 ( 1.00x)
avg_h264_qpel_8_mc33_8_rvv_i32:     360.6 ( 3.37x)
avg_h264_qpel_16_mc00_8_c:          423.1 ( 1.00x)
avg_h264_qpel_16_mc00_8_rvv_i32:     68.8 ( 6.15x)
avg_h264_qpel_16_mc01_8_c:         2850.1 ( 1.00x)
avg_h264_qpel_16_mc01_8_rvv_i32:    298.1 ( 9.56x)
avg_h264_qpel_16_mc02_8_c:         2954.6 ( 1.00x)
avg_h264_qpel_16_mc02_8_rvv_i32:    277.4 (10.65x)
avg_h264_qpel_16_mc03_8_c:         2871.1 ( 1.00x)
avg_h264_qpel_16_mc03_8_rvv_i32:    298.1 ( 9.63x)
avg_h264_qpel_16_mc10_8_c:         2423.1 ( 1.00x)
avg_h264_qpel_16_mc10_8_rvv_i32: 464.9 ( 5.21x)
avg_h264_qpel_16_mc11_8_c: 4683.6 ( 1.00x)
avg_h264_qpel_16_mc11_8_rvv_i32: 714.6 ( 6.55x)
avg_h264_qpel_16_mc12_8_c: 7496.4 ( 1.00x)
avg_h264_qpel_16_mc12_8_rvv_i32: 1037.6 ( 7.22x)
avg_h264_qpel_16_mc13_8_c: 4642.1 ( 1.00x)
avg_h264_qpel_16_mc13_8_rvv_i32: 704.4 ( 6.59x)
avg_h264_qpel_16_mc20_8_c: 2069.1 ( 1.00x)
avg_h264_qpel_16_mc20_8_rvv_i32: 443.9 ( 4.66x)
avg_h264_qpel_16_mc21_8_c: 6808.6 ( 1.00x)
avg_h264_qpel_16_mc21_8_rvv_i32: 1204.3 ( 5.65x)
avg_h264_qpel_16_mc22_8_c: 5048.4 ( 1.00x)
avg_h264_qpel_16_mc22_8_rvv_i32: 777.4 ( 6.49x)
avg_h264_qpel_16_mc23_8_c: 6819.1 ( 1.00x)
avg_h264_qpel_16_mc23_8_rvv_i32: 1214.8 ( 5.61x)
avg_h264_qpel_16_mc30_8_c: 2412.8 ( 1.00x)
avg_h264_qpel_16_mc30_8_rvv_i32: 464.9 ( 5.19x)
avg_h264_qpel_16_mc31_8_c: 4662.9 ( 1.00x)
avg_h264_qpel_16_mc31_8_rvv_i32: 714.6 ( 6.53x)
avg_h264_qpel_16_mc32_8_c: 7516.9 ( 1.00x)
avg_h264_qpel_16_mc32_8_rvv_i32: 1058.6 ( 7.10x)
avg_h264_qpel_16_mc33_8_c: 4673.4 ( 1.00x)
avg_h264_qpel_16_mc33_8_rvv_i32: 714.9 ( 6.54x)
put_h264_qpel_4_mc00_8_c: 27.4 ( 1.00x)
put_h264_qpel_4_mc00_8_rvv_i32: 16.9 ( 1.62x)
put_h264_qpel_4_mc01_8_c: 214.6 ( 1.00x)
put_h264_qpel_4_mc01_8_rvv_i32: 79.3 ( 2.70x)
put_h264_qpel_4_mc02_8_c: 183.3 ( 1.00x)
put_h264_qpel_4_mc02_8_rvv_i32: 79.3 ( 2.31x)
put_h264_qpel_4_mc03_8_c: 204.3 ( 1.00x)
put_h264_qpel_4_mc03_8_rvv_i32: 89.6 ( 2.28x)
put_h264_qpel_4_mc10_8_c: 173.1 ( 1.00x)
put_h264_qpel_4_mc10_8_rvv_i32: 120.8 ( 1.43x)
put_h264_qpel_4_mc11_8_c: 339.6 ( 1.00x)
put_h264_qpel_4_mc11_8_rvv_i32: 183.3 ( 1.85x)
put_h264_qpel_4_mc12_8_c: 527.4 ( 1.00x)
put_h264_qpel_4_mc12_8_rvv_i32: 339.9 ( 1.55x)
put_h264_qpel_4_mc13_8_c: 329.4 ( 1.00x)
put_h264_qpel_4_mc13_8_rvv_i32: 183.6 ( 1.79x)
put_h264_qpel_4_mc20_8_c: 121.1 ( 1.00x)
put_h264_qpel_4_mc20_8_rvv_i32: 110.6 ( 1.09x)
put_h264_qpel_4_mc21_8_c: 464.6 ( 1.00x)
put_h264_qpel_4_mc21_8_rvv_i32: 371.1 ( 1.25x)
put_h264_qpel_4_mc22_8_c: 329.4 ( 1.00x)
put_h264_qpel_4_mc22_8_rvv_i32: 256.4 ( 1.28x)
put_h264_qpel_4_mc23_8_c: 475.1 ( 1.00x)
put_h264_qpel_4_mc23_8_rvv_i32: 371.1 ( 1.28x)
put_h264_qpel_4_mc30_8_c: 162.6 ( 1.00x)
put_h264_qpel_4_mc30_8_rvv_i32: 121.1 ( 1.34x)
put_h264_qpel_4_mc31_8_c: 339.9 ( 1.00x)
put_h264_qpel_4_mc31_8_rvv_i32: 183.6 ( 1.85x)
put_h264_qpel_4_mc32_8_c: 527.1 ( 1.00x)
put_h264_qpel_4_mc32_8_rvv_i32: 339.9 ( 1.55x)
put_h264_qpel_4_mc33_8_c: 339.9 ( 1.00x)
put_h264_qpel_4_mc33_8_rvv_i32: 183.3 ( 1.85x)
put_h264_qpel_8_mc00_8_c: 89.8 ( 1.00x)
put_h264_qpel_8_mc00_8_rvv_i32: 37.6 ( 2.39x)
put_h264_qpel_8_mc01_8_c: 725.1 ( 1.00x)
put_h264_qpel_8_mc01_8_rvv_i32: 141.8 ( 5.11x)
put_h264_qpel_8_mc02_8_c: 662.9 ( 1.00x)
put_h264_qpel_8_mc02_8_rvv_i32: 131.3 ( 5.05x)
put_h264_qpel_8_mc03_8_c: 735.6 ( 1.00x)
put_h264_qpel_8_mc03_8_rvv_i32: 141.8 ( 5.19x)
put_h264_qpel_8_mc10_8_c: 600.4 ( 1.00x)
put_h264_qpel_8_mc10_8_rvv_i32: 225.1 ( 2.67x)
put_h264_qpel_8_mc11_8_c: 1173.1 ( 1.00x)
put_h264_qpel_8_mc11_8_rvv_i32: 339.9 ( 3.45x)
put_h264_qpel_8_mc12_8_c: 1871.1 ( 1.00x)
put_h264_qpel_8_mc12_8_rvv_i32: 548.1 ( 3.41x)
put_h264_qpel_8_mc13_8_c: 1173.1 ( 1.00x)
put_h264_qpel_8_mc13_8_rvv_i32: 339.9 ( 3.45x)
put_h264_qpel_8_mc20_8_c: 454.6 ( 1.00x)
put_h264_qpel_8_mc20_8_rvv_i32: 214.8 ( 2.12x)
put_h264_qpel_8_mc21_8_c: 1683.6 ( 1.00x)
put_h264_qpel_8_mc21_8_rvv_i32: 621.1 ( 2.71x)
put_h264_qpel_8_mc22_8_c: 1162.6 ( 1.00x)
put_h264_qpel_8_mc22_8_rvv_i32: 412.9 ( 2.82x)
put_h264_qpel_8_mc23_8_c: 1673.3 ( 1.00x)
put_h264_qpel_8_mc23_8_rvv_i32: 631.4 ( 2.65x)
put_h264_qpel_8_mc30_8_c: 589.9 ( 1.00x)
put_h264_qpel_8_mc30_8_rvv_i32: 225.3 ( 2.62x)
put_h264_qpel_8_mc31_8_c: 1173.1 ( 1.00x)
put_h264_qpel_8_mc31_8_rvv_i32: 339.9 ( 3.45x)
put_h264_qpel_8_mc32_8_c: 1871.1 ( 1.00x)
put_h264_qpel_8_mc32_8_rvv_i32: 548.1 ( 3.41x)
put_h264_qpel_8_mc33_8_c: 1162.6 ( 1.00x)
put_h264_qpel_8_mc33_8_rvv_i32: 350.1 ( 3.32x)
put_h264_qpel_16_mc00_8_c: 308.6 ( 1.00x)
put_h264_qpel_16_mc00_8_rvv_i32: 48.1 ( 6.42x)
put_h264_qpel_16_mc01_8_c: 2746.1 ( 1.00x)
put_h264_qpel_16_mc01_8_rvv_i32: 277.4 ( 9.90x)
put_h264_qpel_16_mc02_8_c: 2558.6 ( 1.00x)
put_h264_qpel_16_mc02_8_rvv_i32: 266.9 ( 9.59x)
put_h264_qpel_16_mc03_8_c: 2756.6 ( 1.00x)
put_h264_qpel_16_mc03_8_rvv_i32: 277.4 ( 9.94x)
put_h264_qpel_16_mc10_8_c: 2287.8 ( 1.00x)
put_h264_qpel_16_mc10_8_rvv_i32: 443.9 ( 5.15x)
put_h264_qpel_16_mc11_8_c: 4558.6 ( 1.00x)
put_h264_qpel_16_mc11_8_rvv_i32: 683.4 ( 6.67x)
put_h264_qpel_16_mc12_8_c: 7381.9 ( 1.00x)
put_h264_qpel_16_mc12_8_rvv_i32: 1027.1 ( 7.19x)
put_h264_qpel_16_mc13_8_c: 4548.4 ( 1.00x)
put_h264_qpel_16_mc13_8_rvv_i32: 683.6 ( 6.65x)
put_h264_qpel_16_mc20_8_c: 1819.1 ( 1.00x)
put_h264_qpel_16_mc20_8_rvv_i32: 423.4 ( 4.30x)
put_h264_qpel_16_mc21_8_c: 6704.6 ( 1.00x)
put_h264_qpel_16_mc21_8_rvv_i32: 1183.6 ( 5.66x)
put_h264_qpel_16_mc22_8_c: 4641.9 ( 1.00x)
put_h264_qpel_16_mc22_8_rvv_i32: 756.4 ( 6.14x)
put_h264_qpel_16_mc23_8_c: 6725.6 ( 1.00x)
put_h264_qpel_16_mc23_8_rvv_i32: 1183.6 ( 5.68x)
put_h264_qpel_16_mc30_8_c: 2308.6 ( 1.00x)
put_h264_qpel_16_mc30_8_rvv_i32: 443.9 ( 5.20x)
put_h264_qpel_16_mc31_8_c: 4548.4 ( 1.00x)
put_h264_qpel_16_mc31_8_rvv_i32: 704.4 ( 6.46x)
put_h264_qpel_16_mc32_8_c: 7412.9 ( 1.00x)
put_h264_qpel_16_mc32_8_rvv_i32: 1037.8 ( 7.14x)
put_h264_qpel_16_mc33_8_c: 4558.6 ( 1.00x)
put_h264_qpel_16_mc33_8_rvv_i32: 694.1 ( 6.57x)

Signed-off-by: Niklas Haas
Signed-off-by: J. Dekker
---
 libavcodec/h264qpel.c            |   2 +
 libavcodec/h264qpel.h            |   1 +
 libavcodec/riscv/Makefile        |   2 +
 libavcodec/riscv/h264qpel_init.c | 113 +++++++
 libavcodec/riscv/h264qpel_rvv.S  | 554 +++++++++++++++++++++++++++++++
 5 files changed, 672 insertions(+)
 create mode 100644 libavcodec/riscv/h264qpel_init.c
 create mode 100644 libavcodec/riscv/h264qpel_rvv.S

diff --git a/libavcodec/h264qpel.c b/libavcodec/h264qpel.c
index 65fef03304..faca1e8953 100644
--- a/libavcodec/h264qpel.c
+++ b/libavcodec/h264qpel.c
@@ -102,6 +102,8 @@ av_cold void ff_h264qpel_init(H264QpelContext *c, int bit_depth)
     ff_h264qpel_init_arm(c, bit_depth);
 #elif ARCH_PPC
     ff_h264qpel_init_ppc(c, bit_depth);
+#elif ARCH_RISCV
+    ff_h264qpel_init_riscv(c, bit_depth);
 #elif ARCH_X86
     ff_h264qpel_init_x86(c, bit_depth);
 #elif ARCH_MIPS
diff --git a/libavcodec/h264qpel.h b/libavcodec/h264qpel.h
index 0259e8de23..24baf826f9 100644
--- a/libavcodec/h264qpel.h
+++ b/libavcodec/h264qpel.h
@@ -34,6 +34,7 @@ void ff_h264qpel_init(H264QpelContext *c, int bit_depth);
 void ff_h264qpel_init_aarch64(H264QpelContext *c, int bit_depth);
 void ff_h264qpel_init_arm(H264QpelContext *c, int bit_depth);
 void ff_h264qpel_init_ppc(H264QpelContext *c, int bit_depth);
+void ff_h264qpel_init_riscv(H264QpelContext *c, int bit_depth);
 void ff_h264qpel_init_x86(H264QpelContext *c, int bit_depth);
 void ff_h264qpel_init_mips(H264QpelContext *c, int bit_depth);
 void ff_h264qpel_init_loongarch(H264QpelContext *c, int bit_depth);
diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index b3a6b588c9..d4276521f3 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -33,6 +33,8 @@ RVV-OBJS-$(CONFIG_H264CHROMA) += riscv/h264_mc_chroma.o
 OBJS-$(CONFIG_H264DSP) += riscv/h264dsp_init.o
 RVV-OBJS-$(CONFIG_H264DSP) += riscv/h264addpx_rvv.o riscv/h264dsp_rvv.o \
                               riscv/h264idct_rvv.o
+OBJS-$(CONFIG_H264QPEL) += riscv/h264qpel_init.o
+RVV-OBJS-$(CONFIG_H264QPEL) += riscv/h264qpel_rvv.o
 OBJS-$(CONFIG_HUFFYUV_DECODER) += riscv/huffyuvdsp_init.o
 RVV-OBJS-$(CONFIG_HUFFYUV_DECODER) += riscv/huffyuvdsp_rvv.o
 OBJS-$(CONFIG_IDCTDSP) += riscv/idctdsp_init.o
diff --git a/libavcodec/riscv/h264qpel_init.c b/libavcodec/riscv/h264qpel_init.c
new file mode 100644
index 0000000000..69a1345447
--- /dev/null
+++ b/libavcodec/riscv/h264qpel_init.c
@@ -0,0 +1,113 @@
+/*
+ * RISC-V optimised DSP functions
+ * Copyright (c) 2024 Niklas Haas
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <stdint.h>
+
+#include "config.h"
+#include "libavutil/attributes.h"
+#include "libavutil/riscv/cpu.h"
+#include "libavcodec/h264qpel.h"
+
+#define DECL_QPEL_OPS(OP, SIZE, EXT) \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc00_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc10_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc20_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc30_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc01_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc11_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc21_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc31_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc02_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc12_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc22_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc32_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc03_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc13_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc23_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
+void ff_ ## OP ## _h264_qpel ## SIZE ## _mc33_ ## EXT(uint8_t *dst, const uint8_t *src, ptrdiff_t stride);
+
+DECL_QPEL_OPS(put, 16, rvv256)
+DECL_QPEL_OPS(put, 8,  rvv256)
+DECL_QPEL_OPS(put, 4,  rvv256)
+
+DECL_QPEL_OPS(avg, 16, rvv256)
+DECL_QPEL_OPS(avg, 8,  rvv256)
+DECL_QPEL_OPS(avg, 4,  rvv256)
+
+DECL_QPEL_OPS(put, 16, rvv)
+DECL_QPEL_OPS(put, 8,  rvv)
+DECL_QPEL_OPS(put, 4,  rvv)
+
+DECL_QPEL_OPS(avg, 16, rvv)
+DECL_QPEL_OPS(avg, 8,  rvv)
+DECL_QPEL_OPS(avg, 4,  rvv)
+
+#define SET_QPEL_FNS(OP, IDX, SIZE, EXT) \
+do { \
+    c->OP ## _h264_qpel_pixels_tab[IDX][ 0] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc00_ ## EXT; \
+    c->OP ## _h264_qpel_pixels_tab[IDX][ 1] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc10_ ## EXT; \
+    c->OP ## _h264_qpel_pixels_tab[IDX][ 2] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc20_ ## EXT; \
+    c->OP ## _h264_qpel_pixels_tab[IDX][ 3] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc30_ ## EXT; \
+    c->OP ## _h264_qpel_pixels_tab[IDX][ 4] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc01_ ## EXT; \
+    c->OP ## _h264_qpel_pixels_tab[IDX][ 5] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc11_ ## EXT; \
+    c->OP ## _h264_qpel_pixels_tab[IDX][ 6] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc21_ ## EXT; \
+    c->OP ## _h264_qpel_pixels_tab[IDX][ 7] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc31_ ## EXT; \
+    c->OP ## _h264_qpel_pixels_tab[IDX][ 8] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc02_ ## EXT; \
+    c->OP ## _h264_qpel_pixels_tab[IDX][ 9] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc12_ ## EXT; \
+    c->OP ## _h264_qpel_pixels_tab[IDX][10] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc22_ ## EXT; \
+    c->OP ## _h264_qpel_pixels_tab[IDX][11] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc32_ ## EXT; \
+    c->OP ## _h264_qpel_pixels_tab[IDX][12] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc03_ ## EXT; \
+    c->OP ## _h264_qpel_pixels_tab[IDX][13] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc13_ ## EXT; \
+    c->OP ## _h264_qpel_pixels_tab[IDX][14] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc23_ ## EXT; \
+    c->OP ## _h264_qpel_pixels_tab[IDX][15] = ff_ ## OP ## _h264_qpel ## SIZE ## _mc33_ ## EXT; \
+} while (0)
+
+av_cold void ff_h264qpel_init_riscv(H264QpelContext *c, int bit_depth)
+{
+#if HAVE_RVV
+    int flags = av_get_cpu_flags();
+    if (flags & AV_CPU_FLAG_RVV_I32) {
+        const int vlen = 8 * ff_get_rv_vlenb();
+
+        switch (bit_depth) {
+        case 8:
+            if (vlen >= 256) {
+                SET_QPEL_FNS(put, 0, 16, rvv256);
+                SET_QPEL_FNS(put, 1, 8,  rvv256);
+                SET_QPEL_FNS(put, 2, 4,  rvv256);
+
+                SET_QPEL_FNS(avg, 0, 16, rvv256);
+                SET_QPEL_FNS(avg, 1, 8,  rvv256);
+                SET_QPEL_FNS(avg, 2, 4,  rvv256);
+            } else if (vlen >= 128) {
+                SET_QPEL_FNS(put, 0, 16, rvv);
+                SET_QPEL_FNS(put, 1, 8,  rvv);
+                SET_QPEL_FNS(put, 2, 4,  rvv);
+
+                SET_QPEL_FNS(avg, 0, 16, rvv);
+                SET_QPEL_FNS(avg, 1, 8,  rvv);
+                SET_QPEL_FNS(avg, 2, 4,  rvv);
+            }
+            break;
+        }
+    }
+#endif
+}
diff --git a/libavcodec/riscv/h264qpel_rvv.S b/libavcodec/riscv/h264qpel_rvv.S
new file mode 100644
index 0000000000..7713372f23
--- /dev/null
+++ b/libavcodec/riscv/h264qpel_rvv.S
@@ -0,0 +1,554 @@
+/*
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright (c) 2024 Niklas Haas
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *    this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright notice,
+ *    this list of conditions and the following disclaimer in the documentation
+ *    and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "libavutil/riscv/asm.S"
+
+.macro vnclipsu.wi shifti, lmul, lmul2, vregs:vararg
+        vsetvli         zero, zero, e16, \lmul2, ta, ma
+    .irp x, \vregs
+        vmax.vx         \x, \x, zero
+    .endr
+        vsetvli         zero, zero, e8, \lmul, ta, ma
+    .irp x, \vregs
+        vnclipu.wi      \x, \x, \shifti
+    .endr
+.endm
+
+.macro lowpass_init lmul, sizei, size, w0, w1, backup
+        vsetivli        zero, \sizei, e8, \lmul, ta, ma
+        csrwi           vxrm, 0
+        li              \size, \sizei
+    .ifnb \w0
+        li              \w0, 20
+        li              \w1, -5
+    .endif
+.endm
+
+        /* output is unclipped; clobbers v26-v31 plus \tmp and \tmp2 */
+.macro lowpass_h vdst, src, w0, w1, tmp=t3, tmp2=t4
+        addi            \tmp, \src, 3
+        lbu             \tmp2, 2(\src)
+        vle8.v          v31, (\tmp)
+        lbu             \tmp, 1(\src)
+        vslide1up.vx    v30, v31, \tmp2
+        lbu             \tmp2, 0(\src)
+        vslide1up.vx    v29, v30, \tmp
+        lbu             \tmp, -1(\src)
+        vslide1up.vx    v28, v29, \tmp2
+        lbu             \tmp2, -2(\src)
+        vslide1up.vx    v27, v28, \tmp
+        vslide1up.vx    v26, v27, \tmp2
+        vwaddu.vv       \vdst, v26, v31
+        vwmaccu.vx      \vdst, \w0, v28
+        vwmaccu.vx      \vdst, \w0, v29
+        vwmaccsu.vx     \vdst, \w1, v27
+        vwmaccsu.vx     \vdst, \w1, v30
+.endm
+
+        /* output is unclipped */
+.macro lowpass_v w0, w1, vdst, vsrc0, vsrc1, vsrc2, vsrc3, vsrc4, vsrc5, signed=0
+    .if \signed
+        vwadd.vv        \vdst, \vsrc0, \vsrc5
+        vwmacc.vx       \vdst, \w0, \vsrc2
+        vwmacc.vx       \vdst, \w0, \vsrc3
+        vwmacc.vx       \vdst, \w1, \vsrc1
+        vwmacc.vx       \vdst, \w1, \vsrc4
+    .else
+        vwaddu.vv       \vdst, \vsrc0, \vsrc5
+        vwmaccu.vx      \vdst, \w0, \vsrc2
+        vwmaccu.vx      \vdst, \w0, \vsrc3
+        vwmaccsu.vx     \vdst, \w1, \vsrc1
+        vwmaccsu.vx     \vdst, \w1, \vsrc4
+    .endif
+.endm
+
+.macro qpel_mc00 op, dst, src, stride, size
+func ff_\op\()_h264_qpel_pixels, zve32x
+1:
+        add             t0, \stride, \src
+        add             t1, \stride, t0
+        add             t2, \stride, t1
+        vle8.v          v0, (\src)
+        vle8.v          v1, (t0)
+        vle8.v          v2, (t1)
+        vle8.v          v3, (t2)
+        addi            \size, \size, -4
+        add             \src, \stride, t2
+        add             t0, \stride, \dst
+        add             t1, \stride, t0
+        add             t2, \stride, t1
+    .ifc \op, avg
+        vle8.v          v4, (\dst)
+        vle8.v          v5, (t0)
+        vle8.v          v6, (t1)
+        vle8.v          v7, (t2)
+        vaaddu.vv       v0, v0, v4
+        vaaddu.vv       v1, v1, v5
+        vaaddu.vv       v2, v2, v6
+        vaaddu.vv       v3, v3, v7
+    .endif
+        vse8.v          v0, (\dst)
+        vse8.v          v1, (t0)
+        vse8.v          v2, (t1)
+        vse8.v          v3, (t2)
+        add             \dst, \stride, t2
+        bnez            \size, 1b
+        ret
+endfunc
+.endm
+
+        qpel_mc00       put, a0, a1, a2, a4
+        qpel_mc00       avg, a0, a1, a2, a4
+
+.macro qpel_lowpass op, ext, lmul, lmul2, dst, src, dst_stride, src_stride, size, w0, w1, src2, src2_stride
+func ff_\op\()_h264_qpel_h_lowpass_\lmul\ext, zve32x
+1:
+        add             t0, \src_stride, \src
+        add             t1, \src_stride, t0
+        add             t2, \src_stride, t1
+        lowpass_h       v0, \src, \w0, \w1
+        lowpass_h       v2, t0, \w0, \w1
+        lowpass_h       v4, t1, \w0, \w1
+        lowpass_h       v6, t2, \w0, \w1
+        add             \src, \src_stride, t2
+        addi            \size, \size, -4
+        vnclipsu.wi     5, \lmul, \lmul2, v0, v2, v4, v6
+    .ifnb \src2
+        add             t0, \src2_stride, \src2
+        add             t1, \src2_stride, t0
+        add             t2, \src2_stride, t1
+        vle8.v          v8, (\src2)
+        vle8.v          v10, (t0)
+        vle8.v          v12, (t1)
+        vle8.v          v14, (t2)
+        add             \src2, \dst_stride, t2
+        vaaddu.vv       v0, v0, v8
+        vaaddu.vv       v2, v2, v10
+        vaaddu.vv       v4, v4, v12
+        vaaddu.vv       v6, v6, v14
+    .endif
+        add             t0, \dst_stride, \dst
+        add             t1, \dst_stride, t0
+        add             t2, \dst_stride, t1
+    .ifc \op, avg
+        vle8.v          v1, (\dst)
+        vle8.v          v3, (t0)
+        vle8.v          v5, (t1)
+        vle8.v          v7, (t2)
+        vaaddu.vv       v0, v0, v1
+        vaaddu.vv       v2, v2, v3
+        vaaddu.vv       v4, v4, v5
+        vaaddu.vv       v6, v6, v7
+    .endif
+        vse8.v          v0, (\dst)
+        vse8.v          v2, (t0)
+        vse8.v          v4, (t1)
+        vse8.v          v6, (t2)
+        add             \dst, \dst_stride, t2
+        bnez            \size, 1b
+        ret
+endfunc
+
+func ff_\op\()_h264_qpel_v_lowpass_\lmul\ext, zve32x
+        sub             t0, \src, \src_stride
+        sub             t1, t0, \src_stride
+        vle8.v          v2, (\src)
+        vle8.v          v1, (t0)
+        vle8.v          v0, (t1)
+        add             t0, \src, \src_stride
+        add             t1, t0, \src_stride
+        add             \src, t1, \src_stride
+        vle8.v          v3, (t0)
+        vle8.v          v4, (t1)
+1:
+        add             t0, \src_stride, \src
+        add             t1, \src_stride, t0
+        add             t2, \src_stride, t1
+        vle8.v          v5, (\src)
+        vle8.v          v6, (t0)
+        vle8.v          v7, (t1)
+        vle8.v          v8, (t2)
+        add             \src, \src_stride, t2
+        lowpass_v       \w0, \w1, v24, v0, v1, v2, v3, v4, v5
+        lowpass_v       \w0, \w1, v26, v1, v2, v3, v4, v5, v6
+        lowpass_v       \w0, \w1, v28, v2, v3, v4, v5, v6, v7
+        lowpass_v       \w0, \w1, v30, v3, v4, v5, v6, v7, v8
+        addi            \size, \size, -4
+        vnclipsu.wi     5, \lmul, \lmul2, v24, v26, v28, v30
+    .ifnb \src2
+        add             t0, \src2_stride, \src2
+        add             t1, \src2_stride, t0
+        add             t2, \src2_stride, t1
+        vle8.v          v9, (\src2)
+        vle8.v          v10, (t0)
+        vle8.v          v11, (t1)
+        vle8.v          v12, (t2)
+        add             \src2, \src2_stride, t2
+        vaaddu.vv       v24, v24, v9
+        vaaddu.vv       v26, v26, v10
+        vaaddu.vv       v28, v28, v11
+        vaaddu.vv       v30, v30, v12
+    .endif
+        add             t0, \dst_stride, \dst
+        add             t1, \dst_stride, t0
+        add             t2, \dst_stride, t1
+    .ifc \op, avg
+        vle8.v          v9, (\dst)
+        vle8.v          v10, (t0)
+        vle8.v          v11, (t1)
+        vle8.v          v12, (t2)
+        vaaddu.vv       v24, v24, v9
+        vaaddu.vv       v26, v26, v10
+        vaaddu.vv       v28, v28, v11
+        vaaddu.vv       v30, v30, v12
+    .endif
+        vse8.v          v24, (\dst)
+        vse8.v          v26, (t0)
+        vse8.v          v28, (t1)
+        vse8.v          v30, (t2)
+        add             \dst, \dst_stride, t2
+        vmv.v.v         v0, v4
+        vmv.v.v         v1, v5
+        vmv.v.v         v2, v6
+        vmv.v.v         v3, v7
+        vmv.v.v         v4, v8
+        bnez            \size, 1b
+        ret
+endfunc
+
+func ff_\op\()_h264_qpel_hv_lowpass_\lmul\ext, zve32x
+        sub             t0, \src, \src_stride
+        sub             t1, t0, \src_stride
+        lowpass_h       v4, \src, \w0, \w1
+        lowpass_h       v2, t0, \w0, \w1
+        lowpass_h       v0, t1, \w0, \w1
+        add             t0, \src, \src_stride
+        add             t1, t0, \src_stride
+        add             \src, t1, \src_stride
+        lowpass_h       v6, t0, \w0, \w1
+        lowpass_h       v8, t1, \w0, \w1
+1:
+        add             t0, \src_stride, \src
+        add             t1, \src_stride, t0
+        add             t2, \src_stride, t1
+        lowpass_h       v10, \src, \w0, \w1
+        lowpass_h       v12, t0, \w0, \w1
+        lowpass_h       v14, t1, \w0, \w1
+        lowpass_h       v16, t2, \w0, \w1
+        vsetvli         zero, zero, e16, \lmul2, ta, ma
+        addi            \size, \size, -4
+        lowpass_v       \w0, \w1, v20, v0, v2, v4, v6, v8, v10, signed=1
+        lowpass_v       \w0, \w1, v24, v2, v4, v6, v8, v10, v12, signed=1
+        lowpass_v       \w0, \w1, v28, v4, v6, v8, v10, v12, v14, signed=1
+        vnclip.wi       v0, v20, 10
+        lowpass_v       \w0, \w1, v20, v6, v8, v10, v12, v14, v16, signed=1
+        vnclip.wi       v2, v24, 10
+        vnclip.wi       v4, v28, 10
+        vnclip.wi       v6, v20, 10
+        vmax.vx         v18, v0, zero
+        vmax.vx         v20, v2, zero
+        vmax.vx         v22, v4, zero
+        vmax.vx         v24, v6, zero
+        vmv.v.v         v0, v8
+        vmv.v.v         v2, v10
+        vmv.v.v         v4, v12
+        vmv.v.v         v6, v14
+        vmv.v.v         v8, v16
+        add             \src, \src_stride, t2
+        vsetvli         zero, zero, e8, \lmul, ta, ma
+        vnclipu.wi      v18, v18, 0
+        vnclipu.wi      v20, v20, 0
+        vnclipu.wi      v22, v22, 0
+        vnclipu.wi      v24, v24, 0
+    .ifnb \src2
+        add             t0, \src2_stride, \src2
+        add             t1, \src2_stride, t0
+        add             t2, \src2_stride, t1
+        vle8.v          v26, (\src2)
+        vle8.v          v27, (t0)
+        vle8.v          v28, (t1)
+        vle8.v          v29, (t2)
+        add             \src2, \src2_stride, t2
+        vaaddu.vv       v18, v18, v26
+        vaaddu.vv       v20, v20, v27
+        vaaddu.vv       v22, v22, v28
+        vaaddu.vv       v24, v24, v29
+    .endif
+        add             t0, \dst_stride, \dst
+        add             t1, \dst_stride, t0
+        add             t2, \dst_stride, t1
+    .ifc \op, avg
+        vle8.v          v26, (\dst)
+        vle8.v          v27, (t0)
+        vle8.v          v28, (t1)
+        vle8.v          v29, (t2)
+        vaaddu.vv       v18, v18, v26
+        vaaddu.vv       v20, v20, v27
+        vaaddu.vv       v22, v22, v28
+        vaaddu.vv       v24, v24, v29
+    .endif
+        vse8.v          v18, (\dst)
+        vse8.v          v20, (t0)
+        vse8.v          v22, (t1)
+        vse8.v          v24, (t2)
+        add             \dst, \dst_stride, t2
+        bnez            \size, 1b
+        ret
+endfunc
+.endm
+
+/* Note: We could possibly specialize for the width 8 / width 4 cases by
+   loading 32 bit integers, but this makes the convolutions more complicated
+   to implement, so it's not necessarily any faster. */
+
+.macro h264_qpel lmul, lmul2
+        qpel_lowpass    put,    , \lmul, \lmul2, a0, a1, a2, a3, a4, t5, t6
+        qpel_lowpass    put, _l2, \lmul, \lmul2, a0, a1, a2, a3, a4, t5, t6, a5, a6
+        qpel_lowpass    avg,    , \lmul, \lmul2, a0, a1, a2, a3, a4, t5, t6
+        qpel_lowpass    avg, _l2, \lmul, \lmul2, a0, a1, a2, a3, a4, t5, t6, a5, a6
+.endm
+
+        h264_qpel       m1,  m2
+        h264_qpel       mf2, m1
+        h264_qpel       mf4, mf2
+        h264_qpel       mf8, mf4
+
+.macro ff_h264_qpel_fns op, lmul, sizei, ext=rvv, dst, src, dst_stride, src_stride, size, w0, w1, src2, src2_stride, tmp
+func ff_\op\()_h264_qpel\sizei\()_mc00_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size,
+        j               ff_\op\()_h264_qpel_pixels
+endfunc
+
+func ff_\op\()_h264_qpel\sizei\()_mc10_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size, \w0, \w1
+        mv              \src_stride, \dst_stride
+        mv              \src2, \src
+        mv              \src2_stride, \src_stride
+        j               ff_\op\()_h264_qpel_h_lowpass_\lmul\()_l2
+endfunc
+
+func ff_\op\()_h264_qpel\sizei\()_mc20_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size, \w0, \w1
+        mv              \src_stride, \dst_stride
+        j               ff_\op\()_h264_qpel_h_lowpass_\lmul\()
+endfunc
+
+func ff_\op\()_h264_qpel\sizei\()_mc30_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size, \w0, \w1
+        mv              \src_stride, \dst_stride
+        addi            \src2, \src, 1
+        mv              \src2_stride, \src_stride
+        j               ff_\op\()_h264_qpel_h_lowpass_\lmul\()_l2
+endfunc
+
+func ff_\op\()_h264_qpel\sizei\()_mc01_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size, \w0, \w1
+        mv              \src_stride, \dst_stride
+        mv              \src2, \src
+        mv              \src2_stride, \src_stride
+        j               ff_\op\()_h264_qpel_v_lowpass_\lmul\()_l2
+endfunc
+
+func ff_\op\()_h264_qpel\sizei\()_mc02_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size, \w0, \w1
+        mv              \src_stride, \dst_stride
+        j               ff_\op\()_h264_qpel_v_lowpass_\lmul
+endfunc
+
+func ff_\op\()_h264_qpel\sizei\()_mc03_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size, \w0, \w1
+        mv              \src_stride, \dst_stride
+        add             \src2, \src, \src_stride
+        mv              \src2_stride, \src_stride
+        j               ff_\op\()_h264_qpel_v_lowpass_\lmul\()_l2
+endfunc
+
+func ff_\op\()_h264_qpel\sizei\()_mc11_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size, \w0, \w1
+        push            \dst, \src
+        mv              \tmp, ra
+        mv              \src_stride, \dst_stride
+        addi            \dst, sp, -(\sizei * \sizei)
+        li              \dst_stride, \sizei
+        call            ff_put_h264_qpel_h_lowpass_\lmul
+        addi            \src2, sp, -(\sizei * \sizei)
+        mv              \src2_stride, \dst_stride
+        pop             \dst, \src
+        mv              \dst_stride, \src_stride
+        li              \size, \sizei
+        mv              ra, \tmp
+        j               ff_\op\()_h264_qpel_v_lowpass_\lmul\()_l2
+endfunc
+
+func ff_\op\()_h264_qpel\sizei\()_mc31_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size, \w0, \w1
+        push            \dst, \src
+        mv              \tmp, ra
+        mv              \src_stride, \dst_stride
+        addi            \dst, sp, -(\sizei * \sizei)
+        li              \dst_stride, \sizei
+        call            ff_put_h264_qpel_h_lowpass_\lmul
+        addi            \src2, sp, -(\sizei * \sizei)
+        mv              \src2_stride, \dst_stride
+        pop             \dst, \src
+        addi            \src, \src, 1
+        mv              \dst_stride, \src_stride
+        li              \size, \sizei
+        mv              ra, \tmp
+        j               ff_\op\()_h264_qpel_v_lowpass_\lmul\()_l2
+endfunc
+
+func ff_\op\()_h264_qpel\sizei\()_mc13_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size, \w0, \w1
+        push            \dst, \src
+        mv              \tmp, ra
+        mv              \src_stride, \dst_stride
+        add             \src, \src, \src_stride
+        addi            \dst, sp, -(\sizei * \sizei)
+        li              \dst_stride, \sizei
+        call            ff_put_h264_qpel_h_lowpass_\lmul
+        addi            \src2, sp, -(\sizei * \sizei)
+        mv              \src2_stride, \dst_stride
+        pop             \dst, \src
+        mv              \dst_stride, \src_stride
+        li              \size, \sizei
+        mv              ra, \tmp
+        j               ff_\op\()_h264_qpel_v_lowpass_\lmul\()_l2
+endfunc
+
+func ff_\op\()_h264_qpel\sizei\()_mc33_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size, \w0, \w1
+        push            \dst, \src
+        mv              \tmp, ra
+        mv              \src_stride, \dst_stride
+        add             \src, \src, \src_stride
+        addi            \dst, sp, -(\sizei * \sizei)
+        li              \dst_stride, \sizei
+        call            ff_put_h264_qpel_h_lowpass_\lmul
+        addi            \src2, sp, -(\sizei * \sizei)
+        mv              \src2_stride, \dst_stride
+        pop             \dst, \src
+        addi            \src, \src, 1
+        mv              \dst_stride, \src_stride
+        li              \size, \sizei
+        mv              ra, \tmp
+        j               ff_\op\()_h264_qpel_v_lowpass_\lmul\()_l2
+endfunc
+
+func ff_\op\()_h264_qpel\sizei\()_mc22_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size, \w0, \w1
+        mv              \src_stride, \dst_stride
+        j               ff_\op\()_h264_qpel_hv_lowpass_\lmul
+endfunc
+
+func ff_\op\()_h264_qpel\sizei\()_mc21_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size, \w0, \w1
+        push            \dst, \src
+        mv              \tmp, ra
+        mv              \src_stride, \dst_stride
+        addi            \dst, sp, -(\sizei * \sizei)
+        li              \dst_stride, \sizei
+        call            ff_put_h264_qpel_h_lowpass_\lmul
+        addi            \src2, sp, -(\sizei * \sizei)
+        mv              \src2_stride, \dst_stride
+        pop             \dst, \src
+        mv              \dst_stride, \src_stride
+        li              \size, \sizei
+        mv              ra, \tmp
+        j               ff_\op\()_h264_qpel_hv_lowpass_\lmul\()_l2
+endfunc
+
+func ff_\op\()_h264_qpel\sizei\()_mc23_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size, \w0, \w1
+        push            \dst, \src
+        mv              \tmp, ra
+        mv              \src_stride, \dst_stride
+        add             \src, \src, \src_stride
+        addi            \dst, sp, -(\sizei * \sizei)
+        li              \dst_stride, \sizei
+        call            ff_put_h264_qpel_h_lowpass_\lmul
+        addi            \src2, sp, -(\sizei * \sizei)
+        mv              \src2_stride, \dst_stride
+        pop             \dst, \src
+        mv              \dst_stride, \src_stride
+        li              \size, \sizei
+        mv              ra, \tmp
+        j               ff_\op\()_h264_qpel_hv_lowpass_\lmul\()_l2
+endfunc
+
+func ff_\op\()_h264_qpel\sizei\()_mc12_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size, \w0, \w1
+        push            \dst, \src
+        mv              \tmp, ra
+        mv              \src_stride, \dst_stride
+        addi            \dst, sp, -(\sizei * \sizei)
+        li              \dst_stride, \sizei
+        call            ff_put_h264_qpel_v_lowpass_\lmul
+        addi            \src2, sp, -(\sizei * \sizei)
+        mv              \src2_stride, \dst_stride
+        pop             \dst, \src
+        mv              \dst_stride, \src_stride
+        li              \size, \sizei
+        mv              ra, \tmp
+        j               ff_\op\()_h264_qpel_hv_lowpass_\lmul\()_l2
+endfunc
+
+func ff_\op\()_h264_qpel\sizei\()_mc32_\ext, zve32x
+        lowpass_init    \lmul, \sizei, \size, \w0, \w1
+        push            \dst, \src
+        mv              \tmp, ra
+        addi            \src, \src, 1
+        mv              \src_stride, \dst_stride
+        addi            \dst, sp, -(\sizei * \sizei)
+        li              \dst_stride, \sizei
+        call            ff_put_h264_qpel_v_lowpass_\lmul
+        addi            \src2, sp, -(\sizei * \sizei)
+        mv              \src2_stride, \dst_stride
+        pop             \dst, \src
+        mv              \dst_stride, \src_stride
+        li              \size, \sizei
+        mv              ra, \tmp
+        j               ff_\op\()_h264_qpel_hv_lowpass_\lmul\()_l2
+endfunc
+.endm
+
+        ff_h264_qpel_fns put, mf2, 16, rvv256, a0, a1, a2, a3, a4, t5, t6, a5, a6, a7
+        ff_h264_qpel_fns put, mf4, 8,  rvv256, a0, a1, a2, a3, a4, t5, t6, a5, a6, a7
+        ff_h264_qpel_fns put, mf8, 4,  rvv256, a0, a1, a2, a3, a4, t5, t6, a5, a6, a7
+
+        ff_h264_qpel_fns avg, mf2, 16, rvv256, a0, a1, a2, a3, a4, t5, t6, a5, a6, a7
+        ff_h264_qpel_fns avg, mf4, 8,  rvv256, a0, a1, a2, a3, a4, t5, t6, a5, a6, a7
+        ff_h264_qpel_fns avg, mf8, 4,  rvv256, a0, a1, a2, a3, a4, t5, t6, a5, a6, a7
+
+        ff_h264_qpel_fns put, m1,  16, rvv,    a0, a1, a2, a3, a4, t5, t6, a5, a6, a7
+        ff_h264_qpel_fns put, mf2, 8,  rvv,    a0, a1, a2, a3, a4, t5, t6, a5, a6, a7
+        ff_h264_qpel_fns put, mf4, 4,  rvv,    a0, a1, a2, a3, a4, t5, t6, a5, a6, a7
+
+        ff_h264_qpel_fns avg, m1,  16, rvv,    a0, a1, a2, a3, a4, t5, t6, a5, a6, a7
+        ff_h264_qpel_fns avg, mf2, 8,  rvv,    a0, a1, a2, a3, a4, t5, t6, a5, a6, a7
+        ff_h264_qpel_fns avg, mf4, 4,  rvv,    a0, a1, a2, a3, a4, t5, t6, a5, a6, a7
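
Reading aid (not part of the patch): a minimal scalar C sketch of what the lowpass macros appear to compute for 8-bit samples, assuming the usual H.264 6-tap kernel (1, -5, 20, 20, -5, 1), which matches the weights 20 and -5 loaded by lowpass_init. The rounding and clamping mirror vnclipsu.wi with vxrm set to 0 (round-to-nearest-up) and a shift of 5. The helper names lowpass6, put_h_lowpass_row and qpel_tab_index are hypothetical and exist only for this illustration; qpel_tab_index restates the mcXY ordering used by SET_QPEL_FNS.

```c
#include <stdint.h>

/* Hypothetical helper: one output sample of the 6-tap half-pel filter.
 * Negative sums clamp to 0 (vmax.vx with zero), then (sum + 16) >> 5
 * saturates to 255 (vnclipu.wi with shift 5, vxrm = 0). */
static uint8_t lowpass6(int a, int b, int c, int d, int e, int f)
{
    int sum = a - 5 * b + 20 * c + 20 * d - 5 * e + f;
    if (sum < 0)
        return 0;
    sum = (sum + 16) >> 5;
    return sum > 255 ? 255 : (uint8_t)sum;
}

/* Hypothetical helper: horizontal half-pel filter of one row ("put" variant).
 * src needs 2 readable pixels to the left and 3 to the right of the row. */
static void put_h_lowpass_row(uint8_t *dst, const uint8_t *src, int width)
{
    for (int x = 0; x < width; x++)
        dst[x] = lowpass6(src[x - 2], src[x - 1], src[x],
                          src[x + 1], src[x + 2], src[x + 3]);
}

/* Hypothetical helper: slot of mcXY in {put,avg}_h264_qpel_pixels_tab[size][],
 * where X is the horizontal and Y the vertical quarter-pel offset (0..3). */
static int qpel_tab_index(int x, int y)
{
    return x + 4 * y;
}
```

The quarter-pel positions (for example mc10 or mc01) then average such a half-pel result with a neighbouring full-pel or half-pel sample using a rounding average, which is what the _l2 variants do with vaaddu.vv.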