From patchwork Thu Nov 9 18:34:52 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 44601
From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 9 Nov 2023 20:34:52 +0200
Message-ID: <20231109183453.12390-1-remi@remlab.net>
X-Mailer: git-send-email 2.42.0
Subject: [FFmpeg-devel] [PATCH 1/2] sws/rgb2rgb: rework R-V V YUY2 to 4:2:2 planar

This saves three scratch registers and three instructions per line. The
performance gains are mostly negligible. The main point is to free up
registers for further rework.
---
 libswscale/riscv/rgb2rgb_rvv.S | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/libswscale/riscv/rgb2rgb_rvv.S b/libswscale/riscv/rgb2rgb_rvv.S
index 671089c842..172f5918dc 100644
--- a/libswscale/riscv/rgb2rgb_rvv.S
+++ b/libswscale/riscv/rgb2rgb_rvv.S
@@ -127,31 +127,30 @@ func ff_deinterleave_bytes_rvv, zve32x
 endfunc
 
 .macro yuy2_to_i422p y_shift
-        addi    a4, a4, 1
+        slli    t4, a4, 1 // pixel width -> (source) byte width
         lw      t6, (sp)
+        sub     a6, a6, a4
         srai    a4, a4, 1 // pixel width -> chroma width
+        sub     a7, a7, a4
+        sub     t6, t6, t4
 1:
         mv      t4, a4
-        mv      t3, a3
-        mv      t0, a0
-        mv      t1, a1
-        mv      t2, a2
         addi    a5, a5, -1
 2:
         vsetvli t5, t4, e8, m2, ta, ma
-        vlseg2e16.v v16, (t3)
+        vlseg2e16.v v16, (a3)
         sub     t4, t4, t5
         vnsrl.wi    v24, v16, \y_shift // Y0
-        sh2add  t3, t5, t3
+        sh2add  a3, t5, a3
         vnsrl.wi    v26, v20, \y_shift // Y1
         vnsrl.wi    v28, v16, 8 - \y_shift // U
         vnsrl.wi    v30, v20, 8 - \y_shift // V
-        vsseg2e8.v  v24, (t0)
-        sh1add  t0, t5, t0
-        vse8.v  v28, (t1)
-        add     t1, t5, t1
-        vse8.v  v30, (t2)
-        add     t2, t5, t2
+        vsseg2e8.v  v24, (a0)
+        sh1add  a0, t5, a0
+        vse8.v  v28, (a1)
+        add     a1, t5, a1
+        vse8.v  v30, (a2)
+        add     a2, t5, a2
         bnez    t4, 2b
 
         add     a3, a3, t6

From patchwork Thu Nov 9 18:34:53 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 44602
From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 9 Nov 2023 20:34:53 +0200
Message-ID: <20231109183453.12390-2-remi@remlab.net>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20231109183453.12390-1-remi@remlab.net>
References: <20231109183453.12390-1-remi@remlab.net>
Subject: [FFmpeg-devel] [PATCH 2/2] sws/rgb2rgb: fix unaligned accesses in R-V V YUYV to I422p

In my personal opinion, we should not need to support unaligned YUY2
pixel maps. They should always be aligned to at least 32 bits, and the
current code assumes just 16 bits.
However, checkasm does test for unaligned input bitmaps. QEMU accepts
them, but real hardware does not.

In this particular case, we can at the same time improve performance and
handle unaligned inputs, so do just that.

uyvytoyuv422_c:       104060.0
uyvytoyuv422_rvv_i32:  25284.0 (before)
uyvytoyuv422_rvv_i32:  20148.2 (after)
---
 libswscale/riscv/rgb2rgb_rvv.S | 45 +++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/libswscale/riscv/rgb2rgb_rvv.S b/libswscale/riscv/rgb2rgb_rvv.S
index 172f5918dc..716948dc82 100644
--- a/libswscale/riscv/rgb2rgb_rvv.S
+++ b/libswscale/riscv/rgb2rgb_rvv.S
@@ -126,32 +126,33 @@ func ff_deinterleave_bytes_rvv, zve32x
         ret
 endfunc
 
-.macro yuy2_to_i422p y_shift
-        slli    t4, a4, 1 // pixel width -> (source) byte width
+.macro yuy2_to_i422p luma, chroma
+        srai    t4, a4, 1 // pixel width -> chroma width
         lw      t6, (sp)
+        slli    t5, a4, 1 // pixel width -> (source) byte width
         sub     a6, a6, a4
-        srai    a4, a4, 1 // pixel width -> chroma width
-        sub     a7, a7, a4
-        sub     t6, t6, t4
+        sub     a7, a7, t4
+        sub     t6, t6, t5
 1:
         mv      t4, a4
         addi    a5, a5, -1
 2:
-        vsetvli t5, t4, e8, m2, ta, ma
-        vlseg2e16.v v16, (a3)
-        sub     t4, t4, t5
-        vnsrl.wi    v24, v16, \y_shift // Y0
-        sh2add  a3, t5, a3
-        vnsrl.wi    v26, v20, \y_shift // Y1
-        vnsrl.wi    v28, v16, 8 - \y_shift // U
-        vnsrl.wi    v30, v20, 8 - \y_shift // V
-        vsseg2e8.v  v24, (a0)
-        sh1add  a0, t5, a0
-        vse8.v  v28, (a1)
-        add     a1, t5, a1
-        vse8.v  v30, (a2)
-        add     a2, t5, a2
-        bnez    t4, 2b
+        vsetvli t5, t4, e8, m4, ta, ma
+        vlseg2e8.v  v16, (a3)
+        srli    t1, t5, 1
+        vsetvli zero, t1, e8, m2, ta, ma
+        vnsrl.wi    v24, \chroma, 0 // U
+        sub     t4, t4, t5
+        vnsrl.wi    v28, \chroma, 8 // V
+        sh1add  a3, t5, a3
+        vse8.v  v24, (a1)
+        add     a1, t1, a1
+        vse8.v  v28, (a2)
+        add     a2, t1, a2
+        vsetvli zero, t5, e8, m4, ta, ma
+        vse8.v  \luma, (a0)
+        add     a0, t5, a0
+        bnez    t4, 2b
 
         add     a3, a3, t6
         add     a0, a0, a6
@@ -163,9 +164,9 @@ endfunc
 .endm
 
 func ff_uyvytoyuv422_rvv, zve32x
-        yuy2_to_i422p 8
+        yuy2_to_i422p v20, v16
 endfunc
 
 func ff_yuyvtoyuv422_rvv, zve32x
-        yuy2_to_i422p 0
+        yuy2_to_i422p v16, v20
 endfunc
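
For readers less familiar with RVV, the data movement that patch 2/2 performs can be sketched in scalar C. This is only an illustration under stated assumptions, not FFmpeg's reference implementation: the function name packed422_to_planar and its luma_first flag are hypothetical, and the width is assumed to be even. The sketch reads the source strictly byte by byte, which is why, like the 8-bit segment load in the patch, it places no alignment requirement on the input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scalar sketch (not FFmpeg API): split packed 4:2:2 into
 * planar Y, U and V using byte accesses only, so src needs no alignment.
 * luma_first != 0 means YUYV (Y0 U Y1 V); luma_first == 0 means UYVY.
 */
static void packed422_to_planar(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
                                const uint8_t *src, int width, int height,
                                ptrdiff_t luma_stride, ptrdiff_t chroma_stride,
                                ptrdiff_t src_stride, int luma_first)
{
    const int luma_off   = luma_first ? 0 : 1; /* byte lane holding luma */
    const int chroma_off = 1 - luma_off;       /* the other byte lane */

    assert(width % 2 == 0); /* assumption: whole chroma pairs only */

    for (int row = 0; row < height; row++) {
        for (int x = 0; x < width / 2; x++) {
            const uint8_t *p = src + 4 * x; /* two pixels per 4 bytes */

            ydst[2 * x]     = p[luma_off];
            ydst[2 * x + 1] = p[luma_off + 2];
            udst[x]         = p[chroma_off];     /* even chroma byte: U */
            vdst[x]         = p[chroma_off + 2]; /* odd chroma byte: V */
        }
        src  += src_stride;
        ydst += luma_stride;
        udst += chroma_stride;
        vdst += chroma_stride;
    }
}
```

In terms of the patch, the two byte lanes correspond to the two register groups filled by vlseg2e8.v, and the U/V split corresponds to the two vnsrl.wi narrowing shifts (by 0 and by 8) applied to the chroma group.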