From patchwork Wed Sep 28 15:30:01 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?R=C3=A9mi_Denis-Courmont?=
X-Patchwork-Id: 38441
From: remi@remlab.net
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 28 Sep 2022 18:30:01 +0300
Message-Id: <20220928153001.30025-3-remi@remlab.net>
In-Reply-To: <12088142.O9o76ZdvQC@basile.remlab.net>
References: <12088142.O9o76ZdvQC@basile.remlab.net>
Subject: [FFmpeg-devel] [PATCH 3/3] sws/rgb2rgb: RISC-V 64-bit V packed YUYV/UYVY to planar 4:2:2

From: Rémi Denis-Courmont

This is currently 64-bit only because the stack spilling code would not assemble on RV32I
(and it would corrupt s0 and s1 on RV128I, in theory). This could be added
later in the unlikely event that someone wants it.
---
 libswscale/riscv/rgb2rgb.c     | 10 +++++++
 libswscale/riscv/rgb2rgb_rvv.S | 53 ++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/libswscale/riscv/rgb2rgb.c b/libswscale/riscv/rgb2rgb.c
index 32c1546827..93bc6b6245 100644
--- a/libswscale/riscv/rgb2rgb.c
+++ b/libswscale/riscv/rgb2rgb.c
@@ -33,6 +33,12 @@ void ff_shuffle_bytes_3210_rvv(const uint8_t *src, uint8_t *dst, int src_len);
 void ff_interleave_bytes_rvv(const uint8_t *src1, const uint8_t *src2,
                              uint8_t *dst, int width, int height, int s1stride,
                              int s2stride, int dstride);
+void ff_uyvytoyuv422_rvv(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
+                         const uint8_t *src, int width, int height,
+                         int ystride, int uvstride, int src_stride);
+void ff_yuyvtoyuv422_rvv(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
+                         const uint8_t *src, int width, int height,
+                         int ystride, int uvstride, int src_stride);
 
 av_cold void rgb2rgb_init_riscv(void)
 {
@@ -46,6 +52,10 @@ av_cold void rgb2rgb_init_riscv(void)
         shuffle_bytes_3012 = ff_shuffle_bytes_3012_rvv;
         shuffle_bytes_3210 = ff_shuffle_bytes_3210_rvv;
         interleaveBytes = ff_interleave_bytes_rvv;
+# if (__riscv_xlen == 64)
+        uyvytoyuv422 = ff_uyvytoyuv422_rvv;
+        yuyvtoyuv422 = ff_yuyvtoyuv422_rvv;
+# endif
     }
 #endif
 }
diff --git a/libswscale/riscv/rgb2rgb_rvv.S b/libswscale/riscv/rgb2rgb_rvv.S
index 7f8c2efd80..5626d906eb 100644
--- a/libswscale/riscv/rgb2rgb_rvv.S
+++ b/libswscale/riscv/rgb2rgb_rvv.S
@@ -102,3 +102,56 @@ func ff_interleave_bytes_rvv, zve32x
 
         ret
 endfunc
+
+#if (__riscv_xlen == 64)
+.macro  yuy2_to_i422p v_y0, v_y1, v_u, v_v
+        addi    sp, sp, -16
+        sd      s0,   (sp)
+        sd      s1,  8(sp)
+        addi    a4, a4, 1
+        lw      s0, 16(sp)
+        srai    a4, a4, 1 // pixel width -> chroma width
+        li      s1, 2
+1:
+        mv      t4, a4
+        mv      t3, a3
+        mv      t0, a0
+        addi    t6, a0, 1
+        mv      t1, a1
+        mv      t2, a2
+        addi    a5, a5, -1
+2:
+        vsetvli    t5, t4, e8, m1, ta, ma
+        sub        t4, t4, t5
+        vlseg4e8.v v8, (t3)
+        sh2add     t3, t5, t3
+        vsse8.v    \v_y0, (t0), s1
+        sh1add     t0, t5, t0
+        vsse8.v    \v_y1, (t6), s1
+        sh1add     t6, t5, t6
+        vse8.v     \v_u, (t1)
+        add        t1, t5, t1
+        vse8.v     \v_v, (t2)
+        add        t2, t5, t2
+        bnez       t4, 2b
+
+        add     a3, a3, s0
+        add     a0, a0, a6
+        add     a1, a1, a7
+        add     a2, a2, a7
+        bnez    a5, 1b
+
+        ld      s1,  8(sp)
+        ld      s0,   (sp)
+        addi    sp, sp, 16
+        ret
+.endm
+
+func ff_uyvytoyuv422_rvv, zve32x
+        yuy2_to_i422p v9, v11, v8, v10
+endfunc
+
+func ff_yuyvtoyuv422_rvv, zve32x
+        yuy2_to_i422p v8, v10, v9, v11
+endfunc
+#endif
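
For review purposes, here is a scalar C sketch of what the yuy2_to_i422p macro appears to compute, as read from the assembly above. It is not part of the patch. The function name packed422_to_planar_ref and the luma_first flag are hypothetical, introduced only for illustration: luma_first = 1 mirrors the YUYV register order (v8, v10, v9, v11) and luma_first = 0 the UYVY order (v9, v11, v8, v10). Like the vector loop, it rounds the chroma width up with (width + 1) >> 1, so for an odd width it writes one luma byte past width.

```c
#include <stdint.h>

/* Hypothetical scalar reference for the RVV macro; not FFmpeg code. */
static void packed422_to_planar_ref(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
                                    const uint8_t *src, int width, int height,
                                    int ystride, int uvstride, int src_stride,
                                    int luma_first)
{
    /* Same rounding as the addi/srai pair: pixel width -> chroma width. */
    const int chroma_width = (width + 1) >> 1;
    /* Byte offsets inside each 4-byte group: YUYV = Y0 U Y1 V, UYVY = U Y0 V Y1. */
    const int y_off = luma_first ? 0 : 1;
    const int c_off = luma_first ? 1 : 0;

    for (int row = 0; row < height; row++) {
        for (int i = 0; i < chroma_width; i++) {
            const uint8_t *p = src + 4 * i; /* one group = two pixels */

            ydst[2 * i]     = p[y_off];     /* strided store of \v_y0 */
            ydst[2 * i + 1] = p[y_off + 2]; /* strided store of \v_y1 */
            udst[i]         = p[c_off];     /* unit-stride store of \v_u */
            vdst[i]         = p[c_off + 2]; /* unit-stride store of \v_v */
        }
        /* Advance each plane by its own stride, as the outer loop does. */
        src  += src_stride;
        ydst += ystride;
        udst += uvstride;
        vdst += uvstride;
    }
}
```

A reference of this shape could be used to compare the output of ff_yuyvtoyuv422_rvv and ff_uyvytoyuv422_rvv byte for byte on random input, assuming buffers padded to an even width.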