From patchwork Sat Mar 16 03:03:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiyou Yin X-Patchwork-Id: 47117 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a21:3942:b0:1a3:31a3:7958 with SMTP id ac2csp231516pzc; Fri, 15 Mar 2024 20:04:29 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCVYTIZrev9EtsPhLAUQRgDAmB3DAT/Cu/dqlUdLOZ8aN3IUNJHrdFYZ1iv6jsnLrLugr2pH8I935ScCc6a5bswvmoLgQ65EEULvcA== X-Google-Smtp-Source: AGHT+IHbC295iE3wAiLv//BdYx6V4s9aa1Mw6s2Gg3WcwgRyKweReOpRXIFi0NPZm9IOHfswQ1+X X-Received: by 2002:ac2:5b10:0:b0:513:dd66:d5ed with SMTP id v16-20020ac25b10000000b00513dd66d5edmr1433998lfn.29.1710558268729; Fri, 15 Mar 2024 20:04:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1710558268; cv=none; d=google.com; s=arc-20160816; b=F2rnq7hGVlrjaR3VXWOxDMLkFwvSegxw5fGd1ryvFuOSpcZsV5xprtYrUOnf5fp/A6 0WIMRdvsGO3nThyDlCuLsEXOBW1TTuanXJUZym2POhtrhddzIlqDqmSSS0XeDAJSg2rJ XIdOucFxw49s6kNg5J1Z4wktRSnpa/QCoQ1zUzXdbqUoxQjoPWh3JPqLWF++60l89sGB YyBQdcFm44GYUiU8OAaxnftHl4D4YGefKhIPKvxNgo4VeibEpvLzs7j3z1lapoDlYeX2 2oJPjnfwNrfj6cB6/9H2U5X8N21z9us8DUb4D8ulHovqG3s61nIfX4FcMmjbuAucodRy GmXQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id :date:to:from:delivered-to; bh=HMc3quD4GBqOjriAODCzX5ZddFeqGzM7tiBpBzjDWX4=; fh=YOA8vD9MJZuwZ71F/05pj6KdCjf6jQRmzLS+CATXUQk=; b=GLye7Z8T8eNd0mBxQd4RyUOWDC+0HRISD7VUndzJksXMVQuMXyI53IY+P2ccoHtyaB 9yQPB4CsoxvbVxH05WOXwgqr5eGa2/Y1r1J2/rSxh+vgCJ8xbGorhdZoPRKqUahq6/PG 4NvMyLudyKEvulofNMqQre5murJNX+wgVDOvje0w53ZTTNK24ndKaaZ3M/LwIKj1ejmq IQ92nm9RWtRZLzfdyLlebJy+fzNKy0MswhIHKvfENMuf2az7KJ0ozwYEBvOvvvXqMffb IAtMsK82gopMW/RzDt97o16dOoan8tkhFcuYJOBty5HQyidcc9OYctIwlj0vXav1EZNQ qHig==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id r21-20020a1709067fd500b00a466d297343si2263588ejs.272.2024.03.15.20.04.28; Fri, 15 Mar 2024 20:04:28 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 7F52168D16D; Sat, 16 Mar 2024 05:03:54 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 7CA6268CC5A for ; Sat, 16 Mar 2024 05:03:46 +0200 (EET) Received: from loongson.cn (unknown [36.33.26.33]) by gateway (Coremail) with SMTP id _____8AxDOsPDPVlR78ZAA--.52028S3; Sat, 16 Mar 2024 11:03:43 +0800 (CST) Received: from localhost (unknown [36.33.26.33]) by localhost.localdomain (Coremail) with SMTP id AQAAf8AxDBMNDPVln21bAA--.43757S3; Sat, 16 Mar 2024 11:03:41 +0800 (CST) From: Shiyou Yin To: ffmpeg-devel@ffmpeg.org Date: Sat, 16 Mar 2024 11:03:33 +0800 Message-Id: <20240316030333.31269-4-yinshiyou-hf@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20240316030333.31269-1-yinshiyou-hf@loongson.cn> References: <20240316030333.31269-1-yinshiyou-hf@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8AxDBMNDPVln21bAA--.43757S3 X-CM-SenderInfo: p1lq2x5l1r3gtki6z05rqj20fqof0/ X-Coremail-Antispam: 1Uk129KBj9fXoWfGFy5ur18Cw1rCr1xJrWxXwc_yoW8Aw1fZo WYyF1qqwn8K3y3GF9Ivw1rJ34fC3yjkF1YvasxJ3s5ta4SkF4YyrW2vw1YvFZrGws5ZFnr Z39Fqrn8GrZxGF1kl-sFpf9Il3svdjkaLaAFLSUrUUUUUb8apTn2vfkv8UJUUUU8wcxFpf 9Il3svdxBIdaVrn0xqx4xG64xvF2IEw4CE5I8CrVC2j2Jv73VFW2AGmfu7bjvjm3AaLaJ3 UjIYCTnIWjp_UUUYU7kC6x804xWl14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI 8IcIk0rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xG Y2AK021l84ACjcxK6xIIjxv20xvE14v26r1j6r1xM28EF7xvwVC0I7IYx2IY6xkF7I0E14 v26r1j6r4UM28EF7xvwVC2z280aVAFwI0_Gr0_Cr1l84ACjcxK6I8E87Iv6xkF7I0E14v2 6r4j6r4UJwAS0I0E0xvYzxvE52x082IY62kv0487Mc804VCY07AIYIkI8VC2zVCFFI0UMc 02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUAVWUtwAv7VC2z280aVAF wI0_Gr0_Cr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JMxAIw28IcxkI7V AKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCj r7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUJVWUXwCIc40Y0x0EwIxGrwCI42IY6x IIjxv20xvE14v26r1j6r1xMIIF0xvE2Ix0cI8IcVCY1x0267AKxVWUJVW8JwCI42IY6xAI w20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x 0267AKxVWUJVW8JbIYCTnIWIevJa73UjIFyTuYvjxU7tx6UUUUU Subject: [FFmpeg-devel] [PATCH 3/3] swscale: [LA] Optimize swscale funcs in input.c X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: pvUs1T4xRdAh Optimized 7 funcs with LSX and LASX: 1. yuy2ToUV_c 2. yvy2ToUV_c 3. uyvyToUV_c 4. nv12ToUV_c 5. nv21ToUV_c 6. abgrToA_c 7. rgbaToA_c --- libswscale/loongarch/Makefile | 1 + libswscale/loongarch/input.S | 495 ++++++++++++++++++ libswscale/loongarch/input_lasx.c | 43 ++ libswscale/loongarch/input_lsx.c | 65 +++ libswscale/loongarch/swscale_init_loongarch.c | 20 +- libswscale/loongarch/swscale_loongarch.h | 46 ++ 6 files changed, 652 insertions(+), 18 deletions(-) create mode 100644 libswscale/loongarch/input_lsx.c diff --git a/libswscale/loongarch/Makefile b/libswscale/loongarch/Makefile index c35ba309a4..7ba11d492e 100644 --- a/libswscale/loongarch/Makefile +++ b/libswscale/loongarch/Makefile @@ -9,4 +9,5 @@ LSX-OBJS-$(CONFIG_SWSCALE) += loongarch/swscale.o \ loongarch/input.o \ loongarch/output.o \ loongarch/output_lsx.o \ + loongarch/input_lsx.o \ loongarch/yuv2rgb_lsx.o diff --git a/libswscale/loongarch/input.S b/libswscale/loongarch/input.S index d01f7384b1..717592b004 100644 --- a/libswscale/loongarch/input.S +++ b/libswscale/loongarch/input.S @@ -283,3 +283,498 @@ function planar_rgb_to_uv_lsx ld.d s3, sp, 16 addi.d sp, sp, 24 endfunc + +/* + * void yuy2ToUV_lsx(uint8_t *dstU, uint8_t *dstV, const uint8_t *unused0, const uint8_t *src1, + * const uint8_t *src2, int width, uint32_t *unused, void *opq) + */ +function yuy2ToUV_lsx + andi t0, a5, 7 + srli.d a5, a5, 3 + beqz a5, 2f +1: + vld vr0, a3, 1 + vld vr1, a3, 17 + addi.d a5, a5, -1 + addi.d a3, a3, 32 + vpickev.b vr2, vr1, vr0 + vpickev.b vr0, vr2, vr2 + vpickod.b vr1, vr2, vr2 + fst.d f0, a0, 0 + fst.d f1, a1, 0 + addi.d a0, a0, 8 + addi.d a1, a1, 8 + bnez a5, 1b +2: + beqz t0, 4f +3: + ld.b t1, a3, 1 + ld.b t2, a3, 3 + addi.d a3, a3, 4 + addi.d t0, t0, -1 + st.b t1, a0, 0 + st.b t2, a1, 0 + addi.d a0, a0, 1 + addi.d a1, a1, 1 + bnez t0, 3b +4: +endfunc + +function yuy2ToUV_lasx + andi t0, a5, 15 + srli.d a5, a5, 4 + beqz a5, 2f +1: + xvld xr0, a3, 1 + xvld xr1, a3, 33 + addi.d a5, a5, -1 + addi.d a3, a3, 64 + xvpickev.b xr2, xr1, xr0 + xvpermi.d xr2, xr2, 0xd8 + xvpickev.b xr0, xr2, xr2 + xvpermi.d xr0, xr0, 0xd8 + xvpickod.b xr1, xr2, xr2 + xvpermi.d xr1, xr1, 0xd8 + vst vr0, a0, 0 + vst vr1, a1, 0 + addi.d a0, a0, 16 + addi.d a1, a1, 16 + bnez a5, 1b +2: + beqz t0, 4f +3: + ld.b t1, a3, 1 + ld.b t2, a3, 3 + addi.d a3, a3, 4 + addi.d t0, t0, -1 + st.b t1, a0, 0 + st.b t2, a1, 0 + addi.d a0, a0, 1 + addi.d a1, a1, 1 + bnez t0, 3b +4: +endfunc + +/* + * void yvy2ToUV_lsx(uint8_t *dstU, uint8_t *dstV, const uint8_t *unused0, const uint8_t *src1, + * const uint8_t *src2, int width, uint32_t *unused, void *opq) + */ +function yvy2ToUV_lsx + andi t0, a5, 7 + srli.d a5, a5, 3 + beqz a5, 2f +1: + vld vr0, a3, 1 + vld vr1, a3, 17 + addi.d a5, a5, -1 + addi.d a3, a3, 32 + vpickev.b vr2, vr1, vr0 + vpickev.b vr0, vr2, vr2 + vpickod.b vr1, vr2, vr2 + fst.d f0, a1, 0 + fst.d f1, a0, 0 + addi.d a0, a0, 8 + addi.d a1, a1, 8 + bnez a5, 1b +2: + beqz t0, 4f +3: + ld.b t1, a3, 1 + ld.b t2, a3, 3 + addi.d a3, a3, 4 + addi.d t0, t0, -1 + st.b t1, a1, 0 + st.b t2, a0, 0 + addi.d a0, a0, 1 + addi.d a1, a1, 1 + bnez t0, 3b +4: +endfunc + +function yvy2ToUV_lasx + andi t0, a5, 15 + srli.d a5, a5, 4 + beqz a5, 2f +1: + xvld xr0, a3, 1 + xvld xr1, a3, 33 + addi.d a5, a5, -1 + addi.d a3, a3, 64 + xvpickev.b xr2, xr1, xr0 + xvpermi.d xr2, xr2, 0xd8 + xvpickev.b xr0, xr2, xr2 + xvpermi.d xr0, xr0, 0xd8 + xvpickod.b xr1, xr2, xr2 + xvpermi.d xr1, xr1, 0xd8 + vst vr0, a1, 0 + vst vr1, a0, 0 + addi.d a0, a0, 16 + addi.d a1, a1, 16 + bnez a5, 1b +2: + beqz t0, 4f +3: + ld.b t1, a3, 1 + ld.b t2, a3, 3 + addi.d a3, a3, 4 + addi.d t0, t0, -1 + st.b t1, a1, 0 + st.b t2, a0, 0 + addi.d a0, a0, 1 + addi.d a1, a1, 1 + bnez t0, 3b +4: +endfunc + +/* + * void uyvyToUV_lsx(uint8_t *dstU, uint8_t *dstV, const uint8_t *unused0, const uint8_t *src1, + * const uint8_t *src2, int width, uint32_t *unused, void *opq) + */ +function uyvyToUV_lsx + andi t0, a5, 7 + srli.d a5, a5, 3 + beqz a5, 2f +1: + vld vr0, a3, 0 + vld vr1, a3, 16 + addi.d a5, a5, -1 + addi.d a3, a3, 32 + vpickev.b vr2, vr1, vr0 + vpickev.b vr0, vr2, vr2 + vpickod.b vr1, vr2, vr2 + fst.d f0, a0, 0 + fst.d f1, a1, 0 + addi.d a0, a0, 8 + addi.d a1, a1, 8 + bnez a5, 1b +2: + beqz t0, 4f +3: + ld.b t1, a3, 1 + ld.b t2, a3, 3 + addi.d a3, a3, 4 + addi.d t0, t0, -1 + st.b t1, a0, 0 + st.b t2, a1, 0 + addi.d a0, a0, 1 + addi.d a1, a1, 1 + bnez t0, 3b +4: +endfunc + +function uyvyToUV_lasx + andi t0, a5, 15 + srli.d a5, a5, 4 + beqz a5, 2f +1: + xvld xr0, a3, 0 + xvld xr1, a3, 32 + addi.d a5, a5, -1 + addi.d a3, a3, 64 + xvpickev.b xr2, xr1, xr0 + xvpermi.d xr2, xr2, 0xd8 + xvpickev.b xr0, xr2, xr2 + xvpermi.d xr0, xr0, 0xd8 + xvpickod.b xr1, xr2, xr2 + xvpermi.d xr1, xr1, 0xd8 + vst vr0, a0, 0 + vst vr1, a1, 0 + addi.d a0, a0, 16 + addi.d a1, a1, 16 + bnez a5, 1b +2: + beqz t0, 4f +3: + ld.b t1, a3, 1 + ld.b t2, a3, 3 + addi.d a3, a3, 4 + addi.d t0, t0, -1 + st.b t1, a0, 0 + st.b t2, a1, 0 + addi.d a0, a0, 1 + addi.d a1, a1, 1 + bnez t0, 3b +4: +endfunc + +/* + * void nv12ToUV_lsx(uint8_t *dstU, uint8_t *dstV, const uint8_t *unused0, const uint8_t *src1, + * const uint8_t *src2, int width, uint32_t *unused, void *opq) + */ +function nv12ToUV_lsx + andi t0, a5, 15 + srli.d a5, a5, 4 + beqz a5, 2f +1: + vld vr0, a3, 0 + vld vr1, a3, 16 + addi.d a5, a5, -1 + addi.d a3, a3, 32 + vpickev.b vr2, vr1, vr0 + vpickod.b vr3, vr1, vr0 + vst vr2, a0, 0 + vst vr3, a1, 0 + addi.d a0, a0, 16 + addi.d a1, a1, 16 + bnez a5, 1b +2: + beqz t0, 4f +3: + ld.b t1, a3, 0 + ld.b t2, a3, 1 + addi.d a3, a3, 2 + addi.d t0, t0, -1 + st.b t1, a0, 0 + st.b t2, a1, 0 + addi.d a0, a0, 1 + addi.d a1, a1, 1 + bnez t0, 3b +4: +endfunc + +function nv12ToUV_lasx + andi t0, a5, 31 + srli.d a5, a5, 5 + beqz a5, 2f +1: + xvld xr0, a3, 0 + xvld xr1, a3, 32 + addi.d a5, a5, -1 + addi.d a3, a3, 64 + xvpickev.b xr2, xr1, xr0 + xvpickod.b xr3, xr1, xr0 + xvpermi.d xr2, xr2, 0xd8 + xvpermi.d xr3, xr3, 0xd8 + xvst xr2, a0, 0 + xvst xr3, a1, 0 + addi.d a0, a0, 32 + addi.d a1, a1, 32 + bnez a5, 1b +2: + beqz t0, 4f +3: + ld.b t1, a3, 0 + ld.b t2, a3, 1 + addi.d a3, a3, 2 + addi.d t0, t0, -1 + st.b t1, a0, 0 + st.b t2, a1, 0 + addi.d a0, a0, 1 + addi.d a1, a1, 1 + bnez t0, 3b +4: +endfunc + +/* + * void nv21ToUV_lsx(uint8_t *dstU, uint8_t *dstV, const uint8_t *unused0, const uint8_t *src1, + * const uint8_t *src2, int width, uint32_t *unused, void *opq) + */ +function nv21ToUV_lsx + andi t0, a5, 15 + srli.d a5, a5, 4 + beqz a5, 2f +1: + vld vr0, a3, 0 + vld vr1, a3, 16 + addi.d a5, a5, -1 + addi.d a3, a3, 32 + vpickev.b vr2, vr1, vr0 + vpickod.b vr3, vr1, vr0 + vst vr2, a1, 0 + vst vr3, a0, 0 + addi.d a0, a0, 16 + addi.d a1, a1, 16 + bnez a5, 1b +2: + beqz t0, 4f +3: + ld.b t1, a3, 0 + ld.b t2, a3, 1 + addi.d a3, a3, 2 + addi.d t0, t0, -1 + st.b t1, a1, 0 + st.b t2, a0, 0 + addi.d a0, a0, 1 + addi.d a1, a1, 1 + bnez t0, 3b +4: +endfunc + +function nv21ToUV_lasx + andi t0, a5, 31 + srli.d a5, a5, 5 + beqz a5, 2f +1: + xvld xr0, a3, 0 + xvld xr1, a3, 32 + addi.d a5, a5, -1 + addi.d a3, a3, 64 + xvpickev.b xr2, xr1, xr0 + xvpickod.b xr3, xr1, xr0 + xvpermi.d xr2, xr2, 0xd8 + xvpermi.d xr3, xr3, 0xd8 + xvst xr2, a1, 0 + xvst xr3, a0, 0 + addi.d a0, a0, 32 + addi.d a1, a1, 32 + bnez a5, 1b +2: + beqz t0, 4f +3: + ld.b t1, a3, 0 + ld.b t2, a3, 1 + addi.d a3, a3, 2 + addi.d t0, t0, -1 + st.b t1, a1, 0 + st.b t2, a0, 0 + addi.d a0, a0, 1 + addi.d a1, a1, 1 + bnez t0, 3b +4: +endfunc + +/* + *void abgrToA_lsx(uint8_t *_dst, const uint8_t *src, const uint8_t *unused1, + * const uint8_t *unused2, int width, uint32_t *unused, void *opq) + */ +function abgrToA_lsx + andi t0, a4, 7 + srli.d a4, a4, 3 + vxor.v vr0, vr0, vr0 + beqz a4, 2f +1: + vld vr1, a1, 0 + vld vr2, a1, 16 + addi.d a4, a4, -1 + addi.d a1, a1, 32 + vpickev.b vr3, vr2, vr1 + vpackev.b vr3, vr0, vr3 + vslli.h vr1, vr3, 6 + vsrli.h vr2, vr3, 2 + vor.v vr3, vr2, vr1 + vst vr3, a0, 0 + addi.d a0, a0, 16 + bnez a4, 1b +2: + beqz t0, 4f +3: + ld.b t1, a1, 3 + addi.d t0, t0, -1 + addi.d a1, a1, 4 + andi t1, t1, 0xff + slli.w t2, t1, 6 + srli.w t3, t1, 2 + or t1, t2, t3 + st.h t1, a0, 0 + addi.d a0, a0, 2 + bnez t0, 3b +4: +endfunc + +function abgrToA_lasx + andi t0, a4, 15 + srli.d a4, a4, 4 + xvxor.v xr0, xr0, xr0 + beqz a4, 2f +1: + xvld xr1, a1, 0 + xvld xr2, a1, 32 + addi.d a4, a4, -1 + addi.d a1, a1, 64 + xvpickev.b xr3, xr2, xr1 + xvpermi.d xr3, xr3, 0xd8 + xvpackev.b xr3, xr0, xr3 + xvslli.h xr1, xr3, 6 + xvsrli.h xr2, xr3, 2 + xvor.v xr3, xr2, xr1 + xvst xr3, a0, 0 + addi.d a0, a0, 32 + bnez a4, 1b +2: + beqz t0, 4f +3: + ld.b t1, a1, 3 + addi.d t0, t0, -1 + addi.d a1, a1, 4 + andi t1, t1, 0xff + slli.w t2, t1, 6 + srli.w t3, t1, 2 + or t1, t2, t3 + st.h t1, a0, 0 + addi.d a0, a0, 2 + bnez t0, 3b +4: +endfunc + +/* + *void rgbaToA_lsx(uint8_t *_dst, const uint8_t *src, const uint8_t *unused1, + * const uint8_t *unused2, int width, uint32_t *unused, void *opq) + */ +function rgbaToA_lsx + andi t0, a4, 7 + srli.d a4, a4, 3 + vxor.v vr0, vr0, vr0 + beqz a4, 2f +1: + vld vr1, a1, 3 + vld vr2, a1, 19 + addi.d a4, a4, -1 + addi.d a1, a1, 32 + vpickev.b vr3, vr2, vr1 + vpackev.b vr3, vr0, vr3 + vslli.h vr1, vr3, 6 + vsrli.h vr2, vr3, 2 + vor.v vr3, vr2, vr1 + vst vr3, a0, 0 + addi.d a0, a0, 16 + bnez a4, 1b +2: + beqz t0, 4f +3: + ld.b t1, a1, 3 + addi.d t0, t0, -1 + addi.d a1, a1, 4 + andi t1, t1, 0xff + slli.w t2, t1, 6 + srli.w t3, t1, 2 + or t1, t2, t3 + st.h t1, a0, 0 + addi.d a0, a0, 2 + bnez t0, 3b +4: +endfunc + +function rgbaToA_lasx + andi t0, a4, 15 + srli.d a4, a4, 4 + xvxor.v xr0, xr0, xr0 + beqz a4, 2f +1: + xvld xr1, a1, 3 + xvld xr2, a1, 35 + addi.d a4, a4, -1 + addi.d a1, a1, 64 + xvpickev.b xr3, xr2, xr1 + xvpermi.d xr3, xr3, 0xd8 + xvpackev.b xr3, xr0, xr3 + xvslli.h xr1, xr3, 6 + xvsrli.h xr2, xr3, 2 + xvor.v xr3, xr2, xr1 + xvst xr3, a0, 0 + addi.d a0, a0, 32 + bnez a4, 1b +2: + beqz t0, 4f +3: + ld.b t1, a1, 3 + addi.d t0, t0, -1 + addi.d a1, a1, 4 + andi t1, t1, 0xff + slli.w t2, t1, 6 + srli.w t3, t1, 2 + or t1, t2, t3 + st.h t1, a0, 0 + addi.d a0, a0, 2 + bnez t0, 3b +4: +endfunc diff --git a/libswscale/loongarch/input_lasx.c b/libswscale/loongarch/input_lasx.c index 4830072eaf..0f1d954880 100644 --- a/libswscale/loongarch/input_lasx.c +++ b/libswscale/loongarch/input_lasx.c @@ -200,3 +200,46 @@ void planar_rgb_to_y_lasx(uint8_t *_dst, const uint8_t *src[4], int width, dst[i] = (tem_ry * r + tem_gy * g + tem_by * b + set) >> shift; } } + +av_cold void ff_sws_init_input_lasx(SwsContext *c) +{ + enum AVPixelFormat srcFormat = c->srcFormat; + + switch (srcFormat) { + case AV_PIX_FMT_YUYV422: + c->chrToYV12 = yuy2ToUV_lasx; + break; + case AV_PIX_FMT_YVYU422: + c->chrToYV12 = yvy2ToUV_lasx; + break; + case AV_PIX_FMT_UYVY422: + c->chrToYV12 = uyvyToUV_lasx; + break; + case AV_PIX_FMT_NV12: + case AV_PIX_FMT_NV16: + case AV_PIX_FMT_NV24: + c->chrToYV12 = nv12ToUV_lasx; + break; + case AV_PIX_FMT_NV21: + case AV_PIX_FMT_NV42: + c->chrToYV12 = nv21ToUV_lasx; + break; + case AV_PIX_FMT_GBRAP: + case AV_PIX_FMT_GBRP: + c->readChrPlanar = planar_rgb_to_uv_lasx; + break; + } + + if (c->needAlpha) { + switch (srcFormat) { + case AV_PIX_FMT_BGRA: + case AV_PIX_FMT_RGBA: + c->alpToYV12 = rgbaToA_lasx; + break; + case AV_PIX_FMT_ABGR: + case AV_PIX_FMT_ARGB: + c->alpToYV12 = abgrToA_lasx; + break; + } + } +} diff --git a/libswscale/loongarch/input_lsx.c b/libswscale/loongarch/input_lsx.c new file mode 100644 index 0000000000..1bb04457bb --- /dev/null +++ b/libswscale/loongarch/input_lsx.c @@ -0,0 +1,65 @@ +/* + * Copyright (C) 2024 Loongson Technology Corporation Limited + * Contributed by Shiyou Yin + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "swscale_loongarch.h" + +av_cold void ff_sws_init_input_lsx(SwsContext *c) +{ + enum AVPixelFormat srcFormat = c->srcFormat; + + switch (srcFormat) { + case AV_PIX_FMT_YUYV422: + c->chrToYV12 = yuy2ToUV_lsx; + break; + case AV_PIX_FMT_YVYU422: + c->chrToYV12 = yvy2ToUV_lsx; + break; + case AV_PIX_FMT_UYVY422: + c->chrToYV12 = uyvyToUV_lsx; + break; + case AV_PIX_FMT_NV12: + case AV_PIX_FMT_NV16: + case AV_PIX_FMT_NV24: + c->chrToYV12 = nv12ToUV_lsx; + break; + case AV_PIX_FMT_NV21: + case AV_PIX_FMT_NV42: + c->chrToYV12 = nv21ToUV_lsx; + break; + case AV_PIX_FMT_GBRAP: + case AV_PIX_FMT_GBRP: + c->readChrPlanar = planar_rgb_to_uv_lsx; + break; + } + + if (c->needAlpha) { + switch (srcFormat) { + case AV_PIX_FMT_BGRA: + case AV_PIX_FMT_RGBA: + c->alpToYV12 = rgbaToA_lsx; + break; + case AV_PIX_FMT_ABGR: + case AV_PIX_FMT_ARGB: + c->alpToYV12 = abgrToA_lsx; + break; + } + } +} diff --git a/libswscale/loongarch/swscale_init_loongarch.c b/libswscale/loongarch/swscale_init_loongarch.c index 04d2553fa4..3a5a7ee856 100644 --- a/libswscale/loongarch/swscale_init_loongarch.c +++ b/libswscale/loongarch/swscale_init_loongarch.c @@ -63,6 +63,7 @@ av_cold void ff_sws_init_swscale_loongarch(SwsContext *c) ff_sws_init_output_lsx(c, &c->yuv2plane1, &c->yuv2planeX, &c->yuv2nv12cX, &c->yuv2packed1, &c->yuv2packed2, &c->yuv2packedX, &c->yuv2anyX); + ff_sws_init_input_lsx(c); if (c->srcBpc == 8) { if (c->dstBpc <= 14) { c->hyScale = c->hcScale = ff_hscale_8_to_15_lsx; @@ -73,21 +74,13 @@ av_cold void ff_sws_init_swscale_loongarch(SwsContext *c) c->hyScale = c->hcScale = c->dstBpc > 14 ? ff_hscale_16_to_19_lsx : ff_hscale_16_to_15_lsx; } - switch (c->srcFormat) { - case AV_PIX_FMT_GBRAP: - case AV_PIX_FMT_GBRP: - { - c->readChrPlanar = planar_rgb_to_uv_lsx; - c->readLumPlanar = planar_rgb_to_y_lsx; - } - break; - } } #if HAVE_LASX if (have_lasx(cpu_flags)) { ff_sws_init_output_lasx(c, &c->yuv2plane1, &c->yuv2planeX, &c->yuv2nv12cX, &c->yuv2packed1, &c->yuv2packed2, &c->yuv2packedX, &c->yuv2anyX); + ff_sws_init_input_lasx(c); if (c->srcBpc == 8) { if (c->dstBpc <= 14) { c->hyScale = c->hcScale = ff_hscale_8_to_15_lasx; @@ -98,15 +91,6 @@ av_cold void ff_sws_init_swscale_loongarch(SwsContext *c) c->hyScale = c->hcScale = c->dstBpc > 14 ? ff_hscale_16_to_19_lasx : ff_hscale_16_to_15_lasx; } - switch (c->srcFormat) { - case AV_PIX_FMT_GBRAP: - case AV_PIX_FMT_GBRP: - { - c->readChrPlanar = planar_rgb_to_uv_lasx; - c->readLumPlanar = planar_rgb_to_y_lasx; - } - break; - } } #endif // #if HAVE_LASX ff_sws_init_range_convert_loongarch(c); diff --git a/libswscale/loongarch/swscale_loongarch.h b/libswscale/loongarch/swscale_loongarch.h index ea93881f8e..07c91bc25c 100644 --- a/libswscale/loongarch/swscale_loongarch.h +++ b/libswscale/loongarch/swscale_loongarch.h @@ -68,6 +68,29 @@ void yuv2planeX_8_lsx(const int16_t *filter, int filterSize, void yuv2plane1_8_lsx(const int16_t *src, uint8_t *dest, int dstW, const uint8_t *dither, int offset); +void yuy2ToUV_lsx(uint8_t *dstU, uint8_t *dstV, const uint8_t *unused0, const uint8_t *src1, + const uint8_t *src2, int width, uint32_t *unused, void *opq); + +void yvy2ToUV_lsx(uint8_t *dstU, uint8_t *dstV, const uint8_t *unused0, const uint8_t *src1, + const uint8_t *src2, int width, uint32_t *unused, void *opq); + +void uyvyToUV_lsx(uint8_t *dstU, uint8_t *dstV, const uint8_t *unused0, const uint8_t *src1, + const uint8_t *src2, int width, uint32_t *unused, void *opq); + +void nv12ToUV_lsx(uint8_t *dstU, uint8_t *dstV, const uint8_t *unused0, const uint8_t *src1, + const uint8_t *src2, int width, uint32_t *unused, void *opq); + +void nv21ToUV_lsx(uint8_t *dstU, uint8_t *dstV, const uint8_t *unused0, const uint8_t *src1, + const uint8_t *src2, int width, uint32_t *unused, void *opq); + +void abgrToA_lsx(uint8_t *_dst, const uint8_t *src, const uint8_t *unused1, + const uint8_t *unused2, int width, uint32_t *unused, void *opq); + +void rgbaToA_lsx(uint8_t *_dst, const uint8_t *src, const uint8_t *unused1, + const uint8_t *unused2, int width, uint32_t *unused, void *opq); + +av_cold void ff_sws_init_input_lsx(SwsContext *c); + av_cold void ff_sws_init_output_lsx(SwsContext *c, yuv2planar1_fn *yuv2plane1, yuv2planarX_fn *yuv2planeX, @@ -152,6 +175,29 @@ void yuv2planeX_8_lasx(const int16_t *filter, int filterSize, void yuv2plane1_8_lasx(const int16_t *src, uint8_t *dest, int dstW, const uint8_t *dither, int offset); +void yuy2ToUV_lasx(uint8_t *dstU, uint8_t *dstV, const uint8_t *unused0, const uint8_t *src1, + const uint8_t *src2, int width, uint32_t *unused, void *opq); + +void yvy2ToUV_lasx(uint8_t *dstU, uint8_t *dstV, const uint8_t *unused0, const uint8_t *src1, + const uint8_t *src2, int width, uint32_t *unused, void *opq); + +void uyvyToUV_lasx(uint8_t *dstU, uint8_t *dstV, const uint8_t *unused0, const uint8_t *src1, + const uint8_t *src2, int width, uint32_t *unused, void *opq); + +void nv12ToUV_lasx(uint8_t *dstU, uint8_t *dstV, const uint8_t *unused0, const uint8_t *src1, + const uint8_t *src2, int width, uint32_t *unused, void *opq); + +void nv21ToUV_lasx(uint8_t *dstU, uint8_t *dstV, const uint8_t *unused0, const uint8_t *src1, + const uint8_t *src2, int width, uint32_t *unused, void *opq); + +void abgrToA_lasx(uint8_t *_dst, const uint8_t *src, const uint8_t *unused1, + const uint8_t *unused2, int width, uint32_t *unused, void *opq); + +void rgbaToA_lasx(uint8_t *_dst, const uint8_t *src, const uint8_t *unused1, + const uint8_t *unused2, int width, uint32_t *unused, void *opq); + +av_cold void ff_sws_init_input_lasx(SwsContext *c); + av_cold void ff_sws_init_output_lasx(SwsContext *c, yuv2planar1_fn *yuv2plane1, yuv2planarX_fn *yuv2planeX,