From patchwork Sat Mar 16 03:03:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiyou Yin X-Patchwork-Id: 47115 Delivered-To: ffmpegpatchwork2@gmail.com Received: by 2002:a05:6a21:3942:b0:1a3:31a3:7958 with SMTP id ac2csp231287pzc; Fri, 15 Mar 2024 20:03:56 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCX1aGjznhTbf3IljkWeaIze/BCR+I67wMBtoNaxCqR9bmVkBknx9S3cxLMsBs5/nSB7NEN6Pk/x/XrDWtr9Ng1wRn/SeRPzI6CmjQ== X-Google-Smtp-Source: AGHT+IF/bi/2NuCOGXiqKPj1gp2D+RuvEHhw+BZMvbdgbbsluinl7HYsWgDMW3ROYGDSjk4SG/6x X-Received: by 2002:a05:6512:20c9:b0:513:c2e4:9daf with SMTP id u9-20020a05651220c900b00513c2e49dafmr3911391lfr.52.1710558236038; Fri, 15 Mar 2024 20:03:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1710558236; cv=none; d=google.com; s=arc-20160816; b=Z9jVMPtXLQL/8GyQN6roe77Ce/b95UKj0tBjowi18/uuphIPkDZH/V/0BHrHK515Ws T5YBVk/2MXMIVyrQIjRHyenCteFc/LzuWYQvDF7espvuXQDCALBTGHZqeHqMWRGM7ILw lnKH8c+KtEeze+yGZr/Zc4VCLVe00awOEAVGO3VD2ZPlI+37CET+MS1EBpntVe6MyRvj ML/8ZITugsxkHs8bnAISVeS5W4NSWDda67qYPfJtEwPBp7mr+3hbF4T6R2qVKBuiXv0X xGx/LHaVPeoEuaGMOeCb6L2JqRgxBR2j0I2LiISL0WJ2zd6okTZG7CdFuwaccw4daE/+ SItA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:content-transfer-encoding:reply-to:list-subscribe :list-help:list-post:list-archive:list-unsubscribe:list-id :precedence:subject:mime-version:references:in-reply-to:message-id :date:to:from:delivered-to; bh=DVgvBb48sQV1F6S9bmyV+mTuZexW3Xd2XgNY7lskjyY=; fh=YOA8vD9MJZuwZ71F/05pj6KdCjf6jQRmzLS+CATXUQk=; b=Be6pJQHGHbYrbiv1BoNK0VqU3FHGKH/HXeqaYT3GEEwS8dtYkxJx+Ehs5/KjgKMi0V scdKff45Lv8uIWSizBxJq1Sh28cb/1VjREaqwDRkvUuYywq/JZzmDGYBZ29ybzGL2Tg1 avK3aU9pzH+AAOnyX2Yzt6glj8bUCg0RTrV0ktcA9DEy2M+BLt7PUAiAruRjE1SH2e2t 4FCpGs/jQdUt5aomlQRLa3gY04fLq8L8Uh6AWEOXh0PwWHtP/qnyMwCHKmBSQLXGpUFY i/Jd2Bh9Snhpy36QVHnuFUMyiXfdExmkS47laN9u3T6TjLqbc0/ALj9iGedbAfu/o61G wuQg==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org. [79.124.17.100]) by mx.google.com with ESMTP id dt22-20020a170906b79600b00a45a018c310si2192533ejb.115.2024.03.15.20.03.55; Fri, 15 Mar 2024 20:03:56 -0700 (PDT) Received-SPF: pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) client-ip=79.124.17.100; Authentication-Results: mx.google.com; spf=pass (google.com: domain of ffmpeg-devel-bounces@ffmpeg.org designates 79.124.17.100 as permitted sender) smtp.mailfrom=ffmpeg-devel-bounces@ffmpeg.org Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id CA17E68D01E; Sat, 16 Mar 2024 05:03:50 +0200 (EET) X-Original-To: ffmpeg-devel@ffmpeg.org Delivered-To: ffmpeg-devel@ffmpeg.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 9F08B68D01E for ; Sat, 16 Mar 2024 05:03:42 +0200 (EET) Received: from loongson.cn (unknown [36.33.26.33]) by gateway (Coremail) with SMTP id _____8DxmfALDPVlP78ZAA--.62603S3; Sat, 16 Mar 2024 11:03:39 +0800 (CST) Received: from localhost (unknown [36.33.26.33]) by localhost.localdomain (Coremail) with SMTP id AQAAf8DxvhMIDPVlnG1bAA--.54430S3; Sat, 16 Mar 2024 11:03:37 +0800 (CST) From: Shiyou Yin To: ffmpeg-devel@ffmpeg.org Date: Sat, 16 Mar 2024 11:03:31 +0800 Message-Id: <20240316030333.31269-2-yinshiyou-hf@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20240316030333.31269-1-yinshiyou-hf@loongson.cn> References: <20240316030333.31269-1-yinshiyou-hf@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8DxvhMIDPVlnG1bAA--.54430S3 X-CM-SenderInfo: p1lq2x5l1r3gtki6z05rqj20fqof0/ X-Coremail-Antispam: 1Uk129KBj93XoW3Aw1fAw1DCFW5JF4DCr15GFX_yoWkCFy7pF y7Kwn8Kw1agr95u3Zrt3yjv340gryxCr1a93WxtrW0krZ7WrsIgF15tF9F9FyIg3y3A3WF va1UJay7Ka1jvFbCm3ZEXasCq-sJn29KB7ZKAUJUUUUU529EdanIXcx71UUUUU7KY7ZEXa sCq-sGcSsGvfJ3Ic02F40EFcxC0VAKzVAqx4xG6I80ebIjqfuFe4nvWSU5nxnvy29KBjDU 0xBIdaVrnRJUUUkjb4IE77IF4wAFF20E14v26r1j6r4UM7CY07I20VC2zVCF04k26cxKx2 IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48v e4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_Jr0_JF4l84ACjcxK6xIIjxv20xvEc7CjxVAFwI 0_Jr0_Gr1l84ACjcxK6I8E87Iv67AKxVW8JVWxJwA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_ Gr0_Gr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l57IF6xkI12xvs2x26I8E6xACxx1l5I 8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r1Y6r17McIj6I8E87Iv67AK xVW8JVWxJwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82IYc2Ij64 vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8G jcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r1j6r15MIIYrxkI7VAKI48JMIIF0xvE2I x0cI8IcVAFwI0_Jr0_JF4lIxAIcVC0I7IYx2IY6xkF7I0E14v26r1j6r4UMIIF0xvE42xK 8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6xkF7I 0E14v26r1j6r4UYxBIdaVFxhVjvjDU0xZFpf9x07jb_-PUUUUU= Subject: [FFmpeg-devel] [PATCH 1/3] swscale: [LA] Optimize range convert for yuvj420p. X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" X-TUID: VclvY+wQ5xIJ --- libswscale/loongarch/swscale.S | 368 ++++++++++++++++++ libswscale/loongarch/swscale_init_loongarch.c | 33 ++ libswscale/loongarch/swscale_loongarch.h | 11 + libswscale/swscale_internal.h | 1 + libswscale/utils.c | 6 +- 5 files changed, 418 insertions(+), 1 deletion(-) diff --git a/libswscale/loongarch/swscale.S b/libswscale/loongarch/swscale.S index aa4c5cbe28..67b1bc834d 100644 --- a/libswscale/loongarch/swscale.S +++ b/libswscale/loongarch/swscale.S @@ -1866,3 +1866,371 @@ function ff_hscale_16_to_19_sub_lsx ld.d s8, sp, 64 addi.d sp, sp, 72 endfunc + +function lumRangeFromJpeg_lsx + li.w t0, 14071 + li.w t1, 33561947 + vreplgr2vr.h vr0, t0 + srli.w t2, a1, 3 + andi t3, a1, 7 + beqz t2, 2f +1: + vld vr1, a0, 0 + vreplgr2vr.w vr2, t1 + vreplgr2vr.w vr3, t1 + vmaddwev.w.h vr2, vr0, vr1 + vmaddwod.w.h vr3, vr0, vr1 + vsrai.w vr2, vr2, 14 + vsrai.w vr3, vr3, 14 + vpackev.h vr1, vr3, vr2 + vst vr1, a0, 0 + addi.d a0, a0, 16 + addi.d t2, t2, -1 + bnez t2, 1b +2: + beqz t3, 4f +3: + ld.h t4, a0, 0 + mul.w t4, t4, t0 + add.w t4, t4, t1 + srai.w t4, t4, 14 + st.h t4, a0, 0 + addi.d a0, a0, 2 + addi.d t3, t3, -1 + bnez t3, 3b +4: +endfunc + +function lumRangeFromJpeg_lasx + li.w t0, 14071 + li.w t1, 33561947 + xvreplgr2vr.h xr0, t0 + srli.w t2, a1, 4 + andi t3, a1, 15 + beqz t2, 2f +1: + xvld xr1, a0, 0 + xvreplgr2vr.w xr2, t1 + xvreplgr2vr.w xr3, t1 + xvmaddwev.w.h xr2, xr0, xr1 + xvmaddwod.w.h xr3, xr0, xr1 + xvsrai.w xr2, xr2, 14 + xvsrai.w xr3, xr3, 14 + xvpackev.h xr1, xr3, xr2 + xvst xr1, a0, 0 + addi.d a0, a0, 32 + addi.d t2, t2, -1 + bnez t2, 1b +2: + beqz t3, 4f +3: + ld.h t4, a0, 0 + mul.w t4, t4, t0 + add.w t4, t4, t1 + srai.w t4, t4, 14 + st.h t4, a0, 0 + addi.d a0, a0, 2 + addi.d t3, t3, -1 + bnez t3, 3b +4: +endfunc + +function lumRangeToJpeg_lsx + li.w t0, 19077 + li.w t1, -39057361 + li.w t2, 30189 + vreplgr2vr.h vr0, t0 + vreplgr2vr.h vr4, t2 + srli.w t2, a1, 3 + andi t3, a1, 7 + beqz t2, 2f +1: + vld vr1, a0, 0 + vreplgr2vr.w vr2, t1 + vreplgr2vr.w vr3, t1 + vmin.h vr1, vr1, vr4 + vmaddwev.w.h vr2, vr0, vr1 + vmaddwod.w.h vr3, vr0, vr1 + vsrai.w vr2, vr2, 14 + vsrai.w vr3, vr3, 14 + vpackev.h vr1, vr3, vr2 + vst vr1, a0, 0 + addi.d a0, a0, 16 + addi.d t2, t2, -1 + bnez t2, 1b +2: + beqz t3, 4f +3: + ld.h t4, a0, 0 + vreplgr2vr.h vr1, t4 + vmin.h vr1, vr1, vr4 + vpickve2gr.h t4, vr1, 0 + mul.w t4, t4, t0 + add.w t4, t4, t1 + srai.w t4, t4, 14 + st.h t4, a0, 0 + addi.d a0, a0, 2 + addi.d t3, t3, -1 + bnez t3, 3b +4: +endfunc + +function lumRangeToJpeg_lasx + li.w t0, 19077 + li.w t1, -39057361 + li.w t2, 30189 + xvreplgr2vr.h xr0, t0 + xvreplgr2vr.h xr4, t2 + srli.w t2, a1, 4 + andi t3, a1, 15 + beqz t2, 2f +1: + xvld xr1, a0, 0 + xvreplgr2vr.w xr2, t1 + xvreplgr2vr.w xr3, t1 + xvmin.h xr1, xr1, xr4 + xvmaddwev.w.h xr2, xr0, xr1 + xvmaddwod.w.h xr3, xr0, xr1 + xvsrai.w xr2, xr2, 14 + xvsrai.w xr3, xr3, 14 + xvpackev.h xr1, xr3, xr2 + xvst xr1, a0, 0 + addi.d a0, a0, 32 + addi.d t2, t2, -1 + bnez t2, 1b +2: + beqz t3, 4f +3: + ld.h t4, a0, 0 + vreplgr2vr.h vr1, t4 + vmin.h vr1, vr1, vr4 + vpickve2gr.h t4, vr1, 0 + mul.w t4, t4, t0 + add.w t4, t4, t1 + srai.w t4, t4, 14 + st.h t4, a0, 0 + addi.d a0, a0, 2 + addi.d t3, t3, -1 + bnez t3, 3b +4: +endfunc + +function chrRangeFromJpeg_lsx + li.w t0, 1799 + li.w t1, 4081085 + vreplgr2vr.h vr0, t0 + srli.w t2, a2, 3 + andi t3, a2, 7 + beqz t2, 2f +1: + vld vr1, a0, 0 + vld vr2, a1, 0 + vreplgr2vr.w vr3, t1 + vreplgr2vr.w vr4, t1 + vreplgr2vr.w vr5, t1 + vreplgr2vr.w vr6, t1 + vmaddwev.w.h vr3, vr0, vr1 + vmaddwod.w.h vr4, vr0, vr1 + vmaddwev.w.h vr5, vr0, vr2 + vmaddwod.w.h vr6, vr0, vr2 + vsrai.w vr3, vr3, 11 + vsrai.w vr4, vr4, 11 + vsrai.w vr5, vr5, 11 + vsrai.w vr6, vr6, 11 + vpackev.h vr1, vr4, vr3 + vpackev.h vr2, vr6, vr5 + vst vr1, a0, 0 + vst vr2, a1, 0 + addi.d a0, a0, 16 + addi.d a1, a1, 16 + addi.d t2, t2, -1 + bnez t2, 1b +2: + beqz t3, 4f +3: + ld.h t4, a0, 0 + ld.h t5, a1, 0 + mul.w t4, t4, t0 + mul.w t5, t5, t0 + add.w t4, t4, t1 + add.w t5, t5, t1 + srai.w t4, t4, 11 + srai.w t5, t5, 11 + st.h t4, a0, 0 + st.h t5, a1, 0 + addi.d a0, a0, 2 + addi.d a1, a1, 2 + addi.d t3, t3, -1 + bnez t3, 3b +4: +endfunc + +function chrRangeFromJpeg_lasx + li.w t0, 1799 + li.w t1, 4081085 + xvreplgr2vr.h xr0, t0 + srli.w t2, a2, 4 + andi t3, a2, 15 + beqz t2, 2f +1: + xvld xr1, a0, 0 + xvld xr2, a1, 0 + xvreplgr2vr.w xr3, t1 + xvreplgr2vr.w xr4, t1 + xvreplgr2vr.w xr5, t1 + xvreplgr2vr.w xr6, t1 + xvmaddwev.w.h xr3, xr0, xr1 + xvmaddwod.w.h xr4, xr0, xr1 + xvmaddwev.w.h xr5, xr0, xr2 + xvmaddwod.w.h xr6, xr0, xr2 + xvsrai.w xr3, xr3, 11 + xvsrai.w xr4, xr4, 11 + xvsrai.w xr5, xr5, 11 + xvsrai.w xr6, xr6, 11 + xvpackev.h xr1, xr4, xr3 + xvpackev.h xr2, xr6, xr5 + xvst xr1, a0, 0 + xvst xr2, a1, 0 + addi.d a0, a0, 32 + addi.d a1, a1, 32 + addi.d t2, t2, -1 + bnez t2, 1b +2: + beqz t3, 4f +3: + ld.h t4, a0, 0 + ld.h t5, a1, 0 + mul.w t4, t4, t0 + mul.w t5, t5, t0 + add.w t4, t4, t1 + add.w t5, t5, t1 + srai.w t4, t4, 11 + srai.w t5, t5, 11 + st.h t4, a0, 0 + st.h t5, a1, 0 + addi.d a0, a0, 2 + addi.d a1, a1, 2 + addi.d t3, t3, -1 + bnez t3, 3b +4: +endfunc + +function chrRangeToJpeg_lsx + li.w t0, 4663 + li.w t1, -9289992 + li.w t2, 30775 + vreplgr2vr.h vr0, t0 + vreplgr2vr.h vr7, t2 + srli.w t2, a2, 3 + andi t3, a2, 7 + beqz t2, 2f +1: + vld vr1, a0, 0 + vld vr2, a1, 0 + vreplgr2vr.w vr3, t1 + vreplgr2vr.w vr4, t1 + vreplgr2vr.w vr5, t1 + vreplgr2vr.w vr6, t1 + vmin.h vr1, vr1, vr7 + vmin.h vr2, vr2, vr7 + vmaddwev.w.h vr3, vr0, vr1 + vmaddwod.w.h vr4, vr0, vr1 + vmaddwev.w.h vr5, vr0, vr2 + vmaddwod.w.h vr6, vr0, vr2 + vsrai.w vr3, vr3, 12 + vsrai.w vr4, vr4, 12 + vsrai.w vr5, vr5, 12 + vsrai.w vr6, vr6, 12 + vpackev.h vr1, vr4, vr3 + vpackev.h vr2, vr6, vr5 + vst vr1, a0, 0 + vst vr2, a1, 0 + addi.d a0, a0, 16 + addi.d a1, a1, 16 + addi.d t2, t2, -1 + bnez t2, 1b +2: + beqz t3, 4f +3: + ld.h t4, a0, 0 + ld.h t5, a1, 0 + vreplgr2vr.h vr1, t4 + vreplgr2vr.h vr2, t5 + vmin.h vr1, vr1, vr7 + vmin.h vr2, vr2, vr7 + vpickve2gr.h t4, vr1, 0 + vpickve2gr.h t5, vr2, 0 + mul.w t4, t4, t0 + mul.w t5, t5, t0 + add.w t4, t4, t1 + add.w t5, t5, t1 + srai.w t4, t4, 12 + srai.w t5, t5, 12 + st.h t4, a0, 0 + st.h t5, a1, 0 + addi.d a0, a0, 2 + addi.d a1, a1, 2 + addi.d t3, t3, -1 + bnez t3, 3b +4: +endfunc + +function chrRangeToJpeg_lasx + li.w t0, 4663 + li.w t1, -9289992 + li.w t2, 30775 + xvreplgr2vr.h xr0, t0 + xvreplgr2vr.h xr7, t2 + srli.w t2, a2, 4 + andi t3, a2, 15 + beqz t2, 2f +1: + xvld xr1, a0, 0 + xvld xr2, a1, 0 + xvreplgr2vr.w xr3, t1 + xvreplgr2vr.w xr4, t1 + xvreplgr2vr.w xr5, t1 + xvreplgr2vr.w xr6, t1 + xvmin.h xr1, xr1, xr7 + xvmin.h xr2, xr2, xr7 + xvmaddwev.w.h xr3, xr0, xr1 + xvmaddwod.w.h xr4, xr0, xr1 + xvmaddwev.w.h xr5, xr0, xr2 + xvmaddwod.w.h xr6, xr0, xr2 + xvsrai.w xr3, xr3, 12 + xvsrai.w xr4, xr4, 12 + xvsrai.w xr5, xr5, 12 + xvsrai.w xr6, xr6, 12 + xvpackev.h xr1, xr4, xr3 + xvpackev.h xr2, xr6, xr5 + xvst xr1, a0, 0 + xvst xr2, a1, 0 + addi.d a0, a0, 32 + addi.d a1, a1, 32 + addi.d t2, t2, -1 + bnez t2, 1b +2: + beqz t3, 4f +3: + ld.h t4, a0, 0 + ld.h t5, a1, 0 + vreplgr2vr.h vr1, t4 + vreplgr2vr.h vr2, t5 + vmin.h vr1, vr1, vr7 + vmin.h vr2, vr2, vr7 + vpickve2gr.h t4, vr1, 0 + vpickve2gr.h t5, vr2, 0 + mul.w t4, t4, t0 + mul.w t5, t5, t0 + add.w t4, t4, t1 + add.w t5, t5, t1 + srai.w t4, t4, 12 + srai.w t5, t5, 12 + st.h t4, a0, 0 + st.h t5, a1, 0 + addi.d a0, a0, 2 + addi.d a1, a1, 2 + addi.d t3, t3, -1 + bnez t3, 3b +4: +endfunc diff --git a/libswscale/loongarch/swscale_init_loongarch.c b/libswscale/loongarch/swscale_init_loongarch.c index 53e4f970b6..6d2786c55f 100644 --- a/libswscale/loongarch/swscale_init_loongarch.c +++ b/libswscale/loongarch/swscale_init_loongarch.c @@ -24,6 +24,38 @@ #include "libswscale/rgb2rgb.h" #include "libavutil/loongarch/cpu.h" +av_cold void ff_sws_init_range_convert_loongarch(SwsContext *c) +{ + int cpu_flags = av_get_cpu_flags(); + + if (have_lsx(cpu_flags)) { + if (c->srcRange != c->dstRange && !isAnyRGB(c->dstFormat)) { + if (c->dstBpc <= 14) { + if (c->srcRange) { + c->lumConvertRange = lumRangeFromJpeg_lsx; + c->chrConvertRange = chrRangeFromJpeg_lsx; + } else { + c->lumConvertRange = lumRangeToJpeg_lsx; + c->chrConvertRange = chrRangeToJpeg_lsx; + } + } + } + } + if (have_lasx(cpu_flags)) { + if (c->srcRange != c->dstRange && !isAnyRGB(c->dstFormat)) { + if (c->dstBpc <= 14) { + if (c->srcRange) { + c->lumConvertRange = lumRangeFromJpeg_lasx; + c->chrConvertRange = chrRangeFromJpeg_lasx; + } else { + c->lumConvertRange = lumRangeToJpeg_lasx; + c->chrConvertRange = chrRangeToJpeg_lasx; + } + } + } + } +} + av_cold void ff_sws_init_swscale_loongarch(SwsContext *c) { int cpu_flags = av_get_cpu_flags(); @@ -77,6 +109,7 @@ av_cold void ff_sws_init_swscale_loongarch(SwsContext *c) c->yuv2planeX = ff_yuv2planeX_8_lasx; } #endif // #if HAVE_LASX + ff_sws_init_range_convert_loongarch(c); } av_cold void rgb2rgb_init_loongarch(void) diff --git a/libswscale/loongarch/swscale_loongarch.h b/libswscale/loongarch/swscale_loongarch.h index 0514abae21..c96b085982 100644 --- a/libswscale/loongarch/swscale_loongarch.h +++ b/libswscale/loongarch/swscale_loongarch.h @@ -50,6 +50,11 @@ void ff_hscale_16_to_19_sub_lsx(SwsContext *c, int16_t *_dst, int dstW, const uint8_t *_src, const int16_t *filter, const int32_t *filterPos, int filterSize, int sh); +void lumRangeFromJpeg_lsx(int16_t *dst, int width); +void chrRangeFromJpeg_lsx(int16_t *dstU, int16_t *dstV, int width); +void lumRangeToJpeg_lsx(int16_t *dst, int width); +void chrRangeToJpeg_lsx(int16_t *dstU, int16_t *dstV, int width); + void planar_rgb_to_uv_lsx(uint8_t *_dstU, uint8_t *_dstV, const uint8_t *src[4], int width, int32_t *rgb2yuv, void *opq); @@ -97,6 +102,11 @@ void ff_hscale_16_to_15_lasx(SwsContext *c, int16_t *dst, int dstW, const uint8_t *_src, const int16_t *filter, const int32_t *filterPos, int filterSize); +void lumRangeFromJpeg_lasx(int16_t *dst, int width); +void chrRangeFromJpeg_lasx(int16_t *dstU, int16_t *dstV, int width); +void lumRangeToJpeg_lasx(int16_t *dst, int width); +void chrRangeToJpeg_lasx(int16_t *dstU, int16_t *dstV, int width); + void planar_rgb_to_uv_lasx(uint8_t *_dstU, uint8_t *_dstV, const uint8_t *src[4], int width, int32_t *rgb2yuv, void *opq); @@ -130,6 +140,7 @@ void ff_yuv2planeX_8_lasx(const int16_t *filter, int filterSize, const uint8_t *dither, int offset); av_cold void ff_sws_init_output_lasx(SwsContext *c); + #endif // #if HAVE_LASX #endif /* SWSCALE_LOONGARCH_SWSCALE_LOONGARCH_H */ diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h index abeebbb002..0db581acf8 100644 --- a/libswscale/swscale_internal.h +++ b/libswscale/swscale_internal.h @@ -695,6 +695,7 @@ void ff_yuv2rgb_init_tables_ppc(SwsContext *c, const int inv_table[4], void ff_updateMMXDitherTables(SwsContext *c, int dstY); av_cold void ff_sws_init_range_convert(SwsContext *c); +av_cold void ff_sws_init_range_convert_loongarch(SwsContext *c); SwsFunc ff_yuv2rgb_init_x86(SwsContext *c); SwsFunc ff_yuv2rgb_init_ppc(SwsContext *c); diff --git a/libswscale/utils.c b/libswscale/utils.c index ab8a68e241..47db65ef0e 100644 --- a/libswscale/utils.c +++ b/libswscale/utils.c @@ -1049,8 +1049,12 @@ int sws_setColorspaceDetails(struct SwsContext *c, const int inv_table[4], c->srcRange = srcRange; c->dstRange = dstRange; - if (need_reinit) + if (need_reinit) { ff_sws_init_range_convert(c); +#if ARCH_LOONGARCH64 + ff_sws_init_range_convert_loongarch(c); +#endif + } c->dstFormatBpp = av_get_bits_per_pixel(desc_dst); c->srcFormatBpp = av_get_bits_per_pixel(desc_src);