From patchwork Fri Jun 9 07:17:27 2023
X-Patchwork-Submitter: Arnie Chang
X-Patchwork-Id: 42022
From: Arnie Chang
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 9 Jun 2023 15:17:27 +0800
Message-Id: <20230609071727.524-1-arnie.chang@sifive.com>
Subject: [FFmpeg-devel] [PATCH] lavc/h264chroma: RISC-V V add motion compensation for 4xH and 2xH chroma blocks
Cc: Arnie Chang

Optimize the put and avg filtering for 4xH and 2xH blocks

Signed-off-by: Arnie Chang
---
checkasm: using random seed 3475799765
RVVi32:
 - h264chroma.chroma_mc [OK]
checkasm: all 6 tests passed
avg_h264_chroma_mc1_8_c:       1821.5
avg_h264_chroma_mc1_8_rvv_i32:  466.5
avg_h264_chroma_mc2_8_c:        939.2
avg_h264_chroma_mc2_8_rvv_i32:  466.5
avg_h264_chroma_mc4_8_c:        502.2
avg_h264_chroma_mc4_8_rvv_i32:  466.5
put_h264_chroma_mc1_8_c:       1436.5
put_h264_chroma_mc1_8_rvv_i32:  382.5
put_h264_chroma_mc2_8_c:        824.2
put_h264_chroma_mc2_8_rvv_i32:  382.5
put_h264_chroma_mc4_8_c:        431.2
put_h264_chroma_mc4_8_rvv_i32:  382.5

 libavcodec/riscv/h264_chroma_init_riscv.c |   8 +
 libavcodec/riscv/h264_mc_chroma.S         | 216 ++++++++++++++--------
 2 files changed, 144 insertions(+), 80 deletions(-)

diff --git a/libavcodec/riscv/h264_chroma_init_riscv.c b/libavcodec/riscv/h264_chroma_init_riscv.c
index 7c905edfcd..9f95150ea3 100644
--- a/libavcodec/riscv/h264_chroma_init_riscv.c
+++ b/libavcodec/riscv/h264_chroma_init_riscv.c
@@ -27,6 +27,10 @@ void h264_put_chroma_mc8_rvv(uint8_t *p_dst, const uint8_t *p_src, ptrdiff_t stride, int h, int x, int y);
 void h264_avg_chroma_mc8_rvv(uint8_t *p_dst, const uint8_t *p_src, ptrdiff_t stride, int h, int x, int y);
+void h264_put_chroma_mc4_rvv(uint8_t *p_dst, const uint8_t *p_src, ptrdiff_t stride, int h, int x, int y);
+void h264_avg_chroma_mc4_rvv(uint8_t *p_dst, const uint8_t *p_src, ptrdiff_t stride, int h, int x, int y);
+void h264_put_chroma_mc2_rvv(uint8_t *p_dst, const uint8_t *p_src, ptrdiff_t stride, int h, int x, int y);
+void h264_avg_chroma_mc2_rvv(uint8_t *p_dst, const uint8_t *p_src, ptrdiff_t stride, int h, int x, int y);
 
 av_cold void ff_h264chroma_init_riscv(H264ChromaContext *c, int bit_depth)
 {
@@ -36,6 +40,10 @@ av_cold void ff_h264chroma_init_riscv(H264ChromaContext *c, int bit_depth)
     if (bit_depth == 8 && (flags & AV_CPU_FLAG_RVV_I32) && ff_get_rv_vlenb() >= 16) {
         c->put_h264_chroma_pixels_tab[0] = h264_put_chroma_mc8_rvv;
         c->avg_h264_chroma_pixels_tab[0] = h264_avg_chroma_mc8_rvv;
+        c->put_h264_chroma_pixels_tab[1] = h264_put_chroma_mc4_rvv;
+        c->avg_h264_chroma_pixels_tab[1] = h264_avg_chroma_mc4_rvv;
+        c->put_h264_chroma_pixels_tab[2] = h264_put_chroma_mc2_rvv;
+        c->avg_h264_chroma_pixels_tab[2] = h264_avg_chroma_mc2_rvv;
     }
 #endif
 }
diff --git a/libavcodec/riscv/h264_mc_chroma.S b/libavcodec/riscv/h264_mc_chroma.S
index 364bc3156e..c97cdbad86 100644
--- a/libavcodec/riscv/h264_mc_chroma.S
+++ b/libavcodec/riscv/h264_mc_chroma.S
@@ -19,8 +19,7 @@
  */
 #include "libavutil/riscv/asm.S"
 
-.macro h264_chroma_mc8 type
-func h264_\type\()_chroma_mc8_rvv, zve32x
+.macro do_chroma_mc type width unroll
         csrw vxrm, zero
         slli t2, a5, 3
         mul t1, a5, a4
@@ -30,94 +29,104 @@ func h264_\type\()_chroma_mc8_rvv, zve32x
         sub a7, a4, t1
         addi a6, a5, 64
         sub t0, t2, t1
-        vsetivli t3, 8, e8, m1, ta, mu
+        vsetivli t3, \width, e8, m1, ta, mu
         beqz t1, 2f
         blez a3, 8f
         li t4, 0
         li t2, 0
         li t5, 1
         addi a5, t3, 1
+        .ifc \unroll,1
         slli t3, a2, 2
+        .else
+        slli t3, a2, 1
+        .endif
 1:                                # if (xy != 0)
         add a4, a1, t4
         vsetvli zero, a5, e8, m1, ta, ma
+        .ifc \unroll,1
         addi t2, t2, 4
+        .else
+        addi t2, t2, 2
+        .endif
         vle8.v v10, (a4)
         add a4, a4, a2
         vslide1down.vx v11, v10, t5
-        vsetivli zero, 8, e8, m1, ta, ma
+        vsetivli zero, \width, e8, m1, ta, ma
         vwmulu.vx v8, v10, a6
         vwmaccu.vx v8, a7, v11
         vsetvli zero, a5, e8, m1, ta, ma
         vle8.v v12, (a4)
-        vsetivli zero, 8, e8, m1, ta, ma
+        vsetivli zero, \width, e8, m1, ta, ma
         add a4, a4, a2
         vwmaccu.vx v8, t0, v12
         vsetvli zero, a5, e8, m1, ta, ma
         vslide1down.vx v13, v12, t5
-        vsetivli zero, 8, e8, m1, ta, ma
+        vsetivli zero, \width, e8, m1, ta, ma
         vwmulu.vx v10, v12, a6
         vwmaccu.vx v8, t1, v13
         vwmaccu.vx v10, a7, v13
         vsetvli zero, a5, e8, m1, ta, ma
         vle8.v v14, (a4)
-        vsetivli zero, 8, e8, m1, ta, ma
+        vsetivli zero, \width, e8, m1, ta, ma
         add a4, a4, a2
         vwmaccu.vx v10, t0, v14
         vsetvli zero, a5, e8, m1, ta, ma
         vslide1down.vx v15, v14, t5
-        vsetivli zero, 8, e8, m1, ta, ma
+        vsetivli zero, \width, e8, m1, ta, ma
         vwmulu.vx v12, v14, a6
         vwmaccu.vx v10, t1, v15
         vwmaccu.vx v12, a7, v15
+        vnclipu.wi v15, v8, 6
+        .ifc \type,avg
+        vle8.v v9, (a0)
+        vaaddu.vv v15, v15, v9
+        .endif
+        vse8.v v15, (a0)
+        add a0, a0, a2
+        vnclipu.wi v8, v10, 6
+        .ifc \type,avg
+        vle8.v v9, (a0)
+        vaaddu.vv v8, v8, v9
+        .endif
+        add t4, t4, t3
+        vse8.v v8, (a0)
+        add a0, a0, a2
+        .ifc \unroll,1
         vsetvli zero, a5, e8, m1, ta, ma
         vle8.v v14, (a4)
-        vsetivli zero, 8, e8, m1, ta, ma
+        vsetivli zero, \width, e8, m1, ta, ma
         add a4, a4, a2
         vwmaccu.vx v12, t0, v14
         vsetvli zero, a5, e8, m1, ta, ma
         vslide1down.vx v15, v14, t5
-        vsetivli zero, 8, e8, m1, ta, ma
+        vsetivli zero, \width, e8, m1, ta, ma
         vwmulu.vx v16, v14, a6
         vwmaccu.vx v12, t1, v15
         vwmaccu.vx v16, a7, v15
         vsetvli zero, a5, e8, m1, ta, ma
         vle8.v v14, (a4)
-        vsetivli zero, 8, e8, m1, ta, ma
-        add a4, a0, t4
-        add t4, t4, t3
+        vsetivli zero, \width, e8, m1, ta, ma
         vwmaccu.vx v16, t0, v14
         vsetvli zero, a5, e8, m1, ta, ma
         vslide1down.vx v14, v14, t5
-        vsetivli zero, 8, e8, m1, ta, ma
-        vnclipu.wi v15, v8, 6
+        vsetivli zero, \width, e8, m1, ta, ma
         vwmaccu.vx v16, t1, v14
-        .ifc \type,avg
-        vle8.v v9, (a4)
-        vaaddu.vv v15, v15, v9
-        .endif
-        vse8.v v15, (a4)
-        add a4, a4, a2
-        vnclipu.wi v8, v10, 6
-        .ifc \type,avg
-        vle8.v v9, (a4)
-        vaaddu.vv v8, v8, v9
-        .endif
-        vse8.v v8, (a4)
-        add a4, a4, a2
         vnclipu.wi v8, v12, 6
         .ifc \type,avg
-        vle8.v v9, (a4)
+        vle8.v v9, (a0)
         vaaddu.vv v8, v8, v9
         .endif
-        vse8.v v8, (a4)
-        add a4, a4, a2
+        vse8.v v8, (a0)
+        add a0, a0, a2
         vnclipu.wi v8, v16, 6
         .ifc \type,avg
-        vle8.v v9, (a4)
+        vle8.v v9, (a0)
         vaaddu.vv v8, v8, v9
         .endif
-        vse8.v v8, (a4)
+        vse8.v v8, (a0)
+        add a0, a0, a2
+        .endif
         blt t2, a3, 1b
         j 8f
 2:
@@ -126,11 +135,19 @@ func h264_\type\()_chroma_mc8_rvv, zve32x
         blez a3, 8f
         li a4, 0
         li t1, 0
+        .ifc \unroll,1
         slli a7, a2, 2
+        .else
+        slli a7, a2, 1
+        .endif
 3:                                # if ((x8 - xy) == 0 && (y8 -xy) != 0)
         add a5, a1, a4
         vsetvli zero, zero, e8, m1, ta, ma
+        .ifc \unroll,1
         addi t1, t1, 4
+        .else
+        addi t1, t1, 2
+        .endif
         vle8.v v8, (a5)
         add a5, a5, a2
         add t2, a5, a2
@@ -141,42 +158,44 @@ func h264_\type\()_chroma_mc8_rvv, zve32x
         add t2, t2, a2
         add a5, t2, a2
         vwmaccu.vx v10, t0, v8
-        vle8.v v8, (t2)
-        vle8.v v14, (a5)
-        add a5, a0, a4
         add a4, a4, a7
         vwmaccu.vx v12, t0, v9
         vnclipu.wi v15, v10, 6
         vwmulu.vx v10, v9, a6
+        vnclipu.wi v9, v12, 6
         .ifc \type,avg
-        vle8.v v16, (a5)
+        vle8.v v16, (a0)
         vaaddu.vv v15, v15, v16
         .endif
-        vse8.v v15, (a5)
-        add a5, a5, a2
-        vnclipu.wi v9, v12, 6
-        vwmaccu.vx v10, t0, v8
-        vwmulu.vx v12, v8, a6
+        vse8.v v15, (a0)
+        add a0, a0, a2
         .ifc \type,avg
-        vle8.v v16, (a5)
+        vle8.v v16, (a0)
         vaaddu.vv v9, v9, v16
         .endif
-        vse8.v v9, (a5)
-        add a5, a5, a2
+        vse8.v v9, (a0)
+        add a0, a0, a2
+        .ifc \unroll,1
+        vle8.v v8, (t2)
+        vle8.v v14, (a5)
+        vwmaccu.vx v10, t0, v8
+        vwmulu.vx v12, v8, a6
         vnclipu.wi v8, v10, 6
         vwmaccu.vx v12, t0, v14
         .ifc \type,avg
-        vle8.v v16, (a5)
+        vle8.v v16, (a0)
         vaaddu.vv v8, v8, v16
         .endif
-        vse8.v v8, (a5)
-        add a5, a5, a2
+        vse8.v v8, (a0)
+        add a0, a0, a2
         vnclipu.wi v8, v12, 6
         .ifc \type,avg
-        vle8.v v16, (a5)
+        vle8.v v16, (a0)
         vaaddu.vv v8, v8, v16
         .endif
-        vse8.v v8, (a5)
+        vse8.v v8, (a0)
+        add a0, a0, a2
+        .endif
         blt t1, a3, 3b
         j 8f
 4:
@@ -186,87 +205,103 @@ func h264_\type\()_chroma_mc8_rvv, zve32x
         li a4, 0
         li t2, 0
         addi t0, t3, 1
+        .ifc \unroll,1
         slli t1, a2, 2
+        .else
+        slli t1, a2, 1
+        .endif
 5:                                # if ((x8 - xy) != 0 && (y8 -xy) == 0)
         add a5, a1, a4
         vsetvli zero, t0, e8, m1, ta, ma
+        .ifc \unroll,1
         addi t2, t2, 4
+        .else
+        addi t2, t2, 2
+        .endif
         vle8.v v8, (a5)
         add a5, a5, a2
         vslide1down.vx v9, v8, t5
-        vsetivli zero, 8, e8, m1, ta, ma
+        vsetivli zero, \width, e8, m1, ta, ma
         vwmulu.vx v10, v8, a6
         vwmaccu.vx v10, a7, v9
         vsetvli zero, t0, e8, m1, ta, ma
         vle8.v v8, (a5)
         add a5, a5, a2
         vslide1down.vx v9, v8, t5
-        vsetivli zero, 8, e8, m1, ta, ma
+        vsetivli zero, \width, e8, m1, ta, ma
         vwmulu.vx v12, v8, a6
         vwmaccu.vx v12, a7, v9
+        vnclipu.wi v16, v10, 6
+        .ifc \type,avg
+        vle8.v v18, (a0)
+        vaaddu.vv v16, v16, v18
+        .endif
+        vse8.v v16, (a0)
+        add a0, a0, a2
+        vnclipu.wi v10, v12, 6
+        .ifc \type,avg
+        vle8.v v18, (a0)
+        vaaddu.vv v10, v10, v18
+        .endif
+        add a4, a4, t1
+        vse8.v v10, (a0)
+        add a0, a0, a2
+        .ifc \unroll,1
         vsetvli zero, t0, e8, m1, ta, ma
         vle8.v v8, (a5)
         add a5, a5, a2
         vslide1down.vx v9, v8, t5
-        vsetivli zero, 8, e8, m1, ta, ma
+        vsetivli zero, \width, e8, m1, ta, ma
         vwmulu.vx v14, v8, a6
         vwmaccu.vx v14, a7, v9
         vsetvli zero, t0, e8, m1, ta, ma
         vle8.v v8, (a5)
-        add a5, a0, a4
-        add a4, a4, t1
         vslide1down.vx v9, v8, t5
-        vsetivli zero, 8, e8, m1, ta, ma
-        vnclipu.wi v16, v10, 6
-        .ifc \type,avg
-        vle8.v v18, (a5)
-        vaaddu.vv v16, v16, v18
-        .endif
-        vse8.v v16, (a5)
-        add a5, a5, a2
-        vnclipu.wi v10, v12, 6
+        vsetivli zero, \width, e8, m1, ta, ma
         vwmulu.vx v12, v8, a6
-        .ifc \type,avg
-        vle8.v v18, (a5)
-        vaaddu.vv v10, v10, v18
-        .endif
-        vse8.v v10, (a5)
-        add a5, a5, a2
         vnclipu.wi v8, v14, 6
         vwmaccu.vx v12, a7, v9
         .ifc \type,avg
-        vle8.v v18, (a5)
+        vle8.v v18, (a0)
         vaaddu.vv v8, v8, v18
         .endif
-        vse8.v v8, (a5)
-        add a5, a5, a2
+        vse8.v v8, (a0)
+        add a0, a0, a2
         vnclipu.wi v8, v12, 6
         .ifc \type,avg
-        vle8.v v18, (a5)
+        vle8.v v18, (a0)
         vaaddu.vv v8, v8, v18
         .endif
-        vse8.v v8, (a5)
+        vse8.v v8, (a0)
+        add a0, a0, a2
+        .endif
         blt t2, a3, 5b
         j 8f
 6:
         blez a3, 8f
         li a4, 0
         li t2, 0
+        .ifc \unroll,1
         slli a7, a2, 2
+        .else
+        slli a7, a2, 1
+        .endif
 7:                                # the final else, none of the above conditions are met
         add t0, a1, a4
         vsetvli zero, zero, e8, m1, ta, ma
         add a5, a0, a4
         add a4, a4, a7
+        .ifc \unroll,1
         addi t2, t2, 4
+        .else
+        addi t2, t2, 2
+        .endif
         vle8.v v8, (t0)
         add t0, t0, a2
         add t1, t0, a2
         vwmulu.vx v10, v8, a6
         vle8.v v8, (t0)
         add t0, t1, a2
-        vle8.v v9, (t1)
-        vle8.v v12, (t0)
         vnclipu.wi v13, v10, 6
         vwmulu.vx v10, v8, a6
         .ifc \type,avg
@@ -276,13 +311,16 @@ func h264_\type\()_chroma_mc8_rvv, zve32x
         vse8.v v13, (a5)
         add a5, a5, a2
         vnclipu.wi v8, v10, 6
-        vwmulu.vx v10, v9, a6
         .ifc \type,avg
         vle8.v v18, (a5)
         vaaddu.vv v8, v8, v18
         .endif
         vse8.v v8, (a5)
         add a5, a5, a2
+        .ifc \unroll,1
+        vle8.v v9, (t1)
+        vle8.v v12, (t0)
+        vwmulu.vx v10, v9, a6
         vnclipu.wi v8, v10, 6
         vwmulu.vx v10, v12, a6
         .ifc \type,avg
@@ -297,11 +335,29 @@ func h264_\type\()_chroma_mc8_rvv, zve32x
         vaaddu.vv v8, v8, v18
         .endif
         vse8.v v8, (a5)
+        .endif
         blt t2, a3, 7b
 8:
         ret
+.endm
+
+.macro h264_chroma_mc type width
+func h264_\type\()_chroma_mc\width\()_rvv, zve32x
+        .ifc \width,8
+        do_chroma_mc \type 8 1
+        .else
+        li a7, 3
+        blt a3, a7, 11f
+        do_chroma_mc \type \width 1
+11:
+        do_chroma_mc \type \width 0
+        .endif
 endfunc
 .endm
 
-h264_chroma_mc8 put
-h264_chroma_mc8 avg
+h264_chroma_mc put 8
+h264_chroma_mc avg 8
+h264_chroma_mc put 4
+h264_chroma_mc avg 4
+h264_chroma_mc put 2
+h264_chroma_mc avg 2
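For readers following the assembly: both the put and avg paths implement the standard H.264 eighth-pel bilinear chroma interpolation. A minimal scalar sketch is below, with hypothetical helper names (these are not the patch's functions); the RVV code computes the same weighted sums with vwmulu/vwmaccu and narrows with vnclipu, and the avg variant's rounding blend corresponds to the per-element vaaddu.vv under the RNU rounding mode set by `csrw vxrm, zero`.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference sketch of H.264 chroma MC (hypothetical names).
 * x, y are fractional offsets in 1/8-pel units, so the bilinear weights are
 *   A = (8-x)(8-y), B = x(8-y), C = (8-x)y, D = xy   (A+B+C+D = 64)
 * and each output pixel is (A*p0 + B*p1 + C*p2 + D*p3 + 32) >> 6. */
static void put_chroma_ref(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                           int w, int h, int x, int y)
{
    const int A = (8 - x) * (8 - y), B = x * (8 - y);
    const int C = (8 - x) * y,       D = x * y;

    for (int j = 0; j < h; j++) {
        for (int i = 0; i < w; i++)
            dst[i] = (A * src[i]          + B * src[i + 1] +
                      C * src[i + stride] + D * src[i + stride + 1] + 32) >> 6;
        dst += stride;
        src += stride;
    }
}

/* The avg variant blends the interpolated block into dst with rounding,
 * i.e. (dst + pred + 1) >> 1 per element. */
static void avg_chroma_ref(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                           int w, int h, int x, int y)
{
    uint8_t tmp[16 * 16];               /* assumes stride * h <= sizeof(tmp) */

    put_chroma_ref(tmp, src, stride, w, h, x, y);
    for (int j = 0; j < h; j++)
        for (int i = 0; i < w; i++)
            dst[j * stride + i] = (dst[j * stride + i] + tmp[j * stride + i] + 1) >> 1;
}
```

The block sizes this patch adds (w = 4 and w = 2) simply change the loop width here; the assembly additionally halves the loop unrolling when h < 3 so a 2-row iteration never overruns the block.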