From patchwork Wed May 4 12:57:05 2022
X-Patchwork-Submitter: Sam Blackriver
X-Patchwork-Id: 35586
From: FacelessLake
To: ffmpeg-devel@ffmpeg.org
Cc: Semen Belozerov
Date: Wed, 4 May 2022 19:57:05 +0700
Message-Id: <20220504125705.2387-1-sinonim147@gmail.com>
Subject: [FFmpeg-devel] [PATCH] avcodec/vp9: ipred_vl_16x16_16 avx2 implementation

From: Semen Belozerov
---
 libavcodec/x86/vp9dsp_init_16bpp.c    |  2 ++
 libavcodec/x86/vp9intrapred_16bpp.asm | 51 +++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/libavcodec/x86/vp9dsp_init_16bpp.c b/libavcodec/x86/vp9dsp_init_16bpp.c
index 27e746aea1..b17826326f 100644
--- a/libavcodec/x86/vp9dsp_init_16bpp.c
+++ b/libavcodec/x86/vp9dsp_init_16bpp.c
@@ -54,6 +54,7 @@ decl_ipred_fn(dl, 16, 16, avx2);
 decl_ipred_fn(dl, 32, 16, avx2);
 decl_ipred_fn(dr, 16, 16, avx2);
 decl_ipred_fn(dr, 32, 16, avx2);
+decl_ipred_fn(vl, 16, 16, avx2);
 
 #define decl_ipred_dir_funcs(type) \
 decl_ipred_fns(type, 16, sse2, sse2); \
@@ -139,6 +140,7 @@ av_cold void ff_vp9dsp_init_16bpp_x86(VP9DSPContext *dsp)
         init_ipred_func(dl, DIAG_DOWN_LEFT, 16, 16, avx2);
         init_ipred_func(dl, DIAG_DOWN_LEFT, 32, 16, avx2);
         init_ipred_func(dr, DIAG_DOWN_RIGHT, 16, 16, avx2);
+        init_ipred_func(vl, VERT_LEFT, 16, 16, avx2);
 #if ARCH_X86_64
         init_ipred_func(dr, DIAG_DOWN_RIGHT, 32, 16, avx2);
 #endif
diff --git a/libavcodec/x86/vp9intrapred_16bpp.asm b/libavcodec/x86/vp9intrapred_16bpp.asm
index 32b698243a..0dad91ac5c 100644
--- a/libavcodec/x86/vp9intrapred_16bpp.asm
+++ b/libavcodec/x86/vp9intrapred_16bpp.asm
@@ -1222,6 +1222,57 @@ cglobal vp9_ipred_dr_16x16_16, 4, 5, 6, dst, stride, l, a
     mova   [dst3q+strideq*4], m5                   ; 7
     RET
 
+cglobal vp9_ipred_vl_16x16_16, 4, 5, 7, dst, stride, l, a
+    movifnidn             aq, amp
+    mova                  m0, [aq]                 ; abcdefghijklmnop
+    vpbroadcastw         xm1, [aq+30]              ; pppppppp
+    vperm2i128            m2, m0, m1, q0201        ; ijklmnoppppppppp
+    vpalignr              m3, m2, m0, 2            ; bcdefghijklmnopp
+    vperm2i128            m4, m3, m1, q0201        ; jklmnopppppppppp
+    vpalignr              m5, m2, m0, 4            ; cdefghijklmnoppp
+    vperm2i128            m6, m5, m1, q0201        ; klmnoppppppppppp
+    LOWPASS                5, 3, 0                 ; BCDEFGHIJKLMNOPP
+    LOWPASS                6, 4, 2                 ; JKLMNOPPPPPPPPPP
+    pavgw                 m3, m0                   ; abcdefghijklmnop
+    pavgw                 m4, m2                   ; ijklmnoppppppppp
+    DEFINE_ARGS dst, stride, stride3, stride5, dst4
+    lea                dst4q, [dstq+strideq*4]
+    lea             stride3q, [strideq*3]
+    lea             stride5q, [stride3q+strideq*2]
+
+    mova    [dstq+strideq*0], m3                   ; 0 abcdefghijklmnop
+    mova    [dstq+strideq*1], m5                   ; 1 BCDEFGHIJKLMNOPP
+    vpalignr              m0, m4, m3, 2
+    vpalignr              m1, m6, m5, 2
+    mova   [dstq+strideq*2 ], m0                   ; 2 bcdefghijklmnopp
+    mova   [dstq+stride3q*1], m1                   ; 3 CDEFGHIJKLMNOPPP
+    vpalignr              m0, m4, m3, 4
+    vpalignr              m1, m6, m5, 4
+    mova   [dst4q+strideq*0], m0                   ; 4 cdefghijklmnoppp
+    mova   [dstq+stride5q*1], m1                   ; 5 DEFGHIJKLMNOPPPP
+    vpalignr              m0, m4, m3, 6
+    vpalignr              m1, m6, m5, 6
+    mova  [ dstq+stride3q*2], m0                   ; 6 defghijklmnopppp
+    mova  [dst4q+stride3q*1], m1                   ; 7 EFGHIJKLMNOPPPPP
+    vpalignr              m0, m4, m3, 8
+    vpalignr              m1, m6, m5, 8
+    mova   [ dstq+strideq*8], m0                   ; 8 efghijklmnoppppp
+    mova  [dst4q+stride5q*1], m1                   ; 9 FGHIJKLMNOPPPPPP
+    vpalignr              m0, m4, m3, 10
+    mova   [dstq+stride5q*2], m0                   ; 10 fghijklmnopppppp
+    vpalignr              m0, m4, m3, 12
+    mova   [dst4q+strideq*8], m0                   ; 12 ghijklmnoppppppp
+    vpalignr              m0, m4, m3, 14
+    mova  [dst4q+stride5q*2], m0                   ; 14 hijklmnopppppppp
+    sub                dst4q, strideq
+    vpalignr              m1, m6, m5, 10
+    mova   [dst4q+strideq*8], m1                   ; 11 GHIJKLMNOPPPPPPP
+    vpalignr              m1, m6, m5, 12
+    mova  [dst4q+stride5q*2], m1                   ; 13 HIJKLMNOPPPPPPPP
+    vpalignr              m1, m6, m5, 14
+    mova  [dst4q+stride3q*4], m1                   ; 15 IJKLMNOPPPPPPPPP
+    RET
+
 %if ARCH_X86_64
 cglobal vp9_ipred_dr_32x32_16, 4, 7, 10, dst, stride, l, a
     mova                  m0, [lq+mmsize*0+0]      ; l[0-15]
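The following C sketch is a scalar model of what the AVX2 code above appears to compute. It can help when checking the per-row comments.

Each row takes its values from one of two sources:

- Even rows use the rounded 2-tap averages, which are the pavgw results in m3/m4.
- Odd rows use the 3-tap LOWPASS results in m5/m6.

Each row pair starts one pixel further along the above row. Every position past the filtered run holds a[15], which is the padding the vpbroadcastw/vperm2i128 setup produces.

The sketch also checks, over a small range, that the macro's formulation agrees with the textbook (l + 2c + r + 2) >> 2 filter. That formulation is assumed to be ((left + right) >> 1) followed by pavgw with the centre; the macro is defined outside this hunk.

The names vl_16x16_ref, lowpass_asm and count_lowpass_mismatches are hypothetical helpers for illustration, not FFmpeg functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed scalar equivalent of the file's LOWPASS macro (hypothetical):
 * paddw + psraw 1 on left/right, then pavgw with the centre. */
static uint16_t lowpass_asm(unsigned left, unsigned centre, unsigned right)
{
    return (uint16_t)((((left + right) >> 1) + centre + 1) >> 1);
}

/* Textbook 3-tap smoothing filter: (l + 2c + r + 2) >> 2. */
static uint16_t avg3(unsigned l, unsigned c, unsigned r)
{
    return (uint16_t)((l + 2 * c + r + 2) >> 2);
}

/* Rounded 2-tap average, i.e. what pavgw computes per lane. */
static uint16_t avg2(unsigned x, unsigned y)
{
    return (uint16_t)((x + y + 1) >> 1);
}

/* Hypothetical scalar model of 16x16 vertical-left prediction for
 * 16-bit pixels. `a` holds the 16 pixels above the block; `stride`
 * is measured in pixels, not bytes. */
static void vl_16x16_ref(uint16_t *dst, ptrdiff_t stride, const uint16_t a[16])
{
    /* vo: 2-tap source for even rows, ve: 3-tap source for odd rows.
     * Indices >= 15 are padded with a[15]; 24 entries suffice because
     * row pair j reads indices j..j+15 with j <= 7. */
    uint16_t vo[24], ve[24];

    assert(stride >= 16); /* rows must not overlap */
    for (int k = 0; k < 24; k++) {
        if (k < 15) {
            int k2 = k + 2 < 16 ? k + 2 : 15; /* a[16] is taken as a[15] */
            vo[k] = avg2(a[k], a[k + 1]);
            ve[k] = avg3(a[k], a[k + 1], a[k2]);
        } else {
            vo[k] = ve[k] = a[15];
        }
    }
    for (int j = 0; j < 8; j++) {
        memcpy(dst + (2 * j) * stride,     vo + j, 16 * sizeof(*dst));
        memcpy(dst + (2 * j + 1) * stride, ve + j, 16 * sizeof(*dst));
    }
}

/* Counts (l, c, r) triples in [0, max]^3 on which lowpass_asm and avg3
 * disagree; 0 means the two formulations agree on that range. */
static unsigned long count_lowpass_mismatches(unsigned max)
{
    unsigned long bad = 0;

    for (unsigned l = 0; l <= max; l++)
        for (unsigned c = 0; c <= max; c++)
            for (unsigned r = 0; r <= max; r++)
                bad += lowpass_asm(l, c, r) != avg3(l, c, r);
    return bad;
}
```

One way to use the model in a checkasm-style comparison:

1. Fill a[] with random 10- or 12-bit values.
2. Run both this model and the AVX2 function on them.
3. Compare all 256 outputs.

That harness is not shown here.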