From patchwork Thu Apr 1 10:00:15 2021
X-Patchwork-Id: 26678
Date: Thu, 1 Apr 2021 12:00:15 +0200
Message-Id: <20210401100017.2863838-1-alankelly@google.com>
From: Alan Kelly
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH 1/3] libswscale/x86/yuv2yuvX: Removes unrolling for mmx and mmxext

---
This restores support for inputs of size 8, which the original implementation handled. A bug was found with input sizes that are not divisible by 16: for MMX and MMXEXT (mmsize = 8), the unrolled loop stores 2 * mmsize = 16 bytes per iteration.
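The following is a minimal C sketch of why the unrolled MMX loop overshoots for these sizes. It assumes the `.outerloop` control flow that is visible in the yuv2yuvX.asm diff: each iteration stores `mmsize * unroll` bytes, and the `cmp offsetq, dstWq` / `jb .outerloop` exit test runs only after the stores. The function names are hypothetical. The model performs no filtering, and it measures distances from the first output byte. The real kernel starts at `offset` and stops at `dstW + offset`, so it covers the same distance.

```c
#include <assert.h>

/* Hypothetical model of the .outerloop control flow in yuv2yuvX.asm.
 * Nothing is filtered; the model only tracks how far the loop stores.
 * Each iteration stores mmsize * unroll output bytes, and the exit test
 * runs after the stores, so the body always executes at least once. */
int yuv2yuvx_bytes_written(int dstW, int mmsize, int unroll)
{
    int step = mmsize * unroll;
    int offset = 0;

    assert(step > 0);
    do {
        offset += step;          /* add offsetq, mmsize * unroll */
    } while (offset < dstW);     /* cmp offsetq, dstWq ; jb .outerloop */
    return offset;
}

/* Bytes stored beyond the dstW bytes the caller asked for. */
int yuv2yuvx_overrun(int dstW, int mmsize, int unroll)
{
    return yuv2yuvx_bytes_written(dstW, mmsize, unroll) - dstW;
}
```

With mmsize = 8 (MMX/MMXEXT), `yuv2yuvx_overrun(8, 8, 2)` and `yuv2yuvx_overrun(24, 8, 2)` both return 8. With `unroll` set to 1 they return 0. This matches the patch, which keeps the 2x unroll only for SSE3 and above.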
 libswscale/x86/yuv2yuvX.asm | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/libswscale/x86/yuv2yuvX.asm b/libswscale/x86/yuv2yuvX.asm
index 521880dabe..b6294cb919 100644
--- a/libswscale/x86/yuv2yuvX.asm
+++ b/libswscale/x86/yuv2yuvX.asm
@@ -37,8 +37,10 @@ SECTION .text
 cglobal yuv2yuvX, 7, 7, 8, filter, filterSize, src, dest, dstW, dither, offset
 %if notcpuflag(sse3)
 %define movr mova
+%define unroll 1
 %else
 %define movr movdqu
+%define unroll 2
 %endif
     movsxdifnidn dstWq, dstWd
     movsxdifnidn offsetq, offsetd
@@ -70,8 +72,10 @@ cglobal yuv2yuvX, 7, 7, 8, filter, filterSize, src, dest, dstW, dither, offset
 .outerloop:
     mova m4, m7
     mova m3, m7
+%if cpuflag(sse3)
     mova m6, m7
     mova m1, m7
+%endif
 .loop:
 %if cpuflag(avx2)
     vpbroadcastq m0, [filterSizeq + 8]
@@ -84,28 +88,36 @@ cglobal yuv2yuvX, 7, 7, 8, filter, filterSize, src, dest, dstW, dither, offset
     pmulhw m5, m0, [srcq + offsetq * 2 + mmsize]
     paddw m3, m3, m2
     paddw m4, m4, m5
+%if cpuflag(sse3)
     pmulhw m2, m0, [srcq + offsetq * 2 + 2 * mmsize]
     pmulhw m5, m0, [srcq + offsetq * 2 + 3 * mmsize]
     paddw m6, m6, m2
     paddw m1, m1, m5
+%endif
     add filterSizeq, $10
     mov srcq, [filterSizeq]
     test srcq, srcq
     jnz .loop
     psraw m3, m3, 3
     psraw m4, m4, 3
+%if cpuflag(sse3)
     psraw m6, m6, 3
     psraw m1, m1, 3
+%endif
     packuswb m3, m3, m4
+%if cpuflag(sse3)
     packuswb m6, m6, m1
+%endif
     mov srcq, [filterq]
 %if cpuflag(avx2)
     vpermq m3, m3, 216
     vpermq m6, m6, 216
 %endif
     movr [destq + offsetq], m3
+%if cpuflag(sse3)
     movr [destq + offsetq + mmsize], m6
-    add offsetq, mmsize * 2
+%endif
+    add offsetq, mmsize * unroll
     mov filterSizeq, filterq
     cmp offsetq, dstWq
     jb .outerloop

From patchwork Thu Apr 1 10:00:16 2021
X-Patchwork-Id: 26676
Date: Thu, 1 Apr 2021 12:00:16 +0200
In-Reply-To: <20210401100017.2863838-1-alankelly@google.com>
Message-Id: <20210401100017.2863838-2-alankelly@google.com>
From: Alan Kelly
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH 2/3] libswscale/x86/swscale: Only call ff_yuv2yuvX functions if the input size is > 0

---
 libswscale/x86/swscale.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index cc9e8b0155..0848a31461 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -197,7 +197,8 @@ static void yuv2yuvX_ ##opt(const int16_t *filter, int filterSize, \
                            const int16_t **src, uint8_t *dest, int dstW, \
                            const uint8_t *dither, int offset) \
 { \
-    ff_yuv2yuvX_ ##opt(filter, filterSize - 1, 0, dest - offset, dstW + offset, dither, offset); \
+    if(dstW > 0) \
+        ff_yuv2yuvX_ ##opt(filter, filterSize - 1, 0, dest - offset, dstW + offset, dither, offset); \
     return; \
 }
 
@@ -215,7 +216,8 @@ static void yuv2yuvX_ ##opt(const int16_t *filter, int filterSize, \
         yuv2yuvX_mmx(filter, filterSize, src, dest, dstW, dither, offset); \
         return; \
     } \
-    ff_yuv2yuvX_ ##opt(filter, filterSize - 1, 0, dest - offset, pixelsProcessed + offset, dither, offset); \
+    if(pixelsProcessed > 0) \
+        ff_yuv2yuvX_ ##opt(filter, filterSize - 1, 0, dest - offset, pixelsProcessed + offset, dither, offset); \
     if(remainder > 0){ \
       ff_yuv2yuvX_mmx(filter, filterSize - 1, pixelsProcessed, dest - offset, pixelsProcessed + remainder + offset, dither, offset); \
     } \

From patchwork Thu Apr 1 10:00:17 2021
X-Patchwork-Id: 26677
Date: Thu, 1 Apr 2021 12:00:17 +0200
In-Reply-To: <20210401100017.2863838-1-alankelly@google.com>
Message-Id: <20210401100017.2863838-3-alankelly@google.com>
From: Alan Kelly
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH 3/3] tests/checkasm/sw_scale: adds additional test sizes for yuv2yuvX

---
 tests/checkasm/sw_scale.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index a10118704b..3ac0f9082f 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_scale.c
@@ -68,8 +68,8 @@ static void check_yuv2yuvX(void)
 #define FILTER_SIZES 4
     static const int filter_sizes[FILTER_SIZES] = {1, 4, 8, 16};
 #define LARGEST_INPUT_SIZE 512
-#define INPUT_SIZES 4
-    static const int input_sizes[INPUT_SIZES] = {128, 144, 256, 512};
+#define INPUT_SIZES 6
+    static const int input_sizes[INPUT_SIZES] = {8, 24, 128, 144, 256, 512};
 
     declare_func_emms(AV_CPU_FLAG_MMX, void, const int16_t *filter,
                       int filterSize, const int16_t **src, uint8_t *dest,
@@ -107,7 +107,7 @@ static void check_yuv2yuvX(void)
                 for(j = 0; j < 4; ++j)
                     vFilterData[i].coeff[j + 4] = filter_coeff[i];
             }
-            if (check_func(ctx->yuv2planeX, "yuv2yuvX_%d_%d", filter_sizes[fsi], osi)){
+            if (check_func(ctx->yuv2planeX, "yuv2yuvX_%d_%d_%d", filter_sizes[fsi], osi, dstW)){
                 memset(dst0, 0, LARGEST_INPUT_SIZE * sizeof(dst0[0]));
                 memset(dst1, 0, LARGEST_INPUT_SIZE * sizeof(dst1[0]));
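This sketch relates the widths added to `input_sizes` to the other two patches by modelling the `YUV2YUVX_FUNC` wrapper from PATCH 2/3 in C. It rests on several assumptions. The diff does not show how `remainder` and `pixelsProcessed` are computed, so the model assumes `remainder = dstW % step` and `pixelsProcessed = dstW - remainder`. It also ignores the early `yuv2yuvX_mmx` fallback, because that branch's condition lies outside the diff context. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of the dispatch in the YUV2YUVX_FUNC wrapper after
 * PATCH 2/3. Assumes remainder = dstW % step and
 * pixelsProcessed = dstW - remainder (not shown in the diff). It only
 * records which kernels would run and over how many pixels. */
struct yuv2yuvx_plan {
    int simd_pixels; /* pixels given to ff_yuv2yuvX_<opt>; 0 = not called */
    int mmx_pixels;  /* pixels given to ff_yuv2yuvX_mmx;   0 = not called */
};

struct yuv2yuvx_plan yuv2yuvx_dispatch(int dstW, int step)
{
    struct yuv2yuvx_plan plan = {0, 0};
    int remainder, pixelsProcessed;

    assert(step > 0 && dstW >= 0);
    remainder = dstW % step;
    pixelsProcessed = dstW - remainder;

    if (pixelsProcessed > 0)   /* guard added by PATCH 2/3 */
        plan.simd_pixels = pixelsProcessed;
    if (remainder > 0)         /* MMX tail, unchanged by the series */
        plan.mmx_pixels = remainder;
    return plan;
}
```

Assume the SSE3 kernel uses a 32-byte step (2 * mmsize with 16-byte XMM registers). Under that assumption, widths 8 and 24 give `pixelsProcessed == 0`. Those widths therefore exercise the new `pixelsProcessed > 0` guard, and they send the whole row through the MMX kernel changed in PATCH 1/3. Width 24 is also not a multiple of 16.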