From patchwork Thu May 26 02:01:58 2022
X-Patchwork-Submitter: "Swinney, Jonathan"
X-Patchwork-Id: 35934
From: "Swinney, Jonathan"
To: "ffmpeg-devel@ffmpeg.org"
Cc: Martin Storsjö, "Pop, Sebastian"
Date: Thu, 26 May 2022 02:01:58 +0000
Subject: [FFmpeg-devel] [PATCH v3 1/2] checkasm: added additional dstW tests for hscale

Signed-off-by: Jonathan Swinney
---
 tests/checkasm/sw_scale.c | 104 ++++++++++++++++++++------------------
 1 file changed, 55 insertions(+), 49 deletions(-)

diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index 3c0a083b42..31d9a525e9 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_scale.c
@@ -148,7 +148,11 @@ static void check_hscale(void)
         { 8, 18 },
     };
 
-    int i, j, fsi, hpi, width;
+#define LARGEST_INPUT_SIZE 512
+#define INPUT_SIZES 6
+    static const int input_sizes[INPUT_SIZES] = {8, 24, 128, 144, 256, 512};
+
+    int i, j, fsi, hpi, width, dstWi;
     struct SwsContext *ctx;
 
     // padded
@@ -178,57 +182,59 @@ static void check_hscale(void)
 
     for (hpi = 0; hpi < HSCALE_PAIRS; hpi++) {
         for (fsi = 0; fsi < FILTER_SIZES; fsi++) {
-            width = filter_sizes[fsi];
-
-            ctx->srcBpc = hscale_pairs[hpi][0];
-            ctx->dstBpc = hscale_pairs[hpi][1];
-            ctx->hLumFilterSize = ctx->hChrFilterSize = width;
-            ctx->dstW = ctx->chrDstW = SRC_PIXELS;
-
-            for (i = 0; i < SRC_PIXELS; i++) {
-                filterPos[i] = i;
-                filterPosAvx[i] = i;
-
-                // These filter cofficients are chosen to try break two corner
-                // cases, namely:
-                //
-                // - Negative filter coefficients. The filters output signed
-                // values, and it should be possible to end up with negative
-                // output values.
-                //
-                // - Positive clipping. The hscale filter function has clipping
-                // at (1<<15) - 1
-                //
-                // The coefficients sum to the 1.0 point for the hscale
-                // functions (1 << 14).
-
-                for (j = 0; j < width; j++) {
-                    filter[i * width + j] = -((1 << 14) / (width - 1));
+            for (dstWi = 0; dstWi < INPUT_SIZES; dstWi++) {
+                width = filter_sizes[fsi];
+
+                ctx->srcBpc = hscale_pairs[hpi][0];
+                ctx->dstBpc = hscale_pairs[hpi][1];
+                ctx->hLumFilterSize = ctx->hChrFilterSize = width;
+
+                for (i = 0; i < SRC_PIXELS; i++) {
+                    filterPos[i] = i;
+                    filterPosAvx[i] = i;
+
+                    // These filter coefficients are chosen to try to break two corner
+                    // cases, namely:
+                    //
+                    // - Negative filter coefficients. The filters output signed
+                    // values, and it should be possible to end up with negative
+                    // output values.
+                    //
+                    // - Positive clipping. The hscale filter function has clipping
+                    // at (1<<15) - 1
+                    //
+                    // The coefficients sum to the 1.0 point for the hscale
+                    // functions (1 << 14).
+
+                    for (j = 0; j < width; j++) {
+                        filter[i * width + j] = -((1 << 14) / (width - 1));
+                    }
+                    filter[i * width + (rnd() % width)] = ((1 << 15) - 1);
                 }
-                filter[i * width + (rnd() % width)] = ((1 << 15) - 1);
-            }
 
-            for (i = 0; i < MAX_FILTER_WIDTH; i++) {
-                // These values should be unused in SIMD implementations but
-                // may still be read, random coefficients here should help show
-                // issues where they are used in error.
+                for (i = 0; i < MAX_FILTER_WIDTH; i++) {
+                    // These values should be unused in SIMD implementations but
+                    // may still be read; random coefficients here should help show
+                    // issues where they are used in error.
 
-                filter[SRC_PIXELS * width + i] = rnd();
-            }
-            ff_sws_init_scale(ctx);
-            memcpy(filterAvx2, filter, sizeof(uint16_t) * (SRC_PIXELS * MAX_FILTER_WIDTH + MAX_FILTER_WIDTH));
-            if ((cpu_flags & AV_CPU_FLAG_AVX2) && !(cpu_flags & AV_CPU_FLAG_SLOW_GATHER))
-                ff_shuffle_filter_coefficients(ctx, filterPosAvx, width, filterAvx2, SRC_PIXELS);
-
-            if (check_func(ctx->hcScale, "hscale_%d_to_%d_width%d", ctx->srcBpc, ctx->dstBpc + 1, width)) {
-                memset(dst0, 0, SRC_PIXELS * sizeof(dst0[0]));
-                memset(dst1, 0, SRC_PIXELS * sizeof(dst1[0]));
-
-                call_ref(NULL, dst0, SRC_PIXELS, src, filter, filterPos, width);
-                call_new(NULL, dst1, SRC_PIXELS, src, filterAvx2, filterPosAvx, width);
-                if (memcmp(dst0, dst1, SRC_PIXELS * sizeof(dst0[0])))
-                    fail();
-                bench_new(NULL, dst0, SRC_PIXELS, src, filter, filterPosAvx, width);
+                    filter[SRC_PIXELS * width + i] = rnd();
+                }
+                ctx->dstW = ctx->chrDstW = input_sizes[dstWi];
+                ff_sws_init_scale(ctx);
+                memcpy(filterAvx2, filter, sizeof(uint16_t) * (SRC_PIXELS * MAX_FILTER_WIDTH + MAX_FILTER_WIDTH));
+                if ((cpu_flags & AV_CPU_FLAG_AVX2) && !(cpu_flags & AV_CPU_FLAG_SLOW_GATHER))
+                    ff_shuffle_filter_coefficients(ctx, filterPosAvx, width, filterAvx2, SRC_PIXELS);
+
+                if (check_func(ctx->hcScale, "hscale_%d_to_%d__fs_%d_dstW_%d", ctx->srcBpc, ctx->dstBpc + 1, width, ctx->dstW)) {
+                    memset(dst0, 0, SRC_PIXELS * sizeof(dst0[0]));
+                    memset(dst1, 0, SRC_PIXELS * sizeof(dst1[0]));
+
+                    call_ref(NULL, dst0, ctx->dstW, src, filter, filterPos, width);
+                    call_new(NULL, dst1, ctx->dstW, src, filterAvx2, filterPosAvx, width);
+                    if (memcmp(dst0, dst1, ctx->dstW * sizeof(dst0[0])))
+                        fail();
+                    bench_new(NULL, dst0, ctx->dstW, src, filter, filterPosAvx, width);
+                }
             }
         }
     }
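Why sweeping dstW matters can be illustrated with a small standalone sketch. It assumes a simplified scalar 8-bit to 15-bit horizontal scaler (`hscale8to15_ref`, a hypothetical name, not FFmpeg's function) that applies the 1 << 14 unity gain and the (1 << 15) - 1 clip described in the test comment. It builds coefficients the same way the patch does (`make_test_filter`, with `rand()` standing in for checkasm's `rnd()`) and pairs the scaler with a deliberately buggy, hypothetical block-of-16 kernel (`hscale8to15_tailless`). The constants `MAX_DST_W`, `MAX_TAPS` and `BLOCK` are assumptions made for the sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DST_W 512 /* largest dstW in the patch's input_sizes table */
#define MAX_TAPS  40  /* assumed largest filter width */
#define BLOCK     16  /* hypothetical SIMD block size */

/* Hypothetical scalar 8-bit -> 15-bit horizontal scaler. Each output is a
 * dot product of filterSize source samples starting at filterPos[i];
 * 1 << 14 is unity gain, so >> 7 maps 8-bit input to 15 bits. Only the
 * upper bound is clipped; negative results pass through (>> on a negative
 * int is implementation-defined, arithmetic on common compilers). */
static void hscale8to15_ref(int16_t *dst, int dstW, const uint8_t *src,
                            const int16_t *filter, const int32_t *filterPos,
                            int filterSize)
{
    for (int i = 0; i < dstW; i++) {
        int val = 0;
        for (int j = 0; j < filterSize; j++)
            val += src[filterPos[i] + j] * filter[filterSize * i + j];
        val >>= 7;
        dst[i] = (int16_t)(val > (1 << 15) - 1 ? (1 << 15) - 1 : val);
    }
}

/* Hypothetical buggy "vectorised" kernel: it handles whole blocks of BLOCK
 * outputs and silently skips the remaining dstW % BLOCK samples. */
static void hscale8to15_tailless(int16_t *dst, int dstW, const uint8_t *src,
                                 const int16_t *filter, const int32_t *filterPos,
                                 int filterSize)
{
    int full = dstW - dstW % BLOCK;
    for (int i = 0; i < full; i += BLOCK)
        hscale8to15_ref(dst + i, BLOCK, src, filter + (size_t)i * filterSize,
                        filterPos + i, filterSize);
}

/* Coefficients built as in the patch: every tap is -(1 << 14) / (width - 1)
 * except one randomly chosen tap set to (1 << 15) - 1, so each row sums to
 * roughly 1 << 14. */
static void make_test_filter(int16_t *filter, int32_t *filterPos,
                             int dstW, int width, unsigned seed)
{
    assert(width > 1 && width <= MAX_TAPS); /* avoids division by zero */
    srand(seed);
    for (int i = 0; i < dstW; i++) {
        filterPos[i] = i;
        for (int j = 0; j < width; j++)
            filter[i * width + j] = -((1 << 14) / (width - 1));
        filter[i * width + rand() % width] = (1 << 15) - 1;
    }
}

/* Mirrors the patched check: zero both outputs, run both kernels with the
 * same dstW, and compare only the first dstW samples. Returns 1 on a match.
 * src must hold at least dstW + width - 1 samples. */
static int outputs_match(int dstW, int width, const uint8_t *src)
{
    static int16_t filter[MAX_DST_W * MAX_TAPS];
    static int32_t filterPos[MAX_DST_W];
    int16_t dst0[MAX_DST_W], dst1[MAX_DST_W];

    assert(dstW > 0 && dstW <= MAX_DST_W);
    make_test_filter(filter, filterPos, dstW, width, 1234u);
    memset(dst0, 0, sizeof(dst0));
    memset(dst1, 0, sizeof(dst1));
    hscale8to15_ref(dst0, dstW, src, filter, filterPos, width);
    hscale8to15_tailless(dst1, dstW, src, filter, filterPos, width);
    return memcmp(dst0, dst1, (size_t)dstW * sizeof(dst0[0])) == 0;
}
```

With constant input every row yields the same output, so any sample the tailless kernel skips shows up as a mismatch: dstW = 8 and 24 are flagged, while 128, 144, 256 and 512 (all multiples of 16) compare equal. A check that only ever ran one width that happens to be a multiple of the block size would not see such a bug. This sketches the failure mode only; it makes no claim about any particular FFmpeg kernel.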