From patchwork Mon Jun 13 16:36:24 2022
From: "Swinney, Jonathan"
To: "ffmpeg-devel@ffmpeg.org"
Cc: Martin Storsjö, "J. Dekker", "Pop, Sebastian"
Date: Mon, 13 Jun 2022 16:36:24 +0000
Subject: [FFmpeg-devel] [PATCH 1/2] checkasm: updated tests for sw_scale

- added a test for yuv2plane1 (currently disabled for x86_64)
- fixed test for yuv2planeX for aarch64 which was previously not working
  at all

Signed-off-by: Jonathan Swinney
---
 tests/checkasm/sw_scale.c | 176 +++++++++++++++++++++++++++++++++-----
 1 file changed, 156 insertions(+), 20 deletions(-)

diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index 31d9a525e9..537cbd3265 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_scale.c
@@ -35,12 +35,13 @@
         AV_WN32(buf + j, rnd()); \
     } while (0)
 
-// This reference function is the same approximate algorithm employed by the
-// SIMD functions
-static void ref_function(const int16_t *filter, int filterSize,
-                         const int16_t **src, uint8_t *dest, int dstW,
-                         const uint8_t *dither, int offset)
+static void yuv2planeX_8_ref(const int16_t *filter, int filterSize,
+                             const int16_t **src, uint8_t *dest, int dstW,
+                             const uint8_t *dither, int offset)
 {
+#if ARCH_X86_64
+    // This reference function is the same approximate algorithm employed by the
+    // SIMD functions on x86.
     int i, d;
     d = ((filterSize - 1) * 8 + dither[0]) >> 4;
     for ( i = 0; i < dstW; i++) {
@@ -56,6 +57,120 @@ static void ref_function(const int16_t *filter, int filterSize,
         }
         dest[i]= av_clip_uint8(val>>3);
     }
+#else
+    // Other architectures use the default implementation as the reference.
+    int i;
+    for (i=0; i<dstW; i++) {
+        int val = dither[(i + offset) & 7] << 12;
+        int j;
+        for (j=0; j<filterSize; j++)
+            val += src[j][i] * filter[j];
+
+        dest[i]= av_clip_uint8(val>>19);
+    }
+#endif
+}
+static void yuv2plane1_8_ref(const int16_t *src, uint8_t *dest, int dstW,
+                             const uint8_t *dither, int offset)
+{
+    int i;
+    for (i=0; i<dstW; i++) {
+        int val = (src[i] + dither[(i + offset) & 7]) >> 7;
+        dest[i]= av_clip_uint8(val);
+    }
+}
+
+static void print_data(uint8_t *p, size_t len, size_t offset)
+{
+    size_t i = 0;
+    for (; i < len; i++) {
+        if (i % 8 == 0) {
+            printf("0x%04x: ", (unsigned) (i + offset));
+        }
+        printf("0x%02x ", (uint32_t) p[i]);
+        if (i % 8 == 7) {
+            printf("\n");
+        }
+    }
+    if (i % 8 != 0) {
+        printf("\n");
+    }
+}
+
+static size_t show_differences(uint8_t *a, uint8_t *b, size_t len)
+{
+    for (size_t i = 0; i < len; i++) {
+        if (a[i] != b[i]) {
+            size_t offset_of_mismatch = i;
+            size_t offset;
+            if (i >= 8) i-=8;
+            offset = i & (~7);
+            printf("test a:\n");
+            print_data(&a[offset], FFMIN(32, len - offset), offset);
+            printf("\ntest b:\n");
+            print_data(&b[offset], FFMIN(32, len - offset), offset);
+            printf("\n");
+            return offset_of_mismatch;
+        }
+    }
+    return len;
+}
+
+static void check_yuv2yuv1(void)
+{
+    struct SwsContext *ctx;
+    int osi, isi;
+    int dstW, offset;
+    size_t fail_offset;
+    const int input_sizes[] = {8, 24, 128, 144, 256, 512};
+    const int INPUT_SIZES = sizeof(input_sizes)/sizeof(input_sizes[0]);
+    #define LARGEST_INPUT_SIZE 512
+
+    const int offsets[] = {0, 3, 8, 11, 16, 19};
+    const int OFFSET_SIZES = sizeof(offsets)/sizeof(offsets[0]);
+
+    declare_func_emms(AV_CPU_FLAG_MMX, void,
+                      const int16_t *src, uint8_t *dest,
+                      int dstW, const uint8_t *dither, int offset);
+
+    LOCAL_ALIGNED_8(int16_t, src_pixels, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_8(uint8_t, dst0, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_8(uint8_t, dst1, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_8(uint8_t, dither, [8]);
+
+    randomize_buffers((uint8_t*)dither, 8);
+    randomize_buffers((uint8_t*)src_pixels, LARGEST_INPUT_SIZE * sizeof(int16_t));
+    ctx = sws_alloc_context();
+    if (sws_init_context(ctx, NULL, NULL) < 0)
+        fail();
+
+    ff_sws_init_scale(ctx);
+    for(isi = 0; isi < INPUT_SIZES; ++isi){
+        dstW = input_sizes[isi];
+        for(osi = 0; osi < OFFSET_SIZES; osi++){
+            offset = offsets[osi];
+            if (check_func(ctx->yuv2plane1, "yuv2yuv1_%d_%d", offset, dstW)){
+                memset(dst0, 0, LARGEST_INPUT_SIZE * sizeof(dst0[0]));
+                memset(dst1, 0, LARGEST_INPUT_SIZE * sizeof(dst1[0]));
+
+                yuv2plane1_8_ref(src_pixels, dst0, dstW, dither, offset);
+                call_new(src_pixels, dst1, dstW, dither, offset);
+                if (memcmp(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0]))) {
+                    fail();
+                    printf("failed: yuv2yuv1_%d_%d\n", offset, dstW);
+                    fail_offset = show_differences(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0]));
+                    printf("failing values: src: 0x%04x dither: 0x%02x dst-c: %02x dst-asm: %02x\n",
+                            (int) src_pixels[fail_offset],
+                            (int) dither[(fail_offset + offset) & 7],
+                            (int) dst0[fail_offset],
+                            (int) dst1[fail_offset]);
+                }
+                if(dstW == LARGEST_INPUT_SIZE)
+                    bench_new(src_pixels, dst1, dstW, dither, offset);
+            }
+        }
+    }
+    sws_freeContext(ctx);
 }
 
 static void check_yuv2yuvX(void)
@@ -64,11 +179,11 @@ static void check_yuv2yuvX(void)
     int fsi, osi, isi, i, j;
     int dstW;
 #define LARGEST_FILTER 16
-#define FILTER_SIZES 4
-    static const int filter_sizes[FILTER_SIZES] = {1, 4, 8, 16};
+    const int filter_sizes[] = {1, 2, 3, 4, 8, 16};
+    const int FILTER_SIZES = sizeof(filter_sizes)/sizeof(filter_sizes[0]);
 #define LARGEST_INPUT_SIZE 512
-#define INPUT_SIZES 6
-    static const int input_sizes[INPUT_SIZES] = {8, 24, 128, 144, 256, 512};
+    static const int input_sizes[] = {8, 24, 128, 144, 256, 512};
+    const int INPUT_SIZES = sizeof(input_sizes)/sizeof(input_sizes[0]);
     declare_func_emms(AV_CPU_FLAG_MMX, void,
                       const int16_t *filter, int filterSize,
                       const int16_t **src, uint8_t *dest,
@@ -95,7 +210,7 @@ static void check_yuv2yuvX(void)
     ff_sws_init_scale(ctx);
     for(isi = 0; isi < INPUT_SIZES; ++isi){
         dstW = input_sizes[isi];
-        for(osi = 0; osi < 64; osi += 16){
+        for(osi = 0; osi < 1; osi += 16){
             for(fsi = 0; fsi < FILTER_SIZES; ++fsi){
                 src = av_malloc(sizeof(int16_t*) * filter_sizes[fsi]);
                 vFilterData = av_malloc((filter_sizes[fsi] + 2) * sizeof(union VFilterData));
@@ -110,18 +225,35 @@ static void check_yuv2yuvX(void)
                     memset(dst0, 0, LARGEST_INPUT_SIZE * sizeof(dst0[0]));
                     memset(dst1, 0, LARGEST_INPUT_SIZE * sizeof(dst1[0]));
 
-                    // The reference function is not the scalar function selected when mmx
-                    // is deactivated as the SIMD functions do not give the same result as
-                    // the scalar ones due to rounding. The SIMD functions are activated by
-                    // the flag SWS_ACCURATE_RND
-                    ref_function(&filter_coeff[0], filter_sizes[fsi], src, dst0, dstW - osi, dither, osi);
-                    // There's no point in calling new for the reference function
-                    if(ctx->use_mmx_vfilter){
-                        call_new((const int16_t*)vFilterData, filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
-                        if (memcmp(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0])))
+                    if (ARCH_X86_64) {
+                        // The reference function is not the scalar function selected when mmx
+                        // is deactivated as the SIMD functions do not give the same result as
+                        // the scalar ones due to rounding. The SIMD functions are activated by
+                        // the flag SWS_ACCURATE_RND
+                        yuv2planeX_8_ref(&filter_coeff[0], filter_sizes[fsi], src, dst0, dstW - osi, dither, osi);
+                        // There's no point in calling new for the reference function
+                        if(ctx->use_mmx_vfilter) {
+                            call_new((const int16_t*)vFilterData, filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
+                            if (memcmp(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0]))) {
+                                fail();
+                                printf("failed: yuv2yuvX_%d_%d_%d\n", filter_sizes[fsi], osi, dstW);
+                                show_differences(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0]));
+                            }
+                            if(dstW == LARGEST_INPUT_SIZE)
+                                bench_new((const int16_t*)vFilterData, filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
+                        }
+                    }
+
+                    if (ARCH_AARCH64) {
+                        yuv2planeX_8_ref(&filter_coeff[0], filter_sizes[fsi], src, dst0, dstW - osi, dither, osi);
+                        call_new(&filter_coeff[0], filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
+                        if (memcmp(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0]))) {
                             fail();
+                            printf("failed: yuv2yuvX_%d_%d_%d\n", filter_sizes[fsi], osi, dstW);
+                            show_differences(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0]));
+                        }
                         if(dstW == LARGEST_INPUT_SIZE)
-                            bench_new((const int16_t*)vFilterData, filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
+                            bench_new(&filter_coeff[0], filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
                     }
                 }
                 av_freep(&src);
@@ -245,6 +377,10 @@ void checkasm_check_sw_scale(void)
 {
     check_hscale();
     report("hscale");
+    if (!ARCH_X86_64) {
+        check_yuv2yuv1();
+        report("yuv2yuv1");
+    }
     check_yuv2yuvX();
     report("yuv2yuvX");
 }