From patchwork Sat Aug 13 20:55:55 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Swinney, Jonathan"
X-Patchwork-Id: 37258
From: "Swinney, Jonathan"
To: "ffmpeg-devel@ffmpeg.org"
Cc: Martin Storsjö, Hubert Mazur
Date: Sat, 13 Aug 2022 20:55:55 +0000
Message-ID: <859182400d774ed6a80829087b578b8e@amazon.com>
Subject: [FFmpeg-devel] [PATCH v3 1/3] checkasm: updated tests for sw_scale

- added a test for yuv2plane1
- fixed test for
  yuv2planeX for aarch64 which was previously not working at all
- updated the test for yuv2planeX to check exact results or approximated results

Signed-off-by: Jonathan Swinney
---
 libswscale/x86/swscale.c  |   8 +-
 tests/checkasm/sw_scale.c | 188 ++++++++++++++++++++++++++++++--------
 2 files changed, 154 insertions(+), 42 deletions(-)

diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 628f12137c..32d441245d 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -534,7 +534,8 @@ switch(c->dstBpc){ \
         ASSIGN_SSE_SCALE_FUNC(c->hcScale, c->hChrFilterSize, sse2, sse2);
         ASSIGN_VSCALEX_FUNC(c->yuv2planeX, sse2, ,
                             HAVE_ALIGNED_STACK || ARCH_X86_64);
-        ASSIGN_VSCALE_FUNC(c->yuv2plane1, sse2);
+        if (!(c->flags & SWS_ACCURATE_RND))
+            ASSIGN_VSCALE_FUNC(c->yuv2plane1, sse2);
 
         switch (c->srcFormat) {
         case AV_PIX_FMT_YA8:
@@ -583,14 +584,15 @@ switch(c->dstBpc){ \
         ASSIGN_VSCALEX_FUNC(c->yuv2planeX, sse4,
                             if (!isBE(c->dstFormat)) c->yuv2planeX = ff_yuv2planeX_16_sse4,
                             HAVE_ALIGNED_STACK || ARCH_X86_64);
-        if (c->dstBpc == 16 && !isBE(c->dstFormat))
+        if (c->dstBpc == 16 && !isBE(c->dstFormat) && !(c->flags & SWS_ACCURATE_RND))
             c->yuv2plane1 = ff_yuv2plane1_16_sse4;
     }
 
     if (EXTERNAL_AVX(cpu_flags)) {
         ASSIGN_VSCALEX_FUNC(c->yuv2planeX, avx, ,
                             HAVE_ALIGNED_STACK || ARCH_X86_64);
-        ASSIGN_VSCALE_FUNC(c->yuv2plane1, avx);
+        if (!(c->flags & SWS_ACCURATE_RND))
+            ASSIGN_VSCALE_FUNC(c->yuv2plane1, avx);
 
         switch (c->srcFormat) {
         case AV_PIX_FMT_YUYV422:
diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index b643a47c30..859993db6f 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_scale.c
@@ -35,40 +35,140 @@
             AV_WN32(buf + j, rnd());      \
     } while (0)
 
-// This reference function is the same approximate algorithm employed by the
-// SIMD functions
-static void ref_function(const int16_t *filter, int filterSize,
-                                 const int16_t **src, uint8_t *dest, int dstW,
-                                 const uint8_t *dither, int offset)
+static void yuv2planeX_8_ref(const int16_t *filter, int filterSize,
+                             const int16_t **src, uint8_t *dest, int dstW,
+                             const uint8_t *dither, int offset)
 {
-    int i, d;
-    d = ((filterSize - 1) * 8 + dither[0]) >> 4;
-    for ( i = 0; i < dstW; i++) {
-        int16_t val = d;
+    // This corresponds to the yuv2planeX_8_c function
+    int i;
+    for (i = 0; i < dstW; i++) {
+        int val = dither[(i + offset) & 7] << 12;
         int j;
-        union {
-            int val;
-            int16_t v[2];
-        } t;
-        for (j = 0; j < filterSize; j++){
-            t.val = (int)src[j][i + offset] * (int)filter[j];
-            val += t.v[1];
+        for (j = 0; j < filterSize; j++)
+            val += src[j][i] * filter[j];
+
+        dest[i]= av_clip_uint8(val >> 19);
+    }
+}
+
+static int cmp_off_by_n(const uint8_t *ref, const uint8_t *test, size_t n, int accuracy)
+{
+    for (size_t i = 0; i < n; i++) {
+        if (abs(ref[i] - test[i]) > accuracy)
+            return 1;
+    }
+    return 0;
+}
+
+static void print_data(uint8_t *p, size_t len, size_t offset)
+{
+    size_t i = 0;
+    for (; i < len; i++) {
+        if (i % 8 == 0) {
+            printf("0x%04zx: ", i+offset);
+        }
+        printf("0x%02x ", (uint32_t) p[i]);
+        if (i % 8 == 7) {
+            printf("\n");
         }
-        dest[i]= av_clip_uint8(val>>3);
     }
+    if (i % 8 != 0) {
+        printf("\n");
+    }
+}
+
+static size_t show_differences(uint8_t *a, uint8_t *b, size_t len)
+{
+    for (size_t i = 0; i < len; i++) {
+        if (a[i] != b[i]) {
+            size_t offset_of_mismatch = i;
+            size_t offset;
+            if (i >= 8) i-=8;
+            offset = i & (~7);
+            printf("test a:\n");
+            print_data(&a[offset], 32, offset);
+            printf("\ntest b:\n");
+            print_data(&b[offset], 32, offset);
+            printf("\n");
+            return offset_of_mismatch;
+        }
+    }
+    return len;
 }
 
-static void check_yuv2yuvX(void)
+static void check_yuv2yuv1(int accurate)
+{
+    struct SwsContext *ctx;
+    int osi, isi;
+    int dstW, offset;
+    size_t fail_offset;
+    const int input_sizes[] = {8, 24, 128, 144, 256, 512};
+    const int INPUT_SIZES = sizeof(input_sizes)/sizeof(input_sizes[0]);
+    #define LARGEST_INPUT_SIZE 512
+
+    const int offsets[] = {0, 3, 8, 11, 16, 19};
+    const int OFFSET_SIZES = sizeof(offsets)/sizeof(offsets[0]);
+    const char *accurate_str = (accurate) ? "accurate" : "approximate";
+
+    declare_func_emms(AV_CPU_FLAG_MMX, void,
+                      const int16_t *src, uint8_t *dest,
+                      int dstW, const uint8_t *dither, int offset);
+
+    LOCAL_ALIGNED_8(int16_t, src_pixels, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_8(uint8_t, dst0, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_8(uint8_t, dst1, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_8(uint8_t, dither, [8]);
+
+    randomize_buffers((uint8_t*)dither, 8);
+    randomize_buffers((uint8_t*)src_pixels, LARGEST_INPUT_SIZE * sizeof(int16_t));
+    ctx = sws_alloc_context();
+    if (accurate)
+        ctx->flags |= SWS_ACCURATE_RND;
+    if (sws_init_context(ctx, NULL, NULL) < 0)
+        fail();
+
+    ff_sws_init_scale(ctx);
+    for (isi = 0; isi < INPUT_SIZES; ++isi) {
+        dstW = input_sizes[isi];
+        for (osi = 0; osi < OFFSET_SIZES; osi++) {
+            offset = offsets[osi];
+            if (check_func(ctx->yuv2plane1, "yuv2yuv1_%d_%d_%s", offset, dstW, accurate_str)){
+                memset(dst0, 0, LARGEST_INPUT_SIZE * sizeof(dst0[0]));
+                memset(dst1, 0, LARGEST_INPUT_SIZE * sizeof(dst1[0]));
+
+                call_ref(src_pixels, dst0, dstW, dither, offset);
+                call_new(src_pixels, dst1, dstW, dither, offset);
+                if (cmp_off_by_n(dst0, dst1, dstW * sizeof(dst0[0]), accurate ? 0 : 2)) {
+                    fail();
+                    printf("failed: yuv2yuv1_%d_%d_%s\n", offset, dstW, accurate_str);
+                    fail_offset = show_differences(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0]));
+                    printf("failing values: src: 0x%04x dither: 0x%02x dst-c: %02x dst-asm: %02x\n",
+                            (int) src_pixels[fail_offset],
+                            (int) dither[(fail_offset + offset) & 7],
+                            (int) dst0[fail_offset],
+                            (int) dst1[fail_offset]);
+                }
+                if(dstW == LARGEST_INPUT_SIZE)
+                    bench_new(src_pixels, dst1, dstW, dither, offset);
+            }
+        }
+    }
+    sws_freeContext(ctx);
+}
+
+static void check_yuv2yuvX(int accurate)
 {
     struct SwsContext *ctx;
     int fsi, osi, isi, i, j;
     int dstW;
 #define LARGEST_FILTER 16
-#define FILTER_SIZES 4
-    static const int filter_sizes[FILTER_SIZES] = {1, 4, 8, 16};
+    // ff_yuv2planeX_8_sse2 can't handle odd filter sizes
+    const int filter_sizes[] = {2, 4, 8, 16};
+    const int FILTER_SIZES = sizeof(filter_sizes)/sizeof(filter_sizes[0]);
 #define LARGEST_INPUT_SIZE 512
-#define INPUT_SIZES 6
-    static const int input_sizes[INPUT_SIZES] = {8, 24, 128, 144, 256, 512};
+    static const int input_sizes[] = {8, 24, 128, 144, 256, 512};
+    const int INPUT_SIZES = sizeof(input_sizes)/sizeof(input_sizes[0]);
+    const char *accurate_str = (accurate) ? "accurate" : "approximate";
 
     declare_func_emms(AV_CPU_FLAG_MMX, void, const int16_t *filter,
                       int filterSize, const int16_t **src, uint8_t *dest,
@@ -89,6 +189,8 @@ static void check_yuv2yuvX(void)
     randomize_buffers((uint8_t*)src_pixels, LARGEST_FILTER * LARGEST_INPUT_SIZE * sizeof(int16_t));
     randomize_buffers((uint8_t*)filter_coeff, LARGEST_FILTER * sizeof(int16_t));
     ctx = sws_alloc_context();
+    if (accurate)
+        ctx->flags |= SWS_ACCURATE_RND;
     if (sws_init_context(ctx, NULL, NULL) < 0)
         fail();
 
@@ -96,33 +198,37 @@ static void check_yuv2yuvX(void)
     for(isi = 0; isi < INPUT_SIZES; ++isi){
         dstW = input_sizes[isi];
         for(osi = 0; osi < 64; osi += 16){
-            for(fsi = 0; fsi < FILTER_SIZES; ++fsi){
+            if (dstW <= osi)
+                continue;
+            for (fsi = 0; fsi < FILTER_SIZES; ++fsi) {
                 src = av_malloc(sizeof(int16_t*) * filter_sizes[fsi]);
                 vFilterData = av_malloc((filter_sizes[fsi] + 2) * sizeof(union VFilterData));
                 memset(vFilterData, 0, (filter_sizes[fsi] + 2) * sizeof(union VFilterData));
-                for(i = 0; i < filter_sizes[fsi]; ++i){
+                for (i = 0; i < filter_sizes[fsi]; ++i) {
                     src[i] = &src_pixels[i * LARGEST_INPUT_SIZE];
-                    vFilterData[i].src = src[i];
+                    vFilterData[i].src = src[i] - osi;
                     for(j = 0; j < 4; ++j)
                         vFilterData[i].coeff[j + 4] = filter_coeff[i];
                 }
-                if (check_func(ctx->yuv2planeX, "yuv2yuvX_%d_%d_%d", filter_sizes[fsi], osi, dstW)){
+                if (check_func(ctx->yuv2planeX, "yuv2yuvX_%d_%d_%d_%s", filter_sizes[fsi], osi, dstW, accurate_str)){
+                    // use vFilterData for the mmx function
+                    const int16_t *filter = ctx->use_mmx_vfilter ? (const int16_t*)vFilterData : &filter_coeff[0];
                     memset(dst0, 0, LARGEST_INPUT_SIZE * sizeof(dst0[0]));
                     memset(dst1, 0, LARGEST_INPUT_SIZE * sizeof(dst1[0]));
 
-                    // The reference function is not the scalar function selected when mmx
-                    // is deactivated as the SIMD functions do not give the same result as
-                    // the scalar ones due to rounding. The SIMD functions are activated by
-                    // the flag SWS_ACCURATE_RND
-                    ref_function(&filter_coeff[0], filter_sizes[fsi], src, dst0, dstW - osi, dither, osi);
-                    // There's no point in calling new for the reference function
-                    if(ctx->use_mmx_vfilter){
-                        call_new((const int16_t*)vFilterData, filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
-                        if (memcmp(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0])))
-                            fail();
-                        if(dstW == LARGEST_INPUT_SIZE)
-                            bench_new((const int16_t*)vFilterData, filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
+                    // We can't use call_ref here, because we don't know if use_mmx_vfilter was set for that
+                    // function or not, so we can't pass it the parameters correctly.
+                    yuv2planeX_8_ref(&filter_coeff[0], filter_sizes[fsi], src, dst0, dstW - osi, dither, osi);
+
+                    call_new(filter, filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
+                    if (cmp_off_by_n(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0]), accurate ? 0 : 2)) {
+                        fail();
+                        printf("failed: yuv2yuvX_%d_%d_%d_%s\n", filter_sizes[fsi], osi, dstW, accurate_str);
+                        show_differences(dst0, dst1, LARGEST_INPUT_SIZE * sizeof(dst0[0]));
                     }
+                    if(dstW == LARGEST_INPUT_SIZE)
+                        bench_new(filter, filter_sizes[fsi], src, dst1, dstW - osi, dither, osi);
+
                 }
                 av_freep(&src);
                 av_freep(&vFilterData);
@@ -245,6 +351,10 @@ void checkasm_check_sw_scale(void)
 {
     check_hscale();
     report("hscale");
-    check_yuv2yuvX();
+    check_yuv2yuv1(0);
+    check_yuv2yuv1(1);
+    report("yuv2yuv1");
+    check_yuv2yuvX(0);
+    check_yuv2yuvX(1);
     report("yuv2yuvX");
 }
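
Note for reviewers: the reference filter and the tolerance check from this patch can be tried outside checkasm with a small standalone file. What follows is only a minimal sketch under stated assumptions, not the code under test. clip_uint8 and vfilter_8_ref are hypothetical names standing in for av_clip_uint8 and yuv2planeX_8_ref, and the sketch assumes the accumulated sum fits in an int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for FFmpeg's av_clip_uint8(): clamp to [0, 255]. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar vertical filter for 8-bit output, modelled on yuv2planeX_8_ref
 * in the patch: the dither value is scaled by 1 << 12, the tap products
 * are accumulated, and the sum is shifted right by 19 before clipping. */
static void vfilter_8_ref(const int16_t *filter, int filter_size,
                          const int16_t **src, uint8_t *dest, int dst_w,
                          const uint8_t *dither, int offset)
{
    assert(filter_size > 0);
    for (int i = 0; i < dst_w; i++) {
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_uint8(val >> 19);
    }
}

/* Returns 1 if any pair of bytes differs by more than `accuracy`, else 0.
 * accuracy == 0 demands bit-exact output. */
static int cmp_off_by_n(const uint8_t *ref, const uint8_t *test,
                        size_t n, int accuracy)
{
    for (size_t i = 0; i < n; i++) {
        if (abs(ref[i] - test[i]) > accuracy)
            return 1;
    }
    return 0;
}
```

As a worked case with illustrative values: two taps of 2048 each, source rows holding 16384 (128 << 7) and a zero dither give an accumulator of 2^26, so the output pixel is 2^26 >> 19 = 128. Passing accuracy 0 to cmp_off_by_n corresponds to the SWS_ACCURATE_RND runs in the patch, while accuracy 2 corresponds to the approximate runs.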